AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Published in NeurIPS 2025

AlignVLM: Bridging vision and language latent spaces for multimodal understanding.