AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Published in NeurIPS 2025

AlignVLM: Bridging vision and language latent spaces for multimodal understanding.