BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks

Published in ICLR 2025, 2025

BigDocs is a permissively-licensed dataset for training vision-language models on document and code tasks.