BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks

Published in RBFM @ NeurIPS 2024

BigDocs is a permissively-licensed dataset for training vision-language models on document and code tasks.