About me
I am a Ph.D. student in Computer Science at York University supervised by Prof. Enamul Hoque. My research focuses on developing multimodal vision-language models and benchmarks for chart, table, and document comprehension. I earned my Master of Science in Computer Science from York University in 2022 and a Bachelor of Science in Computer Engineering from Koc University in 2020, supported by the fully-funded Al Ghurair Foundation (AGFE) STEM scholarship.
My research, published in top-tier conferences such as ACL 2022, EuroVis 2022, EMNLP 2023, ACL 2024, EMNLP 2024, COLING 2025, and ICLR 2025, as well as workshops at NeurIPS 2024, ICLR 2025, AAAI 2024, and CVPR 2021 (where I received the Best Paper Award), focuses on advancing chart understanding benchmarks (e.g., ChartQA, Chart2Text) and models (e.g., UniChart, ChartInstruct, ChartGemma). These contributions have been widely adopted, accumulating over 100,000 downloads on open-source platforms like Hugging Face. Notably, our ChartQA benchmark has been featured as a key multimodal evaluation benchmark in OpenAI’s official GPT-4 blog post and Google’s Gemini paper to evaluate their models’ visual reasoning capabilities. As of March 7th, 2025, my publications have received 946 citations, and I have an h-index of 9 on Google Scholar. My research is also supported by funding awards such as the Google PaliGemma Academic Award.
Professionally, I worked as a Senior Data Scientist at Arteria AI, leading multimodal document understanding projects for clients in the finance and legal industries in Canada between 2022 and 2024. Currently, I am a visiting researcher at ServiceNow Research, where I focus on designing novel vision-language model (VLM) architectures for multimodal document understanding and training cutting-edge VLMs (e.g., Llama 3.2, Phi-3.5, Idefics-3) on large compute clusters with multi-node H100 GPUs.
Research Interests
- Multimodal Large Language Modeling
- Multimodal Chart Understanding
- Multimodal Document Understanding
- Natural Language Processing
- Large Language Models
- Vision-Language
Education
York University
PhD in Computer Science, Sep 2024 - Present
York University
M.Sc. in Computer Science, Sep 2020 - June 2022
GPA: 9/9, Exceptional A+
Koc University
B.Sc. in Computer Engineering, Sep 2017 - June 2020
GPA: 3.9/4.0
Work Experience
ServiceNow Research
Visiting Researcher, September 2024 - Present
Arteria AI
Senior Data Scientist, January 2024 - September 2024
Arteria AI
Data Scientist, May 2022 - December 2023
York University
Research Assistant, Sep 2020 - June 2022
York University
Teaching Assistant, Jan 2021 - Apr 2022
ShallowAI
Data Scientist Intern, July 2020 - Sep 2020
Koc University
Undergraduate Research Assistant, June 2019 - Jan 2020
Conference Publications
BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks.
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Perampalli Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba.
Published at ICLR 2025
ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
Ahmed Masry*, Megh Thakkar*, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, Shafiq Joty
Published at COLING 2025
Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs
Mohammed Saidul Islam, Raian Rahman, Ahmed Masry, Md Tahmid Rahman Laskar, Mir Tafseer Nayeem, Enamul Hoque
Published at EMNLP 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
Ahmed Masry*, Mehrad Shahmohammadi*, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty [*Equal Contribution]
Published at ACL 2024
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry*, Parsa Kavehzadeh*, Xuan Long Do, Shafiq Joty, and Enamul Hoque [*Equal Contribution]
Published at EMNLP 2023
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
Shankar Kantharaj*, Rixie Tiffany Leong*, Xiang Lin*, Ahmed Masry*, Megh Thakkar*, Enamul Hoque, Shafiq Joty [*Equal Contribution]
Published at ACL 2022
ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
Ahmed Masry, Do Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque.
Published at ACL 2022
Chain FL: Decentralized Federated Machine Learning via Blockchain
C. Korkmaz, H. E. Kocas, A. Uysal, A. Masry, O. Ozkasap and B. Akgun
Published at BCCA 2020
Journal Publications
- Chart Question Answering: State of the Art and Future Directions
Enamul Hoque, Parsa Kavehzadeh, Ahmed Masry
Published at EuroVis 2022
Workshop Papers
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
Ahmed Masry, Juan A. Rodriguez, Tianyu Zhang, Suyuchen Wang, Chao Wang, Aarash Feizi, Akshay Kalkunte Suresh, Abhay Puri, Xiangru Jian, Pierre-André Noël, Sathwik Tejaswi Madhusudhan, Marco Pedersoli, Bang Liu, Nicolas Chapados, Yoshua Bengio, Enamul Hoque, Christopher Pal, Issam H. Laradji, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar.
Presented at Re-Align @ ICLR 2025
BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks.
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Perampalli Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba.
Presented at RBFM @ NeurIPS 2024
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
Ahmed Masry and Amir Hajian.
Presented at AIFinSI @ AAAI 2024
Integrating Image Data Extraction and Table Parsing Methods for Chart Question Answering
Ahmed Masry, Enamul Hoque Prince
Presented at ChartQA @ CVPR 2021 [Best Paper Award]
Preprints
- Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization
Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty
Academic Awards
Mitacs Accelerate Award
15K CAD to support my multimodal research at ServiceNow Research.
Google PaliGemma Academic Award
5K USD in GCP Credits to support my multimodal research.
Al Ghurair Foundation for Education STEM Scholarship Program (AGFE)
A merit-based full-ride scholarship for highly achieving Arab students to pursue a STEM degree.
Vehbi Koc Scholar And Dean’s Honor Roll
Merit-based Awards for Academic Excellence
CVPR 2021 Chart Question Answering Workshop
Achieved Best Paper Award.
Extracurricular Activities
Arabic NLP Winter School
Attended the 2025 winter school at the MBZUAI campus in Abu Dhabi, UAE (January 2025)
Multimodal AI Winter School
Attended the winter school at the Saudi Data & AI Authority (SDAIA) in Riyadh, Saudi Arabia (December 2024)
Al-Ghurair Foundation (AGFE)
Student Ambassador (2018-2019)
Robotics Craftsmanship International Academy (University of Coimbra)
Attended the 2018 summer robotics school; ranked 1st in the sumo robots competition and in the top 10 out of 84 participants.
Academic Services
- ACL 2025 Reviewer
- NAACL 2025 Reviewer
- ACL 2024 Reviewer
- EMNLP 2024 Reviewer
Recent News
- Paper accepted at ICLR 2025
January 2025
- Attended NLP Winter School at MBZUAI (Abu Dhabi, UAE)
January 2025
- Attended Multimodal Winter School at SDAIA (Riyadh, Saudi Arabia)
December 2024
- Paper accepted at COLING 2025
November 2024
- Started as a Visiting Researcher at ServiceNow Research
September 2024
- Received Google’s PaliGemma Academic Program Award
September 2024
- Started my PhD at York University
September 2024
- Paper accepted at ACL 2024
May 2024
- Paper accepted at AIFinSI@AAAI 2024
January 2024
- Promoted to Senior Data Scientist at Arteria AI
January 2024
- Paper accepted at EMNLP 2023
October 2023
- Presented ChartQA Poster at TMLS 2023
June 2023
- Defended my M.Sc. Thesis at York University
June 2022
- Presented ChartQA Poster at CVR/VISTA 2022
June 2022
- Paper accepted at EuroVis 2022
May 2022
- Started working at Arteria AI as an NLP Data Scientist
May 2022
- Paper accepted at ACL 2022
March 2022
- Paper accepted at ACL 2022
March 2022
- Won Best Paper Award at ChartQA@CVPR 2021
June 2021
- Paper accepted at ChartQA@CVPR 2021
June 2021
- Joined as a Graduate Teaching Assistant at York University
January 2021
- Joined as a Graduate Research Assistant at York University
September 2020
- Started Fully Funded M.Sc. in Computer Science at York University
September 2020
- Started as a Data Scientist Intern at Shallow AI
July 2020