About me

I am a Ph.D. student in Computer Science at York University, supervised by Prof. Enamul Hoque. My research focuses on developing multimodal vision-language models and benchmarks for chart, table, and document comprehension. I earned my Master of Science in Computer Science from York University in 2022 and a Bachelor of Science in Computer Engineering from Koc University in 2020, supported by the fully funded Al Ghurair Foundation (AGFE) STEM scholarship.

My research focuses on advancing chart understanding benchmarks (e.g., ChartQA, Chart2Text) and models (e.g., UniChart, ChartInstruct, ChartGemma). It has been published at top-tier conferences such as ACL 2022, EuroVis 2022, EMNLP 2023, ACL 2024, EMNLP 2024, COLING 2025, and ICLR 2025, as well as at workshops at NeurIPS 2024, ICLR 2025, AAAI 2024, and CVPR 2021 (where I received the Best Paper Award). These contributions have been widely adopted, accumulating over 100,000 downloads on open-source platforms like Hugging Face. Notably, our ChartQA benchmark has been featured as a key multimodal evaluation benchmark in OpenAI’s official GPT-4 blog post and Google’s Gemini paper to evaluate their models’ visual reasoning capabilities. As of March 7th, 2025, my publications have received 946 citations, and I have an h-index of 9 on Google Scholar. My research is also supported by funding awards such as the Google PaliGemma Academic Award.

Professionally, I worked as a Senior Data Scientist at Arteria AI between 2022 and 2024, leading multimodal document understanding projects for clients in the finance and legal industries in Canada. Currently, I am a visiting researcher at ServiceNow Research, where I focus on designing novel vision-language model (VLM) architectures for multimodal document understanding and training cutting-edge VLMs (e.g., Llama 3.2, Phi-3.5, Idefics-3) on large compute clusters with multi-node H100 GPUs.


Research Interests

  • Multimodal Large Language Modeling
  • Multimodal Chart Understanding
  • Multimodal Document Understanding
  • Natural Language Processing
  • Large Language Models
  • Vision-Language


Education

  • York University
    PhD in Computer Science, Sep 2024 - Present

  • York University
    M.Sc. in Computer Science, Sep 2020 - June 2022
    GPA: 9/9, Exceptional A+

  • Koc University
    B.Sc. in Computer Engineering, Sep 2017 - June 2020
    GPA: 3.9/4.0


Work Experience


Conference Publications

  1. BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks.
    Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Perampalli Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba.
    Published at ICLR 2025

  2. ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
    Ahmed Masry*, Megh Thakkar*, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, Shafiq Joty
    Published at COLING 2025

  3. Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs
    Mohammed Saidul Islam, Raian Rahman, Ahmed Masry, Md Tahmid Rahman Laskar, Mir Tafseer Nayeem, Enamul Hoque
    Published at EMNLP 2024

  4. ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
    Ahmed Masry*, Mehrad Shahmohammadi*, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty [*Equal Contribution]
    Published at ACL 2024

  5. UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
    Ahmed Masry*, Parsa Kavehzadeh*, Xuan Long Do, Shafiq Joty, and Enamul Hoque [*Equal Contribution]
    Published at EMNLP 2023

  6. Chart-to-Text: A Large-Scale Benchmark for Chart Summarization
    Shankar Kantharaj*, Rixie Tiffany Leong*, Xiang Lin*, Ahmed Masry*, Megh Thakkar*, Enamul Hoque, Shafiq Joty [*Equal Contribution]
    Published at ACL 2022

  7. ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
    Ahmed Masry, Do Long, Jia Qing Tan, Shafiq Joty, and Enamul Hoque.
    Published at ACL 2022

  8. Chain FL: Decentralized Federated Machine Learning via Blockchain
    C. Korkmaz, H. E. Kocas, A. Uysal, A. Masry, O. Ozkasap and B. Akgun
    Published at BCCA 2020


Journal Publications

  1. Chart Question Answering: State of the Art and Future Directions
    Enamul Hoque, Parsa Kavehzadeh, Ahmed Masry
    Published at EuroVis 2022


Workshop Papers

  1. AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding
    Ahmed Masry, Juan A. Rodriguez, Tianyu Zhang, Suyuchen Wang, Chao Wang, Aarash Feizi, Akshay Kalkunte Suresh, Abhay Puri, Xiangru Jian, Pierre-André Noël, Sathwik Tejaswi Madhusudhan, Marco Pedersoli, Bang Liu, Nicolas Chapados, Yoshua Bengio, Enamul Hoque, Christopher Pal, Issam H. Laradji, David Vazquez, Perouz Taslakian, Spandana Gella, Sai Rajeswar.
    Presented at Re-Align @ ICLR 2025

  2. BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks.
    Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, Francois Savard, Amirhossein Abaskohi, Ahmed Masry, Perampalli Shravan Nayak, Mahsa Massoud, Rabiul Awal, Pierre-André Noël, Mats L. Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Ying Zhang, Sathwik Tejaswi Madhusudhan, João Monteiro, Krishnamurthy (Dj) Dvijotham, Torsten Scholak, Nicolas Chapados, Sean Hughes, Tamer Özsu, Aishwarya Agrawal, Marco Pedersoli, Christopher Pal, Perouz Taslakian, David Vazquez, Issam H. Laradji, Spandana Gella, Sai Rajeswar Mudumba.
    Presented at RBFM @ NeurIPS 2024

  3. LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
    Ahmed Masry and Amir Hajian.
    Presented at AIFinSI @ AAAI 2024

  4. Integrating Image Data Extraction and Table Parsing Methods for Chart Question Answering
    Ahmed Masry, Enamul Hoque Prince
    Presented at ChartQA @ CVPR 2021 [Best Paper Award]


Preprints

  1. Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization
    Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty


Academic Awards

  • Mitacs Accelerate Award
    15K CAD to support my multimodal research at ServiceNow Research.

  • Google PaliGemma Academic Award
    5K USD in GCP Credits to support my multimodal research.

  • Al Ghurair Foundation for Education STEM Scholarship Program (AGFE)
    A merit-based full-ride scholarship for highly achieving Arab students to pursue a STEM degree.

  • Vehbi Koc Scholar and Dean’s Honor Roll
    Merit-based awards for academic excellence.

  • CVPR 2021 Chart Question Answering Workshop
    Received the Best Paper Award.


Extracurricular Activities

  • Arabic NLP Winter School
    Attended the 2025 winter school at the MBZUAI campus in Abu Dhabi, UAE (January 2025)

  • Multimodal AI Winter School
    Attended the winter school at the Saudi Data & AI Authority (SDAIA) in Riyadh, Saudi Arabia (December 2024)

  • Al Ghurair Foundation (AGFE)
    Student Ambassador (2018-2019)

  • Robotics Craftsmanship International Academy (University of Coimbra)
    Attended the 2018 summer robotics school; ranked 1st in the sumo robots competition and in the top 10 out of 84 participants.


Academic Services

  • ACL 2025 Reviewer
  • NAACL 2025 Reviewer
  • ACL 2024 Reviewer
  • EMNLP 2024 Reviewer


Recent News

  • Paper accepted at ICLR 2025
    January 2025
  • Attended NLP Winter School at MBZUAI (Abu Dhabi, UAE)
    January 2025
  • Attended Multimodal Winter School at SDAIA (Riyadh, Saudi Arabia)
    December 2024
  • Paper accepted at COLING 2025
    November 2024
  • Started as a Visiting Researcher at ServiceNow Research
    September 2024
  • Received Google’s PaliGemma Academic Program Award
    September 2024
  • Started my PhD at York University
    September 2024
  • Paper accepted at ACL 2024
    May 2024
  • Paper accepted at AIFinSI@AAAI 2024
    January 2024
  • Promoted to Senior Data Scientist at Arteria AI
    January 2024
  • Paper accepted at EMNLP 2023
    October 2023
  • Presented ChartQA Poster at TMLS 2023
    June 2023
  • Defended my M.Sc. Thesis at York University
    June 2022
  • Presented ChartQA Poster at CVR/VISTA 2022
    June 2022
  • Paper accepted at EuroVis 2022
    May 2022
  • Started working at Arteria AI as an NLP Data Scientist
    May 2022
  • Two papers accepted at ACL 2022
    March 2022
  • Won the Best Paper Award at ChartQA@CVPR 2021
    June 2021
  • Paper accepted at ChartQA@CVPR 2021
    June 2021
  • Joined as a Graduate Teaching Student at York University
    January 2021
  • Joined as a Graduate Research Assistant at York University
    September 2020
  • Started Fully Funded M.Sc. in Computer Science at York University
    September 2020
  • Started as a Data Scientist Intern at Shallow AI
    July 2020