Organizations

@Adaptive-Robotic-Lab

Masoudjafaripour/README.md

Hi there 👋, I'm Masoud, a graduate student in Computer Science at the University of Alberta, supervised by Prof. Osmar R. Zaïane, and currently a Research Intern at Electronic Arts (EA) 🎮.

My thesis research focuses on visual and spatial reasoning in multimodal language models. Currently, I am exploring how inference-time scaling, adaptive context control, and RL post-training can improve the efficiency and adaptability of spatial reasoning. Spatial reasoning has broad applications across domains including robotics, autonomous driving, video games, and VR/AR.

At EA, my research focuses on developing small, efficient language models for real-time decision-making and video game applications.


📊 Current Focus

  • Efficient Spatial & Visual Reasoning with LLMs/VLMs/MLLMs
  • Vision-Language Understanding & Embodied Spatial Reasoning
  • 3D Representations, Grounding, & Space Understanding
  • Building Vision-Language Datasets for Post-training MLLMs on Spatial Reasoning Tasks
  • Visual and Geometry Retrieval Systems

🎓 Academic Background

  • M.Sc. in CS, University of Alberta (Present)
  • Ph.D. in ECE, University of Alberta (Transferred to CS)
  • M.Sc. & B.Sc. in ME, Sharif University of Technology & Univ. of Tehran

💬 Connect with Me

Website · Twitter · LinkedIn · ResearchGate · Google Scholar

Pinned

  1. nanochat-VLM Public

    A minimal, hackable Vision-Language Model built on Karpathy’s nanochat: add image understanding and multimodal chat for under $200 in compute.

    Python, 24 stars, 3 forks

  2. Spatial_Reasoning_VLMs Public

    A repo for enhancing spatial reasoning in VLMs using CoT and VoT prompting for 3D visual environments

    Python, 13 stars, 1 fork

  3. World_Model_SSMs_Video_IMGGen Public

    A repo exploring dynamic world-modeling approaches, including State-Space Models (SSMs) and video models, for planning, prediction, and control

    Python, 13 stars, 2 forks

  4. Multimodal_Datasets_Generative_Reasoning Public

    A repository for surveying, organizing, and prototyping dataset and benchmark construction pipelines for generative reasoning in multimodal large language models. It focuses on data-centric practic…

    Jupyter Notebook, 12 stars

  5. AIFP Public

    Adaptive Iterative Feedback Prompting for Obstacle-Aware Path Planning via LLMs - LM4Planning - AAAI2025

    Python, 10 stars, 3 forks

  6. OnlineRLHF Public

    An implementation of online preference-based reward learning under human irrationality and delayed feedback

    Python, 9 stars, 1 fork