Multimodal LLM Visual Reasoning
Enhancing multimodal visual reasoning with chain-of-thought based inference.
During my internship at NMSL, I worked on improving multimodal LLM visual reasoning by incorporating chain-of-thought-style mechanisms into autoregressive inference.
The project focused on building a framework that helps multimodal models reason more effectively over visual inputs rather than relying only on direct prediction. It sits at the intersection of multimodal reasoning, language models, and perception.
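The core idea can be illustrated with a minimal two-stage inference sketch. This is not the project's actual implementation; `model_generate` is a hypothetical stand-in for a real vision-language model's generate call, and only the prompting structure is the point: the model first produces an explicit reasoning trace over the image, then conditions its final answer on that trace instead of predicting directly.

```python
def model_generate(image, prompt):
    # Hypothetical model call: returns a text continuation for the prompt.
    # A real system would invoke a multimodal LLM here; this stub only
    # exists so the control flow below is runnable.
    if "step by step" in prompt:
        return "The image shows two boxes; the left box is larger."
    return "left"

def cot_answer(image, question):
    # Stage 1: elicit a reasoning trace grounded in the visual input.
    rationale = model_generate(
        image,
        f"Question: {question}\n"
        "Describe the relevant visual evidence and reason step by step.")
    # Stage 2: condition the final answer on the generated rationale,
    # rather than relying on direct prediction from the question alone.
    answer = model_generate(
        image,
        f"Question: {question}\nReasoning: {rationale}\nAnswer:")
    return rationale, answer

rationale, answer = cot_answer("img.png", "Which box is larger?")
print(answer)  # → left
```

The two-stage structure mirrors the framework's goal: making the model's intermediate visual reasoning explicit so the final prediction can depend on it.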
This experience helped shape my broader interest in building systems that reason about and communicate spatial information more robustly.