I'm Chunhui Zhang, a Ph.D. candidate in Computer Science at Dartmouth 🌲, working with 🌟Professor Soroush Vosoughi. I also hold an MSCS degree (research-based) from Brandeis University, where I received the GSAS Fellowship, and a Bachelor's degree in CS from Northeastern University, where I received the Outstanding Honor Thesis Award.
My research focuses on advancing the intrinsic properties of deep learning across diverse modalities, with an emphasis on trustworthiness, scalability, and applicability to real-world challenges. Highlights of my work include:
- Overcoming Multi-step Complexity in Theory-of-Mind Reasoning: A Scalable Bayesian Planner
  Conference: ICML 2025, Spotlight (Top 2.59%).
  Authors: Chunhui Zhang, Zhongyu Ouyang, Kwonjoon Lee, Nakul Agarwal, Sean Dae Houlihan, Soroush Vosoughi, Shao-Yuan Lo
- Growing Through Experience: Scaling Episodic Grounding in Language Models
  Conference: ACL 2025, Oral Presentation (Top 3.24%).
  Authors: Chunhui Zhang, Sirui Wang, Zhongyu Ouyang, Xiangchi Yuan, Soroush Vosoughi
- Pretrained Image-Text Models are Secretly Video Captioners
  Conference: NAACL 2025, Oral Presentation (Top 2.88%).
  Authors: Chunhui Zhang*, Yiren Jian*, Zhongyu Ouyang, Soroush Vosoughi
- Knowing More, Acting Better: Hierarchical Representation for Embodied Decision-Making for PPO Training
  Conference: Findings of EMNLP 2025.
  Authors: Chunhui Zhang, Zhongyu Ouyang, Xingjian Diao, Zheyuan Liu, Soroush Vosoughi
- Superficial Self-Improved Reasoners Benefit from Model Merging
  Conference: EMNLP 2025.
  Authors: Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Soroush Vosoughi, Wenke Lee
- Temporal Working Memory: Query-Guided Temporal Segment Refinement for Enhanced Multimodal Understanding
  Conference: Findings of NAACL 2025.
  Authors: Chunhui Zhang*, Xingjian Diao*, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
- Working Memory Identifies Reasoning Limits in Language Models
  Conference: EMNLP 2024.
  Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
- Amazon Science (Sept 2025 – Present)
  Applied Scientist Intern, Seattle, WA
  Research on reinforcement learning post-training for GUI-based agents, focusing on adaptive reasoning and long-horizon memory in multimodal environments.
- Google DeepMind (Jun 2025 – Sept 2025)
  Research Intern, Mountain View, CA
  Developed a high-throughput RL training pipeline for the Gemma-3n family, achieving a 5× speedup via KV-cache reuse in audio–text long-context scenarios. Contributed to multimodal Gemma-3n open-source tools.
- Honda Research Institute USA (May 2024 – Sept 2024)
  Research Intern, San Jose, CA
  Worked on multimodal long-context modeling with context-parallel and ring-attention architectures, improving alignment across audio, video, and text modalities for real-world perception tasks.
I am currently exploring multimodal LLMs (language–vision–audio), memory mechanisms, and reinforcement learning to uncover genuinely new patterns in real-world data. My recent work includes post-training recipes for large-scale models that ranked Top-2 on the Papers with Code Video Captioning Leaderboard, demonstrating effective strategies for allocating resources during post-training.
- Email: [email protected]
- LinkedIn: Chunhui Zhang
- GitHub: chunhuizng
- Google Scholar: My Publications
Feel free to reach out if you're interested in collaboration, career advice, or just a friendly chat about research and life!