
Hi there 👋

I'm Chunhui Zhang, a Ph.D. candidate in Computer Science at Dartmouth 🌲, working with 🌟 Professor Soroush Vosoughi. I also hold a research-based MSCS degree from Brandeis University, where I was honored with the GSAS Fellowship, and a Bachelor's degree in CS from Northeastern University, where I received the Outstanding Honor Thesis Award.


🔭 Research

My research focuses on advancing the intrinsic properties of deep learning across diverse modalities, with an emphasis on trustworthiness, scalability, and applicability to real-world challenges. Highlights of my work include:

  • Overcoming Multi-step Complexity in Theory-of-Mind Reasoning: A Scalable Bayesian Planner
    Conference: ICML 2025, Spotlight (Top 2.59%).
    Authors: Chunhui Zhang, Zhongyu Ouyang, Kwonjoon Lee, Nakul Agarwal, Sean Dae Houlihan, {Soroush Vosoughi, Shao-Yuan Lo}

  • Growing Through Experience: Scaling Episodic Grounding in Language Models
    Conference: ACL 2025, Oral Presentation (Top 3.24%).
    Authors: Chunhui Zhang, Sirui Wang, Zhongyu Ouyang, Xiangchi Yuan, Soroush Vosoughi

  • Pretrained Image-Text Models are Secretly Video Captioners
    Conference: NAACL 2025, Oral Presentation (Top 2.88%).
    Authors: Chunhui Zhang*, Yiren Jian*, Zhongyu Ouyang, Soroush Vosoughi

  • Knowing More, Acting Better: Hierarchical Representation for Embodied Decision-Making for PPO Training
    Conference: Findings of EMNLP 2025
    Authors: Chunhui Zhang, Zhongyu Ouyang, Xingjian Diao, Zheyuan Liu, Soroush Vosoughi

  • Superficial Self-Improved Reasoners Benefit from Model Merging
    Conference: EMNLP 2025
    Authors: Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Soroush Vosoughi, Wenke Lee

  • Temporal Working Memory: Query-Guided Temporal Segment Refinement for Enhanced Multimodal Understanding
    Conference: Findings of NAACL 2025
    Authors: {Chunhui Zhang*, Xingjian Diao*}, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui

  • Working Memory Identifies Reasoning Limits in Language Models
    Conference: EMNLP 2024
    Authors: Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi


💼 Internship Experience

  • Amazon Science (Sept 2025 – Present)
    Applied Scientist Intern, Seattle, WA
    Research on reinforcement learning post-training for GUI-based agents, focusing on adaptive reasoning and long-horizon memory in multimodal environments.

  • Google DeepMind (Jun 2025 – Sept 2025)
    Research Intern, Mountain View, CA
    Developed a high-throughput RL training pipeline for the Gemma-3n family, achieving 5× speedup via KV-cache reuse in audio–text long-context scenarios. Contributed to multimodal Gemma-3n open-source tools.

  • Honda Research Institute USA (May 2024 – Sept 2024)
    Research Intern, San Jose, CA
    Worked on multimodal long-context modeling with context-parallel and ring-attention architectures, improving alignment across audio, video, and text modalities for real-world perception tasks.


🌱 Current Focus

I am currently exploring Multimodal LLMs (language, vision, and audio), memory mechanisms, and reinforcement learning to discover genuinely new patterns in the real world. My recent work includes post-training recipes for large-scale models that ranked Top-2 on the Papers with Code Video Captioning Leaderboard, demonstrating effective strategies for resource allocation in post-training.


💬 Let's Connect

Feel free to reach out if you're interested in collaboration, career advice, or just a friendly chat about research and life!
