whwu95/README.md

Hi, I'm Wenhao Wu 👋

Wenhao Wu | Zhihu (知乎) | GitHub | LinkedIn | Google Scholar | X

I am an Applied Scientist at Amazon AGI, working on the Nova Cross-modal Foundation Model. Before joining Amazon, I spent nearly seven years (2018–2025) at Baidu VIS, where I grew from a research intern into a Senior/Staff Researcher and contributed to multiple large-scale computer vision and multimodal projects. Since 2021, I have been collaborating closely with Chief Scientist Dr. Jingdong Wang (IEEE Fellow). I earned my Ph.D. from the MMLab at The University of Sydney, supervised by Prof. Wanli Ouyang. Previously, I obtained my M.S.E. degree from the University of Chinese Academy of Sciences (UCAS), under the supervision of Prof. Shifeng Chen and Prof. Yu Qiao.

Since 2016, I have been engaged in AI research and development across both academia and industry, gaining extensive experience at Amazon AGI, Baidu AIG, Snap Research, SenseTime Research, Samsung Research, and iQIYI AI. I have also been affiliated with leading academic institutions including MMLab@USYD, MMLab@CUHK, and MMLab@SIAT-CAS.

I am honored to have been awarded the Baidu PhD Fellowship (2023) and the DAAD AInet Fellowship (2025).


Pinned

  1. MVFNet

    【AAAI'2021】MVFNet: Multi-View Fusion Network for Efficient Video Recognition

    Python, 133 stars, 12 forks

  2. Text4Vis

    【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective

    Python, 197 stars, 9 forks

  3. Cap4Video

    【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

    Python, 248 stars, 17 forks

  4. ATM

    【ICCV'2023】What Can Simple Arithmetic Operations Do for Temporal Modeling?

    Python, 74 stars, 6 forks

  5. GPT4Vis

    GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

    Python, 186 stars, 18 forks

  6. HJYao00/DenseConnector

    【NeurIPS 2024】Dense Connector for MLLMs

    Python, 179 stars, 8 forks