  • University of California, Santa Barbara

Organizations

@eric-ai-lab

Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"

Python 40 Updated Dec 17, 2025

Official codebase for the paper "Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations"

Python 330 22 Updated Oct 14, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,228 202 Updated Jan 8, 2026

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Python 3,738 387 Updated Dec 23, 2025

[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

Python 370 46 Updated Oct 29, 2025

[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Python 73 3 Updated May 31, 2025

[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"

Python 14 1 Updated Jun 30, 2025

Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"

Python 172 10 Updated Jan 8, 2026

Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"

Python 296 37 Updated Dec 12, 2025

Agent S: an open agentic framework that uses computers like a human

Python 9,392 1,076 Updated Dec 16, 2025

Universal memory layer for AI Agents

Python 45,313 4,945 Updated Jan 10, 2026

[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Python 20 4 Updated Apr 1, 2025

LLM101n: Let's build a Storyteller

36,128 1,966 Updated Aug 1, 2024

[ACL 2025 Findings] "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"

Python 13 Updated Feb 25, 2025

Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"

Python 5 1 Updated Jun 11, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,676 1,502 Updated Jan 4, 2026

Large Concept Models: Language modeling in a sentence representation space

Python 2,323 206 Updated Jan 29, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,074 524 Updated Jan 6, 2026

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 24,180 2,076 Updated Sep 12, 2025

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Python 20,772 2,218 Updated Mar 11, 2025

[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"

Python 30 2 Updated Jun 23, 2025

[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Python 232 13 Updated Sep 20, 2024

This is the implementation of ACL 2024 Findings paper ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

4 Updated Jun 11, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,906 177 Updated May 26, 2025

Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"

Python 28 3 Updated Jul 31, 2024

26 Updated Jun 20, 2024

Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Python 29 1 Updated Jul 15, 2025

[ACL 2025 Findings] "Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"

Python 25 1 Updated Feb 21, 2025

Letta is the platform for building stateful agents: open AI with advanced memory that can learn and self-improve over time.

Python 20,588 2,145 Updated Jan 3, 2026