Stars
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Empowering RAG with a memory-based data interface for all-purpose applications!
Trae Agent is an LLM-based agent for general purpose software engineering tasks.
ImageBind: One Embedding Space to Bind Them All
A deep learning project for automated chorus detection in songs, featuring a command-line interface (CLI) tool that allows users to input a YouTube link and utilize a pre-trained CRNN model to dete…
A network simulation tool for V2X in 5G. The networking operation is built on Python. We utilize Eclipse SUMO (Simulation of Urban MObility) to simulate realistic road traffic. The 5G wireless comm…
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
App for generating QR codes with GitHub logo and export to SVG/PNG/JPEG/WEBP format
VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
ACE-Step: A Step Towards Music Generation Foundation Model
Let's make video diffusion practical!
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
[ICLR 2025 Spotlight] The official implementation of the paper “LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models”
A lightweight, powerful framework for multi-agent workflows
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…
Official PyTorch implementation for "Large Language Diffusion Models"
No fortress, purely open ground. OpenManus is Coming.
[CVPR'25] Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model (DFD-FCG)
A list of demo websites for automatic music generation research