Starred repositories
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
Label Studio is a multi-type data labeling and annotation tool with standardized output format
A lightweight LMM-based Document Parsing Model
A debugging and profiling tool that can trace and visualize Python code execution
OCR, layout analysis, reading order, table recognition in 90+ languages
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
Python implementation of real-time remote PPG for heart rate measurement (Course Project for Biomedical Sensory)
PhysioKit: Open-source, accessible Physiological Computing Toolkit [Sensors 2023]
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
This is a web demo for camera-based PPG sensing (rPPG).
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
A web-based collaborative LaTeX editor
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
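A chatbot built on large language models that supports integration with WeChat Official Accounts, WeCom (Enterprise WeChat) apps, Feishu, DingTalk, and more. It can use ChatGPT/Claude/DeepSeek/ERNIE Bot/iFlytek Spark/Tongyi Qianwen/Gemini/GLM-4/Kimi/LinkAI, handles text, voice, and images, can access the operating system and the internet, and supports building customized enterprise customer-service assistants on a private knowledge base.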
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Refine high-quality datasets and visual AI models
OpenMMLab Foundational Library for Training Deep Learning Models
Turning a CLIP Model into a Scene Text Detector (CVPR2023) | Turning a CLIP Model into a Scene Text Spotter (TPAMI)
An intelligent coding assistant plugin for Visual Studio Code, developed based on CodeShell
A state-of-the-art open visual language model | multimodal pretrained model
Multi-user server for Jupyter notebooks
Infinidat / munch
Forked from dsc/bunch
A Munch is a Python dictionary that provides attribute-style access (a la JavaScript objects).