Stars
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
Generative Agents: Interactive Simulacra of Human Behavior
ReLE evaluation: a capability benchmark for Chinese AI large models (continuously updated). Currently covers 303 models, including commercial models such as chatgpt, gpt-5, o4-mini, Google gemini-2.5, Claude4.5, Zhipu GLM-Z1, ERNIE Bot, qwen3-max, Baichuan, iFlytek Spark, SenseTime senseChat, and minimax, as well as kimi-k2, ernie4.5, minimax-M1, DeepSeek-R1-0528, deepsee…
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
🎉 Repo for LaWGPT, a Chinese-Llama model tuned with Chinese legal knowledge (a large language model based on Chinese legal knowledge).
MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, aiming to match the 40T of data used to train chatGPT. MNBVC covers not only mainstream culture but also niche subcultures and even "Martian" internet slang. It includes news, essays, novels, books, magazines, papers, scripts, forum posts, wiki entries, classical poetry, lyrics, product descriptions, jokes, embarrassing stories, chat logs, and all other forms of plain-text Chinese data.
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
GAOKAO-Bench is an evaluation framework that utilizes GAOKAO questions as a dataset to evaluate large language models.
BELLE: Be Everyone's Large Language Model Engine (an open-source Chinese dialogue LLM)
We unify the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs, and parameter-efficient methods (e.g., LoRA, P-tuning) for easy use. We welcome open-source enthusiasts…
Open Instruction Generalist is an assistant trained on a massive set of synthetic instructions to perform millions of tasks
4-bit quantization of LLaMA using GPTQ
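For context on what GPTQ refines: the naive baseline is per-channel round-to-nearest quantization, which maps each weight to a 4-bit integer and a floating-point scale. The sketch below shows only that baseline, not GPTQ itself (GPTQ additionally compensates each rounding error by adjusting the not-yet-quantized weights using second-order information); function names are illustrative.

```python
def quantize_4bit(weights):
    """Per-channel symmetric round-to-nearest 4-bit quantization.

    This is the naive baseline that GPTQ improves on with
    Hessian-based error compensation.
    """
    m = max(abs(w) for w in weights)
    if m == 0:
        return [0] * len(weights), 1.0
    scale = m / 7  # map the largest weight onto the int4 range (-8..7)
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floating-point weights from int4 codes.
    return [v * scale for v in q]
```

A 4096-wide channel stored this way needs 4096 x 4 bits plus one scale, roughly a 4x reduction versus fp16.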
A collection of libraries to optimise AI model performance
A high-performance PyTorch implementation of face detection models, including RetinaFace and DSFD
An implementation of the EDA paper for Chinese corpora. An EDA data-augmentation tool for Chinese text; NLP data augmentation; paper-reading notes.
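EDA (Easy Data Augmentation) is built on four token-level operations: synonym replacement, random insertion, random swap, and random deletion. A minimal sketch of the two dictionary-free operations follows — this is an assumption-level illustration, not the repo's code, and the function names are invented for the example:

```python
import random

def random_swap(words, n, rng):
    # EDA operation: swap two randomly chosen positions, n times.
    words = list(words)
    for _ in range(n):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    # EDA operation: drop each word with probability p, keeping at least one.
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(list(words))]
```

For Chinese text the input should be pre-segmented into words (e.g. with jieba), since EDA operates on tokens rather than raw characters; synonym replacement likewise needs a Chinese synonym dictionary instead of WordNet.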
FaRL for Facial Representation Learning [Official, CVPR 2022]
Synthetic Faces High Quality (SFHQ) Dataset. 425,000 curated 1024x1024 synthetic face images
State-of-the-Art Text Embeddings
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Implementation/replication of DALL-E, OpenAI's text-to-image transformer, in PyTorch
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
[CVPR 2021] Multi-Modal-CelebA-HQ: A Large-Scale Text-Driven Face Generation and Understanding Dataset
Free English to Chinese Dictionary Database