Xihua University · Tokyo
Starred repositories
Learning to Relate to Previous Turns in Conversational Search. Codebase for the KDD 2023 accepted paper.
Codebase for Merging Language Models (ICML 2024)
This project shares the technical principles behind large language models together with hands-on experience (LLM engineering and putting LLM applications into production).
MNBVC (Massive Never-ending BT Vast Chinese corpus), an ultra-large-scale Chinese corpus benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian" internet slang. It includes plain-text Chinese data of every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki pages, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official code for the NAACL 2022 paper "Fuse It More Deeply! A Variational Transformer with Layer-Wise Latent Variable Inference for Text Generation"
程序员延寿指南 | A programmer's guide to living longer
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training (see the minimal usage sketch after this list).
[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
Code for the ACL 2021 paper WARP 🌀 Word-level Adversarial ReProgramming, which outperforms `GPT-3` on SuperGLUE few-shot text classification. https://aclanthology.org/2021.acl-long.381/
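As a quick illustration of the 🤗 Transformers entry above, here is a minimal inference sketch using its high-level `pipeline` API. It assumes `transformers` is installed with a backend such as PyTorch, and the checkpoint name shown is only an example choice, not anything prescribed by the repositories listed here.

```python
# Minimal sketch of the 🤗 Transformers pipeline API for inference.
# Assumes `pip install transformers` plus a backend such as PyTorch;
# the checkpoint below is an illustrative choice, not the only option.
from transformers import pipeline

# Build a text-classification pipeline; the model is downloaded on first use.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Run inference on a couple of example sentences and print the label/score dicts.
print(classifier([
    "Diffusion models for text generation are fascinating.",
    "This build keeps failing and I am tired.",
]))
```

The same `pipeline` entry point covers other tasks (e.g. text generation or summarization) by swapping the task string and checkpoint, which is why it is the usual starting point before dropping down to the explicit model and tokenizer classes.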