- University of Maryland
- http://www.mozhi.umiacs.io/
Stars
A collection of tricks and tools to speed up transformer models
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
A tool used by the MLNLP community for better paper search. Fully-automated scripts for collecting AI-related papers
Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus)
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
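Chinese and English sensitive words, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitating Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …
Chinese personal name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Can be used for Chinese word segmentation and person-name entity recognition.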
The project proposes a framework for applying topic models to a text corpus and then assigning topic labels to the generated topics.
Code accompanying EMNLP 2020 paper "Interactive Refinement of Cross-Lingual Word Embeddings".
Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)
Scripts that help Chinese netizens who use a VPN to combat censorship, by modifying the routing table so that only censored IPs are routed through the VPN.
Speedtest script, including PING and DOWNLOAD sorting.
Spectacle allows you to organize your windows without using a mouse.
Vim plugin for intensely nerdy commenting powers
pathogen.vim: manage your runtimepath