ཤེས་བྱའི་རིས་མཛོད། 2020.4.23
QHNU-CS
01:27 (UTC -12:00) - [email protected]
https://github.com/AI-Bod
Starred repositories
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods"
Simple-to-use scoring function for arbitrarily tokenized texts.
Codebase for EMNLP Findings Submission titled: Adaptive BPE Tokenization for Enhanced Vocabulary Adaptation in Finetuning Pretrained Language Models
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
✨✨Latest Advances on Multimodal Large Language Models
A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining…
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
A tool for extracting plain text from Wikipedia dumps
PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"
A PyTorch implementation of Transformer in "Attention is All You Need"
Production infrastructure for machine learning at scale
DziriBERT: a Pre-trained Language Model for the Algerian Dialect
repository for Publicly Available Clinical BERT Embeddings
BERT models pretrained on the CORD-19 Kaggle dataset
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Bringing BERT into modernity via both architecture changes and scaling
Builds wordpiece(subword) vocabulary compatible for Google Research's BERT
End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
A survey of the information extraction field by the Natural Language Processing research team at Beihang University's Beijing Advanced Innovation Center for Big Data and Brain Computing. Covers subtasks such as entity recognition, relation extraction, and attribute extraction, with each subtask surveyed from both academic and industrial perspectives.
Online playground for OpenAI tokenizers
The best way to start a full-stack, typesafe Next.js app
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
A feature-rich command-line audio/video downloader
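Several of the starred tokenizer projects above (e.g. the "Greed is All You Need" evaluation and tiktoken) concern how a fixed subword vocabulary is applied to text at inference time. A minimal pure-Python sketch of greedy longest-match segmentation, one of the inference strategies that paper evaluates, is shown below; the vocabulary and function name are illustrative and not taken from any of the listed repositories:

```python
def greedy_tokenize(text, vocab):
    """Segment text with greedy longest-match: at each position, emit the
    longest vocabulary entry that matches, falling back to a single
    character when nothing in the vocabulary fits.

    This is a toy illustration of the inference strategy, not the
    implementation used by tiktoken or the paper's codebase.
    """
    tokens = []
    max_len = max(len(piece) for piece in vocab)
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Example: with a tiny hypothetical vocabulary, "unhappiness" splits into
# the longest matching pieces at each step.
print(greedy_tokenize("unhappiness", {"un", "happi", "ness"}))
```

Real BPE tokenizers such as tiktoken additionally operate on bytes, apply pre-tokenization regexes, and use merge ranks rather than plain longest-match, so results can differ from this sketch on the same vocabulary.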