This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining…

Python 245 34 Updated Jan 24, 2023

mosaicml / composer

Supercharge Your Model Training

Python 5,419 455 Updated Oct 6, 2025

sail-sg / scaling-with-vocab

[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623

Python 87 5 Updated Sep 26, 2024

thunlp / SubCharTokenization

Python 44 4 Updated Feb 5, 2023

attardi / wikiextractor

A tool for extracting plain text from Wikipedia dumps

Python 3,928 1,005 Updated May 23, 2024

dreamgonfly / BERT-pytorch

PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

Python 109 29 Updated Nov 1, 2018

dreamgonfly / transformer-pytorch

A PyTorch implementation of Transformer in "Attention is All You Need"

Python 106 30 Updated Dec 6, 2020

cortexlabs / cortex

Production infrastructure for machine learning at scale

Go 8,034 603 Updated Jun 12, 2024

alger-ia / dziribert

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

Python 165 11 Updated Dec 28, 2022

EmilyAlsentzer / clinicalBERT

Repository for Publicly Available Clinical BERT Embeddings

Python 734 151 Updated Aug 25, 2020

manueltonneau / covid-berts

BERT models pretrained on the CORD-19 Kaggle dataset

15 4 Updated Jun 8, 2020

google-research / electra

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Python 2,363 349 Updated Mar 23, 2024

allenai / scibert

A BERT model for scientific text.

Python 1,647 232 Updated Feb 22, 2022

AnswerDotAI / ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

Python 1,546 127 Updated Jun 30, 2025

kwonmha / bert-vocab-builder

Builds a WordPiece (subword) vocabulary compatible with Google Research's BERT

Python 231 48 Updated Dec 4, 2020

microsoft / AzureML-BERT

End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service

Jupyter Notebook 400 126 Updated Jun 12, 2023

BDBC-KG-NLP / IE-Survey

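A survey of the information extraction field by the natural language processing research team at Beihang University's Advanced Innovation Center for Big Data. It covers subtasks including entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.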

472 69 Updated Apr 29, 2022

dqbd / tiktokenizer

Online playground for OpenAI tokenizers

TypeScript 1,375 157 Updated Apr 24, 2025

t3-oss / create-t3-app

The best way to start a full-stack, typesafe Next.js app

TypeScript 28,112 1,395 Updated Oct 11, 2025

openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 16,247 1,263 Updated Oct 6, 2025

yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

Python 130,995 10,520 Updated Oct 15, 2025

Starred topics

bert

Google

React

Natural language processing

named-entity-recognition

AIBod AI-Bod