A toy implementation of "Attention Is All You Need", BERT, and GPT2
I made this project to get a deeper understanding of the Transformer architecture and of the BERT, RoBERTa, T5, and GPT models. We often rely on existing Transformer implementations such as Hugging Face Transformers when we need to train a model, but I wanted to see whether I could implement them from scratch, referring to the papers.
This project does include:

- `torch.nn.Module`
- `torch.nn.Parameter`
- Existing tokenizer implementations from `transformers`
- Other primitive functions offered by PyTorch
This project does not include:

- Any models from `transformers`
- `nn.Transformer`
- `nn.MultiheadAttention`
- `nn.Embedding`
- `nn.LayerNorm`
- `nn.functional.softmax`
- Other existing modules that play an essential role in the Transformer architecture
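To illustrate what building on these primitives looks like in practice, here is a minimal sketch of a layer-normalization module written with nothing but `nn.Module`, `nn.Parameter`, and basic tensor operations. It is an illustration of the approach, not necessarily the exact code in `src/layers`.

```python
import torch
from torch import nn


class LayerNorm(nn.Module):
    """Layer normalization built only from nn.Module, nn.Parameter, and tensor ops."""

    def __init__(self, hidden_size: int, eps: float = 1e-12) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learnable scale (gamma)
        self.bias = nn.Parameter(torch.zeros(hidden_size))   # learnable shift (beta)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, unbiased=False, keepdim=True)
        normalized = (x - mean) / torch.sqrt(var + self.eps)
        return self.weight * normalized + self.bias
```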
We have implemented the following features so far. The layers and functions live in `src/layers`, and the models in `src/models`.
- `dropout`
- `softmax`
- `gelu`
- `positional_encoding`
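As a rough sketch of what these primitives look like when written from scratch (function names and signatures here are assumptions for illustration, not the exact API in `src/layers`):

```python
import math

import torch


def softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = x - x.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)


def gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact GELU using the Gaussian CDF: x * Phi(x).
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))


def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    # Sinusoidal encoding from "Attention Is All You Need" (Section 3.5); assumes even d_model.
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe
```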
- `MultiHeadAttention`
- `FeedForwardNetwork`
- `LayerNorm`
- `TokenEmbedding`
- `TransformerEncoder`
- `TransformerEncoderBlock`
- `TransformerDecoder`
- `TransformerDecoderBlock`
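Below is a minimal sketch of how `MultiHeadAttention` can be assembled from plain linear projections and the from-scratch `softmax` above, following the scaled dot-product attention of Vaswani et al. (2017). It is illustrative only and does not claim to match the code in `src/layers`.

```python
import math

import torch
from torch import nn


class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention with multiple heads (Vaswani et al., 2017)."""

    def __init__(self, d_model: int, num_heads: int) -> None:
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # nn.Linear is treated as a primitive here; it is not on the excluded list above.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)

        def split_heads(x):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return x.view(batch_size, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.q_proj(query))
        k = split_heads(self.k_proj(key))
        v = split_heads(self.v_proj(value))

        # Scaled dot-product attention; masked positions get -inf before softmax.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = softmax(scores, dim=-1)  # the from-scratch softmax sketched above

        context = weights @ v  # (batch, heads, seq, d_head)
        context = context.transpose(1, 2).contiguous().view(
            batch_size, -1, self.num_heads * self.d_head
        )
        return self.out_proj(context)
```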
- `BertModel`
- `GPT2Model`
- `T5Model`
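These models differ mainly in how they use the blocks above: BERT is encoder-only with bidirectional attention, GPT-2 is decoder-only with a causal mask, and T5 combines an encoder and a decoder. As a small self-contained illustration, a causal mask for decoder-style attention (compatible with the `mask` argument of the `MultiHeadAttention` sketch above) can be built as follows:

```python
import torch


def causal_mask(seq_len: int) -> torch.Tensor:
    """Boolean mask where position i may attend only to positions j <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))


# Usage with the MultiHeadAttention sketch above (illustrative):
#   mask = causal_mask(x.size(1))        # (seq, seq), broadcast over batch and heads
#   out = attention(x, x, x, mask=mask)  # self-attention with a causal constraint
```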
We currently use `transformers` for schedulers, but plan to implement them from scratch in the future.
- `AdamW`
- `CrossEntropy`
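For reference, a single AdamW update with decoupled weight decay reduces to a handful of tensor operations per parameter. The sketch below illustrates the algorithm and is not the optimizer code in this repository.

```python
import torch


@torch.no_grad()
def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single parameter tensor (decoupled weight decay)."""
    state["step"] += 1
    m, v, t = state["m"], state["v"], state["step"]

    # Decoupled weight decay: shrink the parameter directly, not through the gradient.
    param.mul_(1.0 - lr * weight_decay)

    # Exponential moving averages of the gradient and its square.
    m.mul_(betas[0]).add_(grad, alpha=1.0 - betas[0])
    v.mul_(betas[1]).addcmul_(grad, grad, value=1.0 - betas[1])

    # Bias-corrected estimates, then the Adam step.
    m_hat = m / (1.0 - betas[0] ** t)
    v_hat = v / (1.0 - betas[1] ** t)
    param.addcdiv_(m_hat, v_hat.sqrt().add_(eps), value=-lr)


# Example state initialization for one parameter tensor p:
#   state = {"step": 0, "m": torch.zeros_like(p), "v": torch.zeros_like(p)}
```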
- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is All You Need. NeurIPS 2017.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI.