Stars
🟣 Data Processing interview questions and answers to help you prepare for your next machine learning and data science interview in 2026.
Implementing DeepSeek R1's GRPO algorithm from scratch
Master programming by recreating your favorite technologies from scratch.
Foundation for building semantically meaningful themes for Emacs
What better way to refresh my knowledge of GF than implementing a grammar for Magic The Gathering?
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
Machine Learning Foundations: Linear Algebra, Calculus, Statistics & Computer Science
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs
Metonymy corpus of 26 thousand instances in 189 languages across 24 metonymy patterns
Prevent PyTorch's `CUDA error: out of memory` in just 1 line of code.
SMiLER - Samsung MultiLingual Entity and Relation Extraction dataset
Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
Statistical Rethinking (2nd ed.) with NumPyro
A LaTeX Beamer template with GU and CLASP logos for talks
This repository houses the IMPlicature and PRESupposition diagnostic dataset (IMPPRES), consisting of >25k semiautomatically generated sentence pairs illustrating well-studied pragmatic inference t…
Investigating the use of variational autoencoders with multinomial latent variables for unsupervised data.
Second edition of the Springer book Python for Probability, Statistics, and Machine Learning
HyperLex: a gold standard resource for measuring and evaluating how well semantic models capture graded or soft lexical entailment
A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Variational autoencoder implemented in TensorFlow and PyTorch (including inverse autoregressive flow)
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
X-SRL dataset, including the code for the SRL annotation projection tool and an out-of-the-box word alignment tool based on Multilingual BERT embeddings.
Minor mode for Emacs that deals with parens pairs and tries to be smart about it.
The repository for the paper "When Do You Need Billions of Words of Pretraining Data?"