University of Texas at Austin, Austin, TX

Starred repositories
An open-source AI agent that brings the power of Gemini directly into your terminal.
For our EMNLP 2020 paper “Are ‘Undocumented Workers’ the Same as ‘Illegal Aliens’? Disentangling Denotation and Connotation in Vector Spaces”.
Intrinsic evaluation of pre-trained word embeddings using a large word-association dataset: SWOW (Small World of Words)
A GitHub repository containing the LWOW project.
[CoNLL'21] MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models
Sparse and discrete interpretability tool for neural networks
Stanford NLP Python library for understanding and improving PyTorch models via interventions
Stanford NLP Python library for Representation Finetuning (ReFT)
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitch Fix, etc. Blog: mlengineer.io.
Learning to Describe Unknown Phrases with Local and Global Contexts
Interpretable Word Sense Representations via Definition Generation
Simple, unified interface to multiple Generative AI providers
Code for evaluating models of compositional sentence semantics.
Data and scripts for the shared task "Task 1: Paraphrase and Semantic Similarity in Twitter (PIT)" at SemEval 2015.
NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
ACL 2024 - Linguistically Conditioned Semantic Textual Similarity
Data, codebook, and models to automatically detect storytelling.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A set of media framing annotations, along with scripts for obtaining the corresponding news articles
[EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Utilities intended for use with Llama models.
Agentic components of the Llama Stack APIs
Code and documentation to train Stanford's Alpaca models, and generate the data.
A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
Resources & scripts for the paper "MTEB: Massive Text Embedding Benchmark"