ML Engineer | Building AI from First Principles
Reconstructing the Transformer from Scratch
- Rebuilt the original "Attention Is All You Need" architecture from paper to production
- Achieved 15.34 BLEU on WMT'14 DE→EN translation
- Implemented multi-head attention, sinusoidal positional embeddings, gradient accumulation, and a custom LR schedule (sketched below)
- Read the deep dive on Medium
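The custom LR schedule is presumably the one from Section 5.3 of the paper: linear warmup followed by inverse-square-root decay. A minimal sketch, using the paper's base-model defaults rather than the exact values from this repo:

```python
import jax.numpy as jnp

def transformer_lr(step, d_model=512, warmup_steps=4000):
    """LR schedule from 'Attention Is All You Need' (Sec. 5.3):
    lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5),
    i.e. linear warmup for `warmup_steps`, then inverse-sqrt decay."""
    step = jnp.maximum(jnp.asarray(step, dtype=jnp.float32), 1.0)  # avoid 0**-0.5
    return d_model ** -0.5 * jnp.minimum(step ** -0.5, step * warmup_steps ** -1.5)

# Peak LR is reached exactly at step == warmup_steps.
print(transformer_lr(4000))  # ~7.0e-4 for d_model=512
```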
Full reimplementation of "Attention Is All You Need"
- Encoder-decoder architecture with 65M parameters
- Multi-head attention mechanisms (sketched below)
- Custom training pipeline with gradient accumulation
- View Project →
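For reference, a minimal single-sequence sketch of the scaled dot-product and multi-head attention described in the paper; the `wq`/`wk`/`wv`/`wo` weight arguments and toy shapes are illustrative placeholders, not this repo's actual module interface:

```python
import jax
import jax.numpy as jnp

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ jnp.swapaxes(k, -2, -1) / jnp.sqrt(d_k)
    if mask is not None:
        scores = jnp.where(mask, scores, -1e9)  # block masked positions
    return jax.nn.softmax(scores, axis=-1) @ v

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    """Project x into per-head Q/K/V, attend in parallel, then merge heads."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def heads(w):
        # (seq, d_model) -> (heads, seq, d_head)
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    out = scaled_dot_product_attention(heads(wq), heads(wk), heads(wv))
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)  # merge heads
    return out @ wo

# Toy usage with random weights (seq_len=16, d_model=64, 8 heads).
kx, *kw = jax.random.split(jax.random.PRNGKey(0), 5)
x = jax.random.normal(kx, (16, 64))
wq, wk, wv, wo = (jax.random.normal(k, (64, 64)) for k in kw)
print(multi_head_attention(x, wq, wk, wv, wo, num_heads=8).shape)  # (16, 64)
```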
Integration of MJX into purejaxrl
- Used the Madrona engine for high-throughput, GPU-batched environment rendering
- Trained a CNN policy with PPO on an MJX cube-pick task
- Designed for easy integration with the existing MLP-based PPO implementation (see the batched-stepping sketch below)
- View Project →
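For flavor, a minimal sketch of the batched MJX stepping pattern this kind of integration builds on (vmapped `mjx.step` under `jit`). The `cube_pick.xml` path and zero-control rollout are illustrative placeholders, and the Madrona rendering and purejaxrl PPO wiring are not shown:

```python
import jax
import jax.numpy as jnp
import mujoco
from mujoco import mjx

NUM_ENVS = 1024

# Load the MuJoCo scene and move it onto the accelerator as an MJX model.
mj_model = mujoco.MjModel.from_xml_path("cube_pick.xml")  # placeholder scene
mjx_model = mjx.put_model(mj_model)

# One mjx.Data per environment, with slightly perturbed initial joint positions.
base = mjx.make_data(mjx_model)
keys = jax.random.split(jax.random.PRNGKey(0), NUM_ENVS)
batch = jax.vmap(
    lambda k: base.replace(qpos=base.qpos + 0.01 * jax.random.normal(k, base.qpos.shape))
)(keys)

@jax.jit
def batched_step(data, ctrl):
    """Apply per-env controls and advance all environments one physics step."""
    return jax.vmap(lambda d, u: mjx.step(mjx_model, d.replace(ctrl=u)))(data, ctrl)

# Dummy rollout with zero controls; a PPO policy would supply `ctrl` instead.
ctrl = jnp.zeros((NUM_ENVS, mj_model.nu))
for _ in range(10):
    batch = batched_step(batch, ctrl)
print(batch.qpos.shape)  # (NUM_ENVS, nq)
```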
- B.S. in Computer Science Honors (Turing Scholar) - The University of Texas at Austin
- Focus: Understanding ML at a fundamental level through implementation
If you're interested in my work, feel free to reach out!