Senior AI Research Engineer | Foundation Models & Scalable Systems | Google Developer Expert (AI)
I am a Senior AI Research Engineer and Google Developer Expert (AI) with a mission to architect and scale the powerful, efficient, and accessible large language models that will define the future.
My expertise covers the full lifecycle of foundation models: from curating massive datasets and architecting cutting-edge training infrastructure to developing production-grade models that set new performance benchmarks. I thrive on solving complex, large-scale challenges and am deeply invested in strengthening the open-source ecosystem that fuels global AI innovation.
- Foundation Model Development: Co-led the end-to-end pre-training of Kakao's Kanana V1 foundation model on a 3T-token dataset and implemented compute-efficient scaling techniques such as pruning and distillation (see the distillation sketch after this list). I also spearheaded key enhancements for Kanana-1.5 (including its 128K long-context extension) and owned the full development of a production embedding model that surpassed larger competitors.
- Scalable AI Infrastructure: Architected and optimized a scalable LLM training pipeline from the ground up using JAX, MaxText, and TPUs (see the training-step sketch after this list). This work was featured in an official Google Cloud blog post, and I presented it at Google Cloud Next 2025 (YouTube).
- Open Source Leadership: As a Research Lead at EleutherAI, I co-led the development of Polyglot-Ko, the first open-source Korean large language model suite, successfully training and releasing models up to 12.8B parameters.
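
For a concrete picture of the distillation side of that work, here is a minimal sketch of a logit-distillation loss in JAX. It is a generic formulation, not the Kanana training objective; the temperature value and the assumption that teacher and student share a vocabulary are mine.

```python
import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened token distributions.

    Generic sketch, not the Kanana objective; temperature is a placeholder.
    """
    t = temperature
    teacher_logp = jax.nn.log_softmax(teacher_logits / t, axis=-1)
    student_logp = jax.nn.log_softmax(student_logits / t, axis=-1)
    kl = jnp.sum(jnp.exp(teacher_logp) * (teacher_logp - student_logp), axis=-1)
    # The t**2 rescaling keeps gradient magnitudes comparable across temperatures.
    return (t ** 2) * jnp.mean(kl)
```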
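And here is a minimal sketch of the jit-compiled training step such a JAX pipeline is built around. It is illustrative only, not MaxText code: the real pipeline adds sharding rules, checkpointing, and data loading, and the `optax` optimizer and toy linear `loss_fn` below are my assumptions.

```python
import jax
import jax.numpy as jnp
import optax  # assumed optimizer library, standard in the JAX ecosystem

optimizer = optax.adamw(learning_rate=3e-4)

def loss_fn(params, batch):
    # Toy stand-in for a model forward pass plus cross-entropy loss.
    logits = batch["inputs"] @ params["w"]
    return optax.softmax_cross_entropy_with_integer_labels(
        logits, batch["labels"]
    ).mean()

@jax.jit
def train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

# Usage: toy parameters and one step.
params = {"w": jnp.zeros((16, 8))}
opt_state = optimizer.init(params)
batch = {"inputs": jnp.ones((4, 16)), "labels": jnp.zeros(4, dtype=jnp.int32)}
params, opt_state, loss = train_step(params, opt_state, batch)
```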
My GitHub activity reflects a consistent track record of contributing high-impact code to the core of the modern AI ecosystem. I focus on strengthening foundational libraries, building scalable systems, and advancing rigorous evaluation. Below are some of my key contributions:
- Hugging Face Transformers: Led the end-to-end integration of the DeepSeek-V3 model, a complex effort spanning its core architecture, its distinctive RoPE scaling logic (sketched in generic form after this list), and follow-on capabilities like token classification heads.
- Google's MaxText: Architected and implemented the multi-source data blending feature (the core idea is sketched after this list), significantly enhancing the data pipeline for Google's flagship JAX-based large-scale training framework.
- LM-Eval Harness: Expanded the community's model evaluation capabilities by adding and integrating major benchmarks, including `global_mmlu`, `hrm8k`, and `humaneval+` (a usage example follows this list).
- Broader Ecosystem: My contributions also span high-performance distributed training concepts like Tensor Parallelism in EleutherAI's OSLO (sketched in JAX terms below), enabling distributed training for generative vision models, and implementing new model support in LLM2Vec.
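
On the RoPE scaling mentioned above, here is a hedged sketch of position-interpolation-style context extension. It shows the generic idea, not DeepSeek-V3's exact variant; the scaling factor and dimensions are placeholders.

```python
import numpy as np

def scaled_rope_angles(positions, head_dim, base=10000.0, factor=4.0):
    # Standard RoPE inverse frequencies, one per pair of channels.
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    # Position interpolation: compress positions by `factor` so a context
    # `factor` times longer stays inside the angle range seen in pre-training.
    return np.outer(positions / factor, inv_freq)

angles = scaled_rope_angles(np.arange(8192), head_dim=128)
cos, sin = np.cos(angles), np.sin(angles)  # applied to query/key channel pairs
```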
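The data-blending idea in miniature: draw each next example from one of several dataset iterators according to fixed mixture weights. This mirrors the concept behind the MaxText feature, not its actual API; every name below is illustrative.

```python
import random
from typing import Iterator, Sequence

def blend(sources: Sequence[Iterator], weights: Sequence[float], seed: int = 0):
    """Yield examples from `sources`, picking a source per step by weight."""
    rng = random.Random(seed)
    while True:
        (source,) = rng.choices(sources, weights=weights, k=1)
        try:
            yield next(source)
        except StopIteration:
            # A real pipeline would drop the exhausted source and renormalize
            # the weights; this sketch simply stops the blended stream.
            return

mixed = blend([iter(["a1", "a2"]), iter(["b1", "b2"])], weights=[0.7, 0.3])
print(next(mixed))
```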
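A usage example for the evaluation work, assuming a recent lm-evaluation-harness release that exports `simple_evaluate`; the placeholder model and the exact registered task names should be checked against the installed task registry before running.

```python
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                    # Hugging Face backend
    model_args="pretrained=gpt2",  # placeholder model for illustration
    tasks=["hrm8k"],               # verify the registered task/group name
)
print(results["results"])
```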
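Finally, the tensor-parallelism idea from the OSLO work, expressed here in JAX sharding terms rather than OSLO's PyTorch internals: shard a weight matrix column-wise across devices so each device computes its own slice of the output.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# 1-D device mesh whose single axis serves as the tensor-parallel dimension.
mesh = Mesh(np.array(jax.devices()), axis_names=("tp",))

# Replicate activations; shard the weight's output columns across devices.
x = jax.device_put(jnp.ones((8, 512)), NamedSharding(mesh, P()))
w = jax.device_put(jnp.ones((512, 2048)), NamedSharding(mesh, P(None, "tp")))

@jax.jit
def column_parallel_matmul(x, w):
    # Each device multiplies the full input by its column slice of `w`,
    # so the output remains sharded along "tp" with no communication here.
    return x @ w

y = column_parallel_matmul(x, w)
```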
- Kanana: Compute-efficient Bilingual Language Models. [arXiv:2502.18934]
- A Technical Report for Polyglot-Ko: Open-Source Large-Scale Korean Language Models. [arXiv:2306.02254]
- Knowledge Distillation for BERT Unsupervised Domain Adaptation. [arXiv:2010.11478]