- Seoul, South Korea
- linktr.ee/jlstdio
- https://orcid.org/0009-0001-0112-8503
Highlights
- Pro
Starred repositories
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using the RAG paradigm.
ESC-50: Dataset for Environmental Sound Classification
The definitive Web UI for local AI, with powerful features and easy setup.
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
Official code for "Interpretable Language Modeling via Induction-head Ngram Models"
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
A toolkit for developing and comparing reinforcement learning algorithms.
[RA-L 2025] FrontierNet: Learning Visual Cues to Explore
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
The repository provides code associated with the paper VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation (ICRA 2024)
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
Slam Toolbox for lifelong mapping and localization in potentially massive maps with ROS
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
MichalZawalski / embodied-CoT
Forked from openvla/openvla
Embodied Chain of Thought: A robotic policy that reasons to solve the task.
A simple, thread-safe way of executing actions (such as UI manipulations) on the Unity main thread