Anton Shiryaev medphisiker

👋 Hi, I'm Anton Shiryaev

Deep Learning Engineer (CV, LLM & VLM) | Master’s Student @ ITMO AI Talent Hub | 🎓 ex-Researcher @ Russian Academy of Sciences
📍 Vladivostok, Russia (GMT+10)

I build end-to-end ML systems — from data collection to production microservices — with a focus on real-time computer vision and multimodal document understanding using Vision-Language Models (VLMs).

🧠 What I Do

🔍 Computer Vision: Object detection, segmentation, tracking (YOLO, SORT, Ultralytics) for robotics & industrial automation
📄 Multimodal AI: Building document processing pipelines with Qwen2.5-VL, vLLM, and Arize Phoenix
🏗️ Full ML Lifecycle: Data annotation → Training → Serving → Monitoring → Human-in-the-loop feedback
🌐 Open Source: Lead developer of VLMHyperBench, an open benchmark for evaluating VLMs on Russian documents
🎓 Academic Roots: 8+ years in scientific research (acoustics, signal processing) — published in peer-reviewed journals (Scopus, Web of Science, eLIBRARY)

🛠️ Tech Stack

Category	Tools & Frameworks
Languages	Python
DL / ML	PyTorch, Lightning, Hugging Face, vLLM, Unsloth, LightAutoML, CatBoost
Multimodal	Qwen-VL, VLMEvalKit, LangChain, FAISS, Arize Phoenix
MLOps	MLflow, Weights & Biases, TensorBoard, Prefect, ONNX, TensorRT
DevOps	Docker (+NVIDIA), Docker Compose, Git, Git Submodules, uv, poetry
Data & CV	OpenCV, PIL, kornia, FiftyOne, CVAT, Label Studio, MinIO, RabbitMQ

🚀 Featured Projects

Only three projects are featured here; the rest can be found at the link below:

🔍 View all projects by category (CV / NLP / Multimodal)

😃 Audio-Visual Emotion Recognition

Multimodal system for real-time emotion detection in Zoom/Skype calls.
🥇 1st place in ODS MLOps course • Demo video with live inference

📊 VLMHyperBench

Open benchmark for evaluating Vision-Language Models on Russian documents.
🏆 Winner of Yandex Open Source Grant 2025 • Presented at Data Fest 2025

📑 Document Processing with Qwen2.5-VL

Production service for extracting structured data from Russian documents using multimodal LLMs.
🔁 Human-in-the-loop feedback • Built with vLLM, MinIO, RabbitMQ, Arize Phoenix

📚 Background & Achievements

Master’s in AI, ITMO University (AI Talent Hub) — courses in Multimodal Models, ML System Design, Model Compression
PhD in Physics & Mathematics (Acoustics, RAS) — 8+ years in research, Scopus/WoS publications
🏆 Yandex Open Source Grant 2025
🥇 1st place, ODS MLOps Track
🥉 Top-3, AI Talent Hackathon 2023
🎓 Selectel Career Wave Scholarship (2023, 2024)

📬 Let’s Connect!

📧 Email: [email protected]
📄 Full CV: Google Drive

💡 "All the most interesting things happen at the intersection of fields."
— From acoustics to multimodal AI.
