medphisiker/README.md

👋 Hi, I'm Anton Shiryaev

Deep Learning Engineer (CV, LLM & VLM) | Master’s Student @ ITMO AI Talent Hub | 🎓 ex-Researcher @ Russian Academy of Sciences
📍 Vladivostok, Russia (GMT+10)

I build end-to-end ML systems — from data collection to production microservices — with a focus on real-time computer vision and multimodal document understanding using Vision-Language Models (VLMs).



🧠 What I Do

  • 🔍 Computer Vision: Object detection, segmentation, tracking (YOLO, SORT, Ultralytics) for robotics & industrial automation
  • 📄 Multimodal AI: Building document processing pipelines with Qwen2.5-VL, vLLM, and Arize Phoenix
  • 🏗️ Full ML Lifecycle: Data annotation → Training → Serving → Monitoring → Human-in-the-loop feedback
  • 🌐 Open Source: Lead developer of VLMHyperBench — benchmark for VLMs on Russian documents
  • 🎓 Academic Roots: 8+ years in scientific research (acoustics, signal processing) — published in peer-reviewed journals (Scopus, Web of Science, eLIBRARY)

🛠️ Tech Stack

| Category | Tools & Frameworks |
|---|---|
| Languages | Python |
| DL / ML | PyTorch, Lightning, Hugging Face, vLLM, Unsloth, LightAutoML, CatBoost |
| Multimodal | Qwen-VL, VLMEvalKit, LangChain, FAISS, Arize Phoenix |
| MLOps | MLflow, Weights & Biases, TensorBoard, Prefect, ONNX, TensorRT |
| DevOps | Docker (+NVIDIA), Docker Compose, Git, Git Submodules, uv, poetry |
| Data & CV | OpenCV, PIL, kornia, FiftyOne, CVAT, Label Studio, MinIO, RabbitMQ |

🚀 Featured Projects

Only three projects are featured here; the rest can be found at the link below:

Multimodal system for real-time emotion detection in Zoom/Skype calls.
🥇 1st place in ODS MLOps course • Demo video with live inference

VLMHyperBench

Open benchmark for evaluating Vision-Language Models on Russian documents.
🏆 Winner of Yandex Open Source Grant 2025 • Presented at Data Fest 2025

📑 Document Processing with Qwen2.5-VL

Production service for extracting structured data from Russian documents using multimodal LLMs.
🔁 Human-in-the-loop feedback • Built with vLLM, MinIO, RabbitMQ, Arize Phoenix

📚 Background & Achievements

  • Master’s in AI, ITMO University (AI Talent Hub) — courses in Multimodal Models, ML System Design, Model Compression
  • PhD in Physics & Mathematics (Acoustics, RAS) — 8+ years in research, Scopus/WoS publications
  • 🏆 Yandex Open Source Grant 2025
  • 🥇 1st place, ODS MLOps Track
  • 🥉 Top-3, AI Talent Hackathon 2023
  • 🎓 Selectel Career Wave Scholarship (2023, 2024)

📬 Let’s Connect!

Telegram GitHub GitLab Kaggle Medium LeetCode

📧 Email: [email protected]
📄 Full CV: Google Drive

💡 "All the most interesting things happen at the intersection of fields."
— From acoustics to multimodal AI.

Popular repositories

  1. docker_yolov8 Public

    Jupyter Notebook 3

  2. obsidian_demo Public

    JavaScript 3

  3. drivers_helper Public

    Prototype of a driver assistant that alerts drivers to road signs.

    Jupyter Notebook 2

  4. jou_mask_rcnn Public

    Test assignment 1 for the CV Engineer vacancy at Rusagro Technologies ("Русагро_технологии").

    Jupyter Notebook 1

  5. yolo_VIKA Public

    Jupyter Notebook 1 1

  6. maching_cv_and_vacancy Public

    Jupyter Notebook 1