- MarkAny
- Seoul, Korea
- https://www.linkedin.com/in/yonghye-kwon-91641a174/
Lists (31)
6d-pose-estimation
Action Recognition
Agentic-AI
Backbone
cmake
Color Recognition
Crawling
CS-STUDY
cuda
Detection
DETR
Faster
ffmpeg
Human-Detection-Dataset
image-dewarping
llm
media-processing
mini
OCR
OpenDataset
polygon-estimation
Production
Productivity
QA
REID
STT
Tracking
vit-lora
VLM
Youtube
Self-Development
Stars
Official Implementation of Upsample Anything: A Simple and Hard to Beat Baseline for Feature Upsampling
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
DaD's a pretty good keypoint detector, probably the best.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
[ICCV 2025] OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Collection of leaked system prompts
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…
MCP integration for Google Calendar to manage events.
Integrating SAM2 with DINOv2/v3 for segmentation
A project that compares and visualizes the performance of Korean sentence embedding models. This project was implemented with Claude Opus 4.
Frontier Multimodal Foundation Models for Image and Video Understanding
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Benchmarking vision-language models on face tasks
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
[CVPR 2025 Highlight] Official code and models for Encoder-only Mask Transformer (EoMT).
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
An open-source AI agent that brings the power of Gemini directly into your terminal.
PyTorch native quantization and sparsity for training and inference
Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".
InceptionNeXt: When Inception Meets ConvNeXt (CVPR 2024)
[ICLR 2023] "More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity"; [ICML 2023] "Are Large Kernels Better Teachers than Transformers for ConvNets?"
[CVPR2025] Official code for Lost in Translation Found in Context
Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119