Stars
verl: Volcano Engine Reinforcement Learning for LLMs
Building a comprehensive and handy list of papers for GUI agents
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
VIP cheatsheet for Stanford's CME 295 Transformers and Large Language Models
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
Vibetest MCP - automated QA testing using Browser-Use agents
C++ implementation of a ScienceDirect paper "An accelerating cpu-based correlation-based image alignment for real-time automatic optical inspection"
An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Run Segment Anything Model 2 on a live video stream
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Datasets on Website Aesthetics for Machine Learning
Android in Docker solution with noVNC support and video recording
A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
JohannesBuchner / imagehash
Forked from bunchesofdonald/photohash
A Python Perceptual Image Hashing Module
Pretty good call graphs for dynamic languages
MagentaA11y is a tool built to simplify the process of accessibility testing.
Google Play scraper for Python, inspired by <facundoolano/google-play-scraper>
Node.js scraper to get data from Google Play
Consists of ~500k human annotations on the RICO dataset identifying various icons based on their shapes and semantics, and associations between selected general UI elements and their text labels. A…
142,416 structured images for icon classification and recognition
Code released for our CHI2023 paper "UEyes: Understanding Visual Saliency across User Interface Types"