lygjwy

Organizations

@IIP-NJU

Starred repositories

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,047 533 Updated Oct 29, 2025

MTEB: Massive Text Embedding Benchmark

Python 2,943 495 Updated Oct 30, 2025

Awesome Unified Multimodal Models

840 25 Updated Aug 17, 2025

[BMVC2023] Official code for TEMI: Exploring the Limits of Deep Image Clustering using Pretrained Models

Python 28 2 Updated Nov 21, 2023

Scalable data preprocessing and curation toolkit for LLMs

Python 1,196 185 Updated Nov 1, 2025

[NeurIPS 2025 Spotlight] Towards Understanding Camera Motions in Any Video

HTML 243 33 Updated Oct 23, 2025

Easy Data Preparation with latest LLMs-based Operators and Pipelines.

Python 5 Updated Jun 29, 2025

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook 38,442 7,387 Updated Oct 29, 2025

Automatically crawl arXiv papers daily and summarize them using AI. Illustrating them using GitHub Pages.

JavaScript 1,994 631 Updated Oct 31, 2025

Official Repository of "LLM × DATA" Survey Paper

525 52 Updated Oct 28, 2025

Easy Data Preparation with latest LLMs-based Operators and Pipelines.

Python 1,435 97 Updated Oct 31, 2025

🔥[VLDB'26] Official repository for the paper "LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning".

Python 80 6 Updated Jun 3, 2025

Fantastic Data Engineering for Large Language Models

91 4 Updated Dec 29, 2024

Best Papers of Top Venues like CVPR, NeurIPS, ICLR, ICML, ICCV, ECCV, ...

208 12 Updated Oct 31, 2025

A Library for Advanced Deep Time Series Models for General Time Series Analysis.

Python 10,442 1,680 Updated Oct 29, 2025

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)

JavaScript 6,663 667 Updated Jul 4, 2025

Simplistic mobile RSS client built with Flutter

Dart 1,598 99 Updated Apr 14, 2024

[ICML 2024] Selecting High-Quality Data for Training Language Models

Python 192 14 Updated Jun 20, 2024
Python 373 48 Updated Oct 31, 2025

The official repository for the NLP-KG web application [ACL 2024 Demo].

TypeScript 13 2 Updated Oct 16, 2025

[ICCV 2023 Oral] IOMatch: Simplifying Open-Set Semi-Supervised Learning with Joint Inliers and Outliers Utilization

Jupyter Notebook 53 3 Updated Jan 28, 2024

[ICML'24] Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss

Python 10 Updated Jun 3, 2024

The Open-Source Data Annotation Platform

TypeScript 946 104 Updated Feb 19, 2025

PyTorch native post-training library

Python 5,570 679 Updated Oct 31, 2025

A Survey on Data Selection for Language Models

252 15 Updated Apr 29, 2025

Summarize existing representative LLMs text datasets.

1,377 136 Updated Oct 11, 2025

[ICML'24] Open-Vocabulary Calibration for Fine-tuned CLIP

Python 15 2 Updated Jun 14, 2024

Train transformer language models with reinforcement learning.

Python 16,098 2,261 Updated Nov 1, 2025