Tools for OpenDataArena: Fair, Open, and Transparent Arena for Data

Python 97 11 Updated Dec 23, 2025

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

HTML 150 18 Updated Dec 25, 2025

[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"

Python 187 14 Updated Aug 29, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 51,108 4,243 Updated Dec 24, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,790 12,243 Updated Dec 21, 2025

AI-driven database tool and SQL client. The hottest GUI client, supporting MySQL, Oracle, PostgreSQL, DB2, SQL Server, SQLite, H2, ClickHouse, and more.

Java 24,866 2,713 Updated Dec 19, 2025

🔥 An open-source BI tool that anyone can use, and a powerful data visualization tool. An open-source BI tool alternative to Tableau.

Java 22,971 3,947 Updated Dec 25, 2025

A Flink-based web platform for real-time stream computing

Java 1,862 688 Updated Dec 2, 2025

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,753 957 Updated Dec 23, 2025

Apache DolphinScheduler is the modern data orchestration platform. Create high-performance workflows quickly with low code.

Java 14,053 4,966 Updated Dec 22, 2025

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …

Java 11,152 2,240 Updated Dec 27, 2025

MetricFlow allows you to define, build, and maintain metrics in code.

Python 1 Updated Oct 8, 2022

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphic…

Python 1 Updated Apr 27, 2022

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and provi…

Java 1 Updated Jul 10, 2023

MetricFlow allows you to define, build, and maintain metrics in code.

Python 1,427 138 Updated Dec 23, 2025

An easy to use, self-service open BI reporting and BI dashboard platform.

JavaScript 3,096 1,159 Updated Dec 6, 2025
JavaScript 1 Updated May 26, 2017
Jupyter Notebook 1 Updated Oct 12, 2023

Spring Boot hands-on learning examples: the best practice for Spring Boot beginners and for consolidating core technologies.

Java 1 Updated Jun 28, 2017
JavaScript 1 Updated Mar 26, 2018
Java 1 Updated Sep 11, 2016
Scala 1 Updated Dec 29, 2016
Scala 1 Updated Feb 25, 2019

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,568 1,391 Updated Oct 14, 2025