james-hadoop

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

HTML 142 17 Updated Dec 12, 2025

[ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models"

Python 186 14 Updated Aug 29, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 50,679 4,203 Updated Dec 16, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,289 12,139 Updated Dec 19, 2025

🔥🔥🔥 AI-driven database tool and SQL client. The hottest GUI client, supporting MySQL, Oracle, PostgreSQL, DB2, SQL Server, SQLite, H2, ClickHouse, and more.

Java 24,829 2,705 Updated Sep 12, 2025

🔥 An open-source BI tool that anyone can use, and a powerful data visualization tool. An open-source BI tool alternative to Tableau.

Java 22,757 3,922 Updated Dec 18, 2025

A Flink-based web platform for real-time stream computing.

Java 1,861 688 Updated Dec 2, 2025

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,733 950 Updated Dec 16, 2025

Apache DolphinScheduler is a modern data orchestration platform for quickly building high-performance workflows with low code.

Java 14,037 4,957 Updated Dec 18, 2025

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance …

Java 11,109 2,234 Updated Dec 19, 2025

MetricFlow allows you to define, build, and maintain metrics in code.

Python 1 Updated Oct 8, 2022

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphic…

Python 1 Updated Apr 27, 2022

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful DAG visual interfaces, dedicated to solving complex job dependencies in the data pipeline and provi…

Java 1 Updated Jul 10, 2023

MetricFlow allows you to define, build, and maintain metrics in code.

Python 1,416 136 Updated Dec 18, 2025

An easy-to-use, self-service, open BI reporting and BI dashboard platform.

JavaScript 3,091 1,160 Updated Dec 6, 2025
JavaScript 1 Updated May 26, 2017
Jupyter Notebook 1 Updated Oct 12, 2023

Spring Boot practice and learning examples: best practices for Spring Boot beginners and for consolidating core Spring Boot skills.

Java 1 Updated Jun 28, 2017
JavaScript 1 Updated Mar 26, 2018
Java 1 Updated Sep 11, 2016
Scala 1 Updated Dec 29, 2016
Scala 1 Updated Feb 25, 2019

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,548 1,389 Updated Oct 14, 2025