abcoep
📗 Learning

Stars

🐘 Hadoop

24 repositories

winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows

Shell 2,201 2,318 Updated May 16, 2024

Apache Hive

Java 5,953 4,788 Updated Dec 23, 2025

Apache HBase

Java 5,534 3,384 Updated Dec 22, 2025

Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White

Makefile 3,506 2,552 Updated Mar 17, 2020

Apache ZooKeeper

Java 12,692 7,328 Updated Dec 19, 2025

Apache Mesos

C++ 5,355 1,670 Updated Aug 23, 2024

The official home of the Presto distributed SQL query engine for big data

Java 16,601 5,512 Updated Dec 23, 2025

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 12,329 3,425 Updated Dec 24, 2025

Apache Iceberg

Java 8,354 2,934 Updated Dec 24, 2025

Apache Parquet Format

Thrift 2,153 461 Updated Dec 19, 2025

Upserts, Deletes And Incremental Processing on Big Data.

Java 6,048 2,454 Updated Dec 24, 2025

Apache Phoenix

Java 1,050 1,012 Updated Dec 22, 2025

Scalable, reliable, distributed storage system optimized for data analytics and object store workloads.

Java 1,114 588 Updated Dec 23, 2025

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Python 18,611 2,446 Updated May 16, 2025

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

C++ 16,303 3,957 Updated Dec 23, 2025

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python 20,321 4,974 Updated Dec 24, 2025

Pentaho Data Integration ( ETL ) a.k.a Kettle

Java 8,265 3,573 Updated Dec 23, 2025

Apache Atlas - Open Metadata Management and Governance capabilities across the Hadoop platform and beyond

Java 2,047 900 Updated Dec 12, 2025

Apache Ranger - To enable, monitor and manage comprehensive data security across the Hadoop platform and beyond

Java 1,017 1,045 Updated Dec 23, 2025

A composable and fully extensible C++ execution engine library for data management systems.

C++ 3,995 1,419 Updated Dec 24, 2025

Apache DataFusion Comet Spark Accelerator

Scala 1,088 258 Updated Dec 24, 2025

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Scala 1,491 554 Updated Dec 24, 2025

A cross platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,442 189 Updated Dec 21, 2025

Apache DataFusion SQL Query Engine

Rust 8,187 1,835 Updated Dec 24, 2025