Stars
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼
A topic-centric list of high-quality open datasets.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Collection of publicly available IPTV channels from all over the world
Upserts, Deletes And Incremental Processing on Big Data.
A repository of links with advice on grad school applications, research, PhD programs, etc.
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Example source code accompanying O'Reilly's "Hadoop: The Definitive Guide" by Tom White
A collection of challenge-based hack-a-thons including student guide, coach guide, lecture presentations, sample/instructional code and templates. Please visit the What The Hack website at: https:/…
Roadmap to becoming a data engineer in 2021
A curated list of awesome Apache Spark packages and resources.
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
A list of useful resources to learn Data Engineering from scratch
Apache Superset is a Data Visualization and Data Exploration Platform
Learn how to design, develop, deploy and iterate on production-grade ML applications.
Spark: The Definitive Guide's Code Repository
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.