
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Scala 296 33 Updated Jan 31, 2025

Docker with Airflow and Spark standalone cluster

Python 262 127 Updated Aug 5, 2023

W.E.B. Du Bois Challenge plots

R 19 2 Updated Mar 18, 2024

DuckDB is an analytical in-process SQL database management system

C++ 35,197 2,828 Updated Jan 6, 2026

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook 39,256 7,542 Updated Dec 15, 2025

A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational excellence, reliability and application specific best practic…

HTML 109 31 Updated Oct 6, 2025
Python 21 19 Updated Mar 11, 2025

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

Scala 175 92 Updated Mar 6, 2021

OpenTracing API for Python. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163

Python 753 117 Updated Jul 1, 2022

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala 3,565 573 Updated Nov 4, 2025

AWS Glue PySpark - Apache Hudi Quick Start Guide

Python 8 4 Updated Jan 17, 2022

Jupyter magics and kernels for working with remote Spark clusters

Python 1,363 455 Updated Sep 9, 2025

A curated list of awesome dbt resources

1,614 156 Updated Oct 22, 2025

Building robust APIs using Python

Python 354 62 Updated Nov 23, 2021

An Awesome List of Open-Source Data Engineering Projects

2,951 524 Updated Oct 4, 2024

Shall we turn Brazil into an API?

JavaScript 9,994 687 Updated Dec 22, 2025

An open source python library for automated feature engineering

Python 7,596 907 Updated Dec 29, 2025

This repo provides an end-to-end example of using streaming feature aggregation with the Amazon SageMaker Feature Store.

Jupyter Notebook 47 14 Updated Jul 14, 2021

lakeFS - Data version control for your data lake | Git for data

Go 5,076 420 Updated Jan 6, 2026

Papers from the computer science community to read and discuss.

Shell 102,118 6,242 Updated Oct 10, 2025

Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.

Java 12,260 2,817 Updated Jan 6, 2026

Kafka Web UI

Java 6,087 887 Updated Jan 5, 2026

Secure and fast microVMs for serverless computing.

Rust 31,719 2,198 Updated Jan 6, 2026

Python API for Deequ

Jupyter Notebook 808 148 Updated Apr 1, 2025

💩 Python Object-Oriented Programming 🐍

Python 37 6 Updated Mar 20, 2024

The Open Source Feature Store for AI/ML

Python 6,589 1,189 Updated Jan 6, 2026

Apache Iceberg

Java 8,399 2,951 Updated Jan 6, 2026

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.

Python 1,207 158 Updated Sep 8, 2025

Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡

Jupyter Notebook 505 200 Updated Nov 7, 2025