Pedro Almeida (peadalmeida)

Data Engineer @ Inter&Co

54 followers · 53 following

Inter&Co
Belo Horizonte, Brazil
@peadalmeida


Stars

spotify / big-data-rosetta-code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Scala · 296 stars · 33 forks · Updated Jan 31, 2025

victorouttes / spotify_datalake

Python · 22 stars · 6 forks · Updated Jun 30, 2024

cordon-thiago / airflow-spark

Docker with Airflow and Spark standalone cluster

Python · 262 stars · 127 forks · Updated Aug 5, 2023

IcaroBernardes / webdubois

W.E.B. Du Bois Challenge plots

R · 19 stars · 2 forks · Updated Mar 18, 2024

duckdb / duckdb

DuckDB is an analytical in-process SQL database management system

C++ · 35,197 stars · 2,828 forks · Updated Jan 6, 2026

DataExpert-io / data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook · 39,256 stars · 7,542 forks · Updated Dec 15, 2025

aws / aws-emr-best-practices

A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational excellence, reliability and application specific best practic…

HTML · 109 stars · 31 forks · Updated Oct 6, 2025

ayyoubmaul / hadoop-docker

Python · 21 stars · 19 forks · Updated Mar 11, 2025

audienceproject / spark-dynamodb

Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.

Scala · 175 stars · 92 forks · Updated Mar 6, 2021

opentracing / opentracing-python

OpenTracing API for Python. 🛑 This library is DEPRECATED! https://github.com/opentracing/specification/issues/163

Python · 753 stars · 117 forks · Updated Jul 1, 2022

awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Scala · 3,565 stars · 573 forks · Updated Nov 4, 2025

GabrielAmazonas / hudi-on-glue-quick-start

AWS Glue PySpark - Apache Hudi Quick Start Guide

Python · 8 stars · 4 forks · Updated Jan 17, 2022

jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Python · 1,363 stars · 455 forks · Updated Sep 9, 2025

Hiflylabs / awesome-dbt

A curated list of awesome dbt resources

1,614 stars · 156 forks · Updated Oct 22, 2025

luizalabs / tutorial-python-brasil

Building robust APIs using Python

Python · 354 stars · 62 forks · Updated Nov 23, 2021

gunnarmorling / awesome-opensource-data-engineering

An Awesome List of Open-Source Data Engineering Projects

2,951 stars · 524 forks · Updated Oct 4, 2024

BrasilAPI / BrasilAPI

Shall we turn Brazil into an API?

JavaScript · 9,994 stars · 687 forks · Updated Dec 22, 2025

alteryx / featuretools

An open source python library for automated feature engineering

Python · 7,596 stars · 907 forks · Updated Dec 29, 2025

aws-samples / amazon-sagemaker-feature-store-streaming-aggregation

This repo provides an end-to-end example of using streaming feature aggregation with the Amazon SageMaker Feature Store.

Jupyter Notebook · 47 stars · 14 forks · Updated Jul 14, 2021

treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data

Go · 5,076 stars · 420 forks · Updated Jan 6, 2026

papers-we-love / papers-we-love

Papers from the computer science community to read and discuss.

Shell · 102,118 stars · 6,242 forks · Updated Oct 10, 2025

debezium / debezium

Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.

Java · 12,260 stars · 2,817 forks · Updated Jan 6, 2026

obsidiandynamics / kafdrop

Kafka Web UI

Java · 6,087 stars · 887 forks · Updated Jan 5, 2026

firecracker-microvm / firecracker

Secure and fast microVMs for serverless computing.

Rust · 31,719 stars · 2,198 forks · Updated Jan 6, 2026

awslabs / python-deequ

Python API for Deequ

Jupyter Notebook · 808 stars · 148 forks · Updated Apr 1, 2025

cassiobotaro / poop

💩 Python Object-Oriented Programming 🐍

Python · 37 stars · 6 forks · Updated Mar 20, 2024

feast-dev / feast

The Open Source Feature Store for AI/ML

Python · 6,589 stars · 1,189 forks · Updated Jan 6, 2026

apache / iceberg

Apache Iceberg

Java · 8,399 stars · 2,951 forks · Updated Jan 6, 2026

palantir / pyspark-style-guide

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.

Python · 1,207 stars · 158 forks · Updated Sep 8, 2025

cluster-apps-on-docker / spark-standalone-cluster-on-docker

Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡

Jupyter Notebook · 505 stars · 200 forks · Updated Nov 7, 2025
