Stars
Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Docker with Airflow and Spark standalone cluster
DuckDB is an analytical in-process SQL database management system
This is a repo with links to everything you'd ever want to learn about data engineering
A best practices guide for using AWS EMR. The guide will cover best practices on the topics of cost, performance, security, operational excellence, reliability and application specific best practic…
Plug-and-play implementation of an Apache Spark custom data source for AWS DynamoDB.
OpenTracing API for Python. This library is DEPRECATED! https://github.com/opentracing/specification/issues/163
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
AWS Glue PySpark - Apache Hudi Quick Start Guide
Jupyter magics and kernels for working with remote Spark clusters
Building robust APIs using Python
An Awesome List of Open-Source Data Engineering Projects
Shall we turn Brazil into an API?
An open source python library for automated feature engineering
This repo provides an end-to-end example of using streaming feature aggregation with the Amazon SageMaker Feature Store.
lakeFS - Data version control for your data lake | Git for data
Papers from the computer science community to read and discuss.
Change data capture for a variety of databases. Please log issues at https://github.com/debezium/dbz/issues.
Secure and fast microVMs for serverless computing.
This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker.