- Xebia Data
- Amsterdam
- https://www.linkedin.com/in/daniel-tom-data-engineer/
Stars
A repo for all Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. All in a modern, AI-native editor.
A native Rust library for Delta Lake, with bindings into Python
The PySpark Custom Data Source Template makes it easy to build and test custom data sources for Apache PySpark. It simplifies environment setup, debugging, and test data management while providing …
Apache Spark - A unified analytics engine for large-scale data processing
Column-wise type annotations for pyspark DataFrames
dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)
A native Delta implementation for integration with any query engine
Command-line interface to PyPI Stats API to get download stats for Python packages
Example files used in the DuckDB - Unity Catalog blog
Dockerfile for Unity Catalog image
A list of Free Software network services and web applications which can be hosted on your own servers
Testcontainers is a Python library that provides a friendly API to run Docker containers. It is designed to create a runtime environment to use during your automated tests.
JupyterLab computational environment.
DuckDB-powered Postgres for high performance apps & analytics.
The dbt-toolkit is an early-stage plugin designed to enhance your experience working with dbt-core projects in JetBrains IDEs.
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
A library to convert a pydantic model to a pyarrow schema