AndreaBozzo/README.md

👋 Andrea Bozzo

Data Engineer Chronicles - A day in the life

Data Engineer | Software Developer | Analytics Architect
Hi, I'm Andrea, usually trying not to set the database on fire while building scalable data solutions. In my spare time I explore systems programming with Rust & Go. (Coffee consumption not to scale)

🌐 Landing Page • 📝 Blog • 💼 LinkedIn • 📧 Email


🛠️ Tech Stack

Languages
Rust Go Python JavaScript

Data Engineering & Databases
Apache Spark Apache Kafka DuckDB PostgreSQL MongoDB Redis

Analytics & BI
Power BI Databricks Apache Superset RisingWave

Cloud & DevOps
Docker Kubernetes AWS Azure GitHub Actions


🚀 Featured Project

Fast, lightweight data profiling library built in Rust with Python bindings.

PyPI Downloads Crates.io Downloads GitHub Stars

A high-performance CLI tool and library designed for data engineers to profile datasets locally without sending data to external servers.

  • 🔥 Performance: Written in Rust using Apache Arrow for memory efficiency.
  • 🐍 Python Integration: Full Python bindings via PyO3 for seamless integration in notebooks and pipelines.
  • 🏭 Production Ready: 90k+ downloads across platforms, widely used in CI/CD pipelines for automated data quality checks.
  • 🔒 Privacy First: Zero telemetry, 100% local execution.

🌟 Open Source Contributions

Contributing to the broader open source ecosystem beyond my own projects.

🤖 This section is automatically updated daily via GitHub Actions

  • risingwavelabs/risingwave ⭐ 8532 - 1 merged PR
    • Streaming data platform. Real-time stream processing, low-latency serving, and Iceberg table management.
  • datapizza-labs/datapizza-ai ⭐ 1996 - 3 merged PRs
    • Build reliable Gen AI solutions without overhead 🍕
  • mariocandela/beelzebub ⭐ 1693 - 1 merged PR
    • A secure low code honeypot framework, leveraging AI for System Virtualization.
  • lakekeeper/lakekeeper ⭐ 1036 - 1 merged PR
    • Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
  • italia-opensource/awesome-italia-opensource ⭐ 310 - 1 merged PR
    • Italian Open-Source is the first platform dedicated to the Italian open-source world
  • CortexFlow/CortexBrain ⭐ 67 - 3 merged PRs
    • CortexBrain is an ambitious open-source project created by CortexFlow, aiming to develop an intelligent, lightweight, and efficient service mesh architecture that seamlessly connects cloud and edge devices
  • piopy/fantacalcio-py ⭐ 41 - 3 merged PRs
    • A small tool to guide us through the auction while spending little
  • informagico/fantavibe ⭐ 3 - 1 merged PR
  • rust-ita/rust-docs-it ⭐ 2 - 1 merged PR
    • Rust documentation translated into Italian

📊 GitHub Stats

GitHub Stats Top Languages

GitHub Streak

Contribution Graph


💡 Currently

  • 🔭 Working on: Building high-performance data pipelines with Rust
  • 🌱 Learning: Advanced systems programming and distributed computing patterns
  • 👯 Looking to collaborate on: Data engineering projects, Python/Rust/Go libraries, open source tools
  • 💬 Ask me about: Data pipelines, ETL design, Rust best practices, system architecture
  • ⚡ Fun fact: I debug code faster after the third espresso ☕

🤝 Let's Connect

LinkedIn • Email • GitHub • 💎 Sponsor

Open to: Consulting on data engineering • Open source collaborations • Interesting data challenges • Python, Rust & Go projects


Pinned

  1. dataprof Public

    Fast tool for data quality checks.

    Rust 8 1

  2. Osservatorio Public

    Osservatorio - Open Data Processing Platform (WIP)

    Python 5 5

  3. dce Public

    Data Contracts Engine for modern Data platforms. Define, validate, and enforce data quality contracts across multiple formats and cloud providers.

    Rust

  4. LakehouseStarterKit Public

    Lakehouse starter kit for small teams (1-5). Extract data with dlt, transform with dbt, visualize with Superset. S3-compatible storage with MinIO. Easy to scale.

    Python

  5. rust-ita/rust-docs-it Public

    Rust documentation translated into Italian

    Shell 2 1

  6. go-lab Public

    Sunday project on a Go API gateway

    Go