- Source: CSV datasets from Kaggle loaded into PostgreSQL
- Ingestion: NiFi reads data from PostgreSQL and writes it to S3
- Storage & Modeling: Snowflake reads the S3 data as the batch layer, while dbt manages transformations, modeling, and lineage tracking
- Visualization: Power BI dashboards served through the Snowflake connection
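The NiFi ingestion step above essentially reads rows out of PostgreSQL and lands them in S3 under partitioned object keys. A minimal pure-Python sketch of that partitioning logic (the key layout, the `payment_date` column, and the bucket prefix are illustrative assumptions, not the actual NiFi flow configuration):

```python
from collections import defaultdict

def partition_rows_by_date(rows, date_field):
    """Group rows into S3-style object keys by the date portion of a
    timestamp column — mimicking how an ingestion flow might lay out
    objects as olist/order_payments/dt=YYYY-MM-DD/part-0000.csv.
    The key template and date_field name are hypothetical."""
    buckets = defaultdict(list)
    for row in rows:
        day = row[date_field][:10]  # "YYYY-MM-DD" prefix of the timestamp
        key = f"olist/order_payments/dt={day}/part-0000.csv"
        buckets[key].append(row)
    return buckets
```

In a real flow, each bucket would be serialized and uploaded as one S3 object; Snowflake can then mount the prefix as an external stage and read the partitions directly.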
Source: PostgreSQL CDC events captured by Debezium; transactional activity is simulated by replaying the Olist data into PostgreSQL
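Debezium wraps each row change in an envelope with `before`/`after` images and an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read). A minimal sketch of how a downstream consumer could fold those events into an in-memory table (keying on `order_id` alone is a simplification — the real `order_payments` table also has a `payment_sequential` column):

```python
import json

def apply_cdc_event(table_state, event_json):
    """Apply one Debezium change event to an in-memory dict keyed by
    primary key. Only the payload/op/before/after envelope fields are
    assumed; keying on order_id is a simplification."""
    payload = json.loads(event_json)["payload"]
    op = payload["op"]
    if op in ("c", "u", "r"):          # create / update / snapshot read
        row = payload["after"]
        table_state[row["order_id"]] = row
    elif op == "d":                    # delete: tombstone by old key
        row = payload["before"]
        table_state.pop(row["order_id"], None)
    return table_state
```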
Pipeline:
- Debezium publishes CDC events to Kafka raw topic (olist.order_payments)
- Raw Kafka events optionally persisted to S3 for auditing
- Flink processes raw Kafka topics, performs transformations, and outputs to Kafka transformed topics (olist_payments_aggregated_windowed) and (olist_payments_installments_windowed)
- Transformed data stored in Cassandra for real-time queries
- Grafana dashboards provide real-time monitoring and metrics visualization
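The Flink windowing step in the pipeline above can be illustrated with a pure-Python sketch of a tumbling-window aggregation over payment events. The 60-second window size, the event fields, and the `(count, total_value)` output shape are assumptions for illustration, not the actual job definition:

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size

def window_start(ts):
    """Align an epoch-seconds timestamp to the start of its window."""
    return ts - (ts % WINDOW_SECONDS)

def aggregate_payments(events):
    """events: iterable of (epoch_seconds, payment_type, payment_value).
    Returns {(window_start, payment_type): (count, total_value)} —
    roughly the shape a windowed aggregate topic like
    olist_payments_aggregated_windowed might carry."""
    acc = defaultdict(lambda: [0, 0.0])
    for ts, ptype, value in events:
        key = (window_start(ts), ptype)
        acc[key][0] += 1
        acc[key][1] += value
    return {k: tuple(v) for k, v in acc.items()}
```

A real Flink job would do the same grouping with event-time windows and watermarks, emitting one record per (window, payment_type) pair into the transformed Kafka topic.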
dbt manages:
- Bronze (raw) → Silver (staging) → Gold (dims, facts, and marts)
- Lineage tracking
- Fact and dimension models
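The fact/dimension split dbt produces in the Gold layer can be sketched in miniature: raw payment rows become a payment-type dimension with surrogate keys plus a fact table referencing it. Column names here mirror the Olist payments data, but the function and key scheme are illustrative, not the actual dbt models:

```python
def build_dim_and_fact(raw_payments):
    """Toy star-schema build: assign each distinct payment_type a
    surrogate key, then emit fact rows that reference the dimension.
    In dbt this would be two SQL models; this sketch just shows the
    shape of the output tables."""
    dim_keys, fact = {}, []
    for row in raw_payments:
        ptype = row["payment_type"]
        if ptype not in dim_keys:
            dim_keys[ptype] = len(dim_keys) + 1  # surrogate key
        fact.append({
            "order_id": row["order_id"],
            "payment_type_key": dim_keys[ptype],
            "payment_value": row["payment_value"],
        })
    dim = [{"payment_type_key": k, "payment_type": t}
           for t, k in dim_keys.items()]
    return dim, fact
```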
| Layer | Tool/Technology |
|---|---|
| Data Ingestion | NiFi, Debezium |
| Messaging & Streaming | Kafka, Flink |
| Storage | PostgreSQL, S3, Snowflake |
| Data Modeling | dbt |
| Real-time Storage | Cassandra |
| Visualization | Power BI (BI dashboards), Grafana (real-time metrics) |
| Containerization | Docker |