hvdv99/bikeshop

BADS Bike Shop

Table of Contents
  1. About The Project
  2. Project Structure
  3. GCP Architecture
  4. Authors

About the project

This project builds the data architecture and processing pipelines for BADS Bike Shop, a fictional company that sells and rents bicycles. The data combines transactional data from Kaggle, customer data from Mockaroo, and simulated GPS and battery data for the rental bikes. There are two pipelines: a batch pipeline for analytical insights and a stream pipeline for real-time operational monitoring. We implemented them with Google Cloud's BigQuery and Dataproc services and built two dashboards: the BI & KYC Dashboard for customer demographics and the Operations Dashboard for real-time bike tracking.

Built With

  • Python 3.10
  • PySpark 3.3.2

Project Structure

├───batch
│   ├───cleaned
│   ├───data
│   └───integration
├───pipelines
└───stream
    ├───data
    ├───kafka
    ├───notebooks
    └───producer

Batch Processing

  • Spark Notebooks: A collection of Jupyter notebooks containing Spark programs for data processing.
  • Data: The datasets used by the batch pipeline, included for completeness.

CI/CD Pipelines

Contains a CI/CD pipeline that converts the Jupyter notebooks (.ipynb) into Python scripts (.py) and then uploads the scripts to a cloud repository.
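
The conversion step can be sketched as follows. This is a minimal, standard-library illustration and not the pipeline's actual implementation: a real pipeline would more likely call a tool such as `jupyter nbconvert`, and the function names `notebook_to_script` and `convert_all` are hypothetical. It relies only on the fact that an .ipynb file is JSON with a `cells` list. The upload step is left out because the README does not say which cloud repository is used.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path: Path, output_dir: Path) -> Path:
    """Write the code cells of one notebook to a .py file and return its path."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip() + "\n")
    output_dir.mkdir(parents=True, exist_ok=True)
    script_path = output_dir / (notebook_path.stem + ".py")
    script_path.write_text("\n\n".join(chunks), encoding="utf-8")
    return script_path


def convert_all(source_dir: Path, output_dir: Path) -> list[Path]:
    """Convert every notebook under source_dir, ignoring Jupyter checkpoints."""
    notebooks = sorted(
        nb for nb in source_dir.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in nb.parts
    )
    return [notebook_to_script(nb, output_dir) for nb in notebooks]
```

Cells that use IPython magics (lines starting with `%` or `!`) are copied verbatim in this sketch, so they would need to be removed or rewritten before the script can run outside a notebook.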

Stream Processing

  • Data: Holds datasets that simulate streaming data for completeness and testing purposes.
  • Kafka: Contains a docker-compose file to set up a Kafka consumer environment.
  • Notebooks: A Spark program designed to process data incoming from the Kafka stream.
  • Producer: A Python program that simulates GPS stream data, effectively acting as a stream data producer from a laptop.
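
As an illustration of the producer, the sketch below simulates GPS and battery readings for one rental bike and hands each reading to a `send` callable as a JSON-encoded message. It is a sketch under assumptions and not the repository's producer: the field names (`bike_id`, `timestamp`, `lat`, `lon`, `battery`), the step sizes, and the function names are all hypothetical.

```python
import json
import random
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def simulate_bike(bike_id: str, lat: float, lon: float,
                  battery: float = 100.0,
                  seed: Optional[int] = None) -> Iterator[dict]:
    """Yield readings for one bike until its battery is empty."""
    rng = random.Random(seed)
    while battery > 0:
        # Random walk of a few tens of metres per step (assumed step size).
        lat += rng.uniform(-0.0005, 0.0005)
        lon += rng.uniform(-0.0005, 0.0005)
        # Battery drains by an assumed 0.1 to 0.5 percentage points per step.
        battery = max(0.0, battery - rng.uniform(0.1, 0.5))
        yield {
            "bike_id": bike_id,
            "timestamp": time.time(),
            "lat": round(lat, 6),
            "lon": round(lon, 6),
            "battery": round(battery, 1),
        }


def produce(readings: Iterable[dict], send: Callable[[bytes], object],
            limit: int, delay_s: float = 0.0) -> int:
    """Send up to `limit` readings as UTF-8 JSON and return how many were sent."""
    sent = 0
    for reading in islice(readings, limit):
        send(json.dumps(reading).encode("utf-8"))
        sent += 1
        if delay_s:
            time.sleep(delay_s)  # pace the stream like a real device would
    return sent
```

Passing `send` as a parameter keeps the simulation independent of the transport. With the kafka-python package, for example, `send` could wrap `KafkaProducer.send` for a topic of your choice; for a quick local check, `print` or `list.append` is enough.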

GCP Architecture

(Image: GCP architecture diagram)

Slides for demo

Go to Google Slides.

Authors

  • Andy Huang
  • Huub van de Voort
  • Oumaima Lemhour
  • Roman Nekrasov
  • Tom Teurlings

About

Batch and streaming pipelines for the Data Engineering course at JADS
