Agri-Unit Data Pipeline

This project demonstrates a data pipeline built to ingest, process, and store agricultural (Agreste) and weather (OpenWeather) data. It showcases a containerized, service-oriented approach to handling two distinct data streams.


Project Highlights

  • Go (Golang): Chosen for its performance, efficiency, and suitability for building robust data ingestion and transformation services.
  • Docker: All services are containerized using Docker, ensuring consistent development and deployment environments.
  • PostgreSQL: Serves as the data warehouse for storing structured agricultural and weather data.
  • Streamlit: An interactive dashboard for data visualization, enabling fast exploration and insight into agricultural and weather trends.
  • CI Integration: Containers are automatically built and released on each merge to the main branch via a continuous integration pipeline (GitHub Actions).

What It Does

The pipeline performs the following key functions:

  • Ingests Agreste Data: Processes agricultural production data, anonymizes farm locations using random geographic coordinates, and filters for cereal production (a sketch of this step follows the list).
  • Ingests OpenWeather Data: Fetches and processes real-time weather information.
  • Transforms & Stores: Cleans and loads raw data into a PostgreSQL data warehouse for analysis.
  • Data Visualization: Provides an intuitive Streamlit UI to view farm weather on a map, explore weather history, and analyze cereal yield by farm.
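
To make the Agreste step concrete, here is a minimal Go sketch of the anonymize-and-filter logic described above. The FarmRecord fields, the coordinate bounds, and the crop labels are illustrative assumptions, not the repository's actual types or nomenclature.

    package main

    import (
        "fmt"
        "math/rand"
    )

    // FarmRecord is a hypothetical shape for one Agreste (RICA) row;
    // the real ingestion service may use different fields.
    type FarmRecord struct {
        FarmID   string
        Crop     string
        Lat, Lon float64
    }

    // anonymize replaces the farm's real location with random
    // coordinates; these bounds roughly cover metropolitan France
    // and are an assumption.
    func anonymize(r FarmRecord) FarmRecord {
        r.Lat = 41.0 + rand.Float64()*10.0 // 41..51
        r.Lon = -5.0 + rand.Float64()*15.0 // -5..10
        return r
    }

    // isCereal keeps only cereal production; the crop labels are
    // placeholders, not Agreste's real nomenclature.
    func isCereal(r FarmRecord) bool {
        switch r.Crop {
        case "wheat", "barley", "maize":
            return true
        }
        return false
    }

    func main() {
        rows := []FarmRecord{
            {FarmID: "f1", Crop: "wheat", Lat: 48.85, Lon: 2.35},
            {FarmID: "f2", Crop: "vine", Lat: 44.84, Lon: -0.58},
        }
        for _, r := range rows {
            if isCereal(r) {
                fmt.Printf("%+v\n", anonymize(r))
            }
        }
    }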

Getting Started

To launch all pipeline services (ingestion, database, and Streamlit interface), follow these steps:

  1. Ensure Docker and Docker Compose are installed on your machine.

  2. Navigate to the deploy/ directory at the root of the project:

    cd deploy/
  3. Launch all services using Docker Compose:

    docker compose up

Once services are running:

  • Access the Streamlit app via http://localhost:8501.
  • The Agreste and OpenWeather ingestion services run in the background and are ready to be triggered.

Automated Scheduling with Cron

The pipeline includes cron jobs to automatically trigger ingestion and transformation tasks at regular intervals, ensuring up-to-date data without manual intervention.
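
The schedules themselves live in the repository's cron configuration, which presumably hits the same HTTP trigger endpoints shown in the next section. Purely to illustrate the pattern, here is an in-process Go equivalent using a ticker; the hourly interval and the choice of the OpenWeather endpoint are assumptions.

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    // In-process stand-in for the repo's cron jobs: fire the
    // OpenWeather ingestion endpoint (see the next section) on a
    // fixed interval. The hourly cadence is an assumption.
    func main() {
        ticker := time.NewTicker(time.Hour)
        defer ticker.Stop()
        for range ticker.C {
            resp, err := http.Post("http://localhost:8081/ingest", "application/json", nil)
            if err != nil {
                log.Printf("ingestion trigger failed: %v", err)
                continue
            }
            resp.Body.Close()
            log.Printf("ingestion triggered: %s", resp.Status)
        }
    }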


Manually Triggering Ingestion (Optional)

You can also manually trigger the ingestion processes for Agreste and OpenWeather data using curl:

  • Trigger Agreste (Rica) data ingestion:

    curl -X POST http://localhost:8080/ingest \
      -H "Content-Type: application/json" \
      -d '{
        "zipUrl": "https://agreste.agriculture.gouv.fr/agreste-web/download/service/SV-Accès micro données RICA/RicaMicrodonnées2023_v2.zip",
        "csvFileName": "Rica_France_micro_Donnees_ex2023.csv"
      }'
  • Trigger OpenWeather data ingestion:

    curl -X POST http://localhost:8081/ingest
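
From the Agreste request above, one can infer the rough shape of the service's /ingest handler. The following is a hedged Go sketch only: the real handler's internals are not shown in this README, and the download/extract/load step is left as a comment.

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // IngestRequest mirrors the JSON body shown in the curl example.
    type IngestRequest struct {
        ZipURL      string `json:"zipUrl"`
        CSVFileName string `json:"csvFileName"`
    }

    func main() {
        http.HandleFunc("/ingest", func(w http.ResponseWriter, r *http.Request) {
            if r.Method != http.MethodPost {
                http.Error(w, "POST only", http.StatusMethodNotAllowed)
                return
            }
            var req IngestRequest
            if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            // Download req.ZipURL, extract req.CSVFileName, transform,
            // and load into PostgreSQL (omitted in this sketch).
            log.Printf("ingest requested: %s / %s", req.ZipURL, req.CSVFileName)
            w.WriteHeader(http.StatusAccepted)
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }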

Accessing the PostgreSQL Database

To connect directly to the PostgreSQL instance running in Docker:

  1. Find the container name or ID:

    docker ps
  2. Connect to the database using psql:

    docker exec -it <postgresql_container_name_or_id> psql -U postgres -d mydatabase
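
The warehouse can also be queried programmatically. Below is a minimal Go sketch using database/sql with the lib/pq driver; it assumes the compose file publishes PostgreSQL on localhost:5432 and that the password is postgres, and "weather_observations" is a placeholder table name.

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/lib/pq" // PostgreSQL driver
    )

    func main() {
        // Credentials follow the psql command above; the published
        // port and password are assumptions about the compose setup.
        db, err := sql.Open("postgres",
            "host=localhost port=5432 user=postgres password=postgres dbname=mydatabase sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // "weather_observations" is a placeholder table name.
        var n int
        if err := db.QueryRow("SELECT count(*) FROM weather_observations").Scan(&n); err != nil {
            log.Fatal(err)
        }
        fmt.Println("rows:", n)
    }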

🔁 Continuous Integration & Deployment

Every time code is merged into the main branch:

  • Containers are automatically built and published by the CI pipeline (GitHub Actions).
  • The latest image of each service is published for every environment, keeping deployments consistent and iteration fast.
