Enhances the datapackage-pipelines framework with a more opinionated architecture and easy-to-use code:
- Object-oriented processors - allow for extensibility and code re-use (see the sketch below).
- Docker Compose stack - allows for a complex architecture which is easy to set up and guaranteed to be the same across environments.
- Custom processors with additional functionality.
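To illustrate the object-oriented processor style, here is a minimal sketch; the import path, base class name, and hook method are assumptions made for illustration, not the framework's actual API - check the project source for the real interface.

```python
# Hypothetical sketch of an object-oriented processor.
# The import path, base class and hook method below are illustrative
# assumptions, NOT the actual datapackage-pipelines-plus-plus API.
from datapackage_pipelines_plus_plus import BaseProcessor  # hypothetical


class FilterEmptyNames(BaseProcessor):
    """A re-usable processor that drops rows with an empty 'name' field."""

    def filter_row(self, row):  # assumed per-row hook
        return bool(row.get("name"))
```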
The common usage is to start a new code repository which will use the pipelines-plus-plus framework.
Run the following command from within the new (or existing) project directory:
source <(curl -s https://raw.githubusercontent.com/OriHoch/datapackage-pipelines-plus-plus/master/bin/init-pipelines-plus-plus.sh)
This creates the necessary files, including an example pipeline, which should get you started quickly.
You should edit the README.md file and keep only the parts after the Usage header.
The following guide can be used from within any pipelines-plus-plus compatible project.
The only prerequisite is to have Docker and Docker Compose installed - refer to the Docker documentation for details.
Start the docker compose services in the background by running:
bin/start.sh
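To verify that the stack came up, you can check the services' status with a standard Docker Compose command, for example:

```bash
# List the stack's services and their current state
docker-compose ps
```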
If you encounter any problems, check your docker-compose.override.yml file:
- The docker-compose.override.yml file contains the Docker Compose stack and services configuration
- The initialization script creates it as a copy of the docker-compose.override.example.yml file
Once the Docker services are running, you should be able to see the pipelines dashboard.
The dashboard shows the status of the pipelines which are configured in the plus_plus.source-spec.yaml file.
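For example, a pipeline definition might look like the following sketch; this assumes plus_plus.source-spec.yaml follows the standard datapackage-pipelines pipeline-spec layout, which may differ from the actual plus-plus schema:

```yaml
# Hypothetical pipeline definition, assuming the standard
# datapackage-pipelines pipeline-spec layout
example-pipeline:
  pipeline:
    - run: add_metadata
      parameters:
        name: example-datapackage
    - run: dump.to_path
      parameters:
        out-path: data/example-pipeline
```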
Usual workflow for debugging a failing pipeline:
- View the pipelines logs:
docker-compose logs pipelines
- Restart the pipelines service - this will re-run the failed pipelines:
docker-compose restart pipelines
- View the list of available pipelines:
bin/dpp.sh
- Run a pipeline manually:
bin/dpp.sh run ./pipeline/id
The default pipeline environment provides the following services:
- Data files - should appear on the host under the data/ directory
- Kibana - http://localhost:15601
  - Uses Elasticsearch at http://localhost:19200
- Adminer (DB admin web UI) - http://localhost:18080
  - Can be configured to connect to PostgreSQL at postgresql://postgres:123456@datadb:5432/postgres
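For example, assuming the PostgreSQL service in the compose stack is named datadb (as the connection URL above suggests), you could open a database shell with:

```bash
# Open a psql shell inside the PostgreSQL container
# (assumes the compose service is named "datadb")
docker-compose exec datadb psql -U postgres
```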
Edit the docker-compose.override.yml file to enable additional services like Redash, or to edit the stack configuration (see the sketch below).
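A fragment enabling Redash might look roughly like the following; the service definition here is an illustrative assumption, not the contents of the real docker-compose.override.example.yml - check that file for the actual service names and settings.

```yaml
# Hypothetical docker-compose.override.yml fragment enabling Redash;
# image, ports and settings are illustrative assumptions.
version: '3'
services:
  redash:
    image: redash/redash
    ports:
      - "18081:5000"  # assumed host port mapping
    environment:
      REDASH_DATABASE_URL: postgresql://postgres:123456@datadb:5432/postgres
```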
To run the pipelines locally, the only prerequisite is to be inside an activated Python 3.6 virtualenv.
From inside the virtualenv, run:
bin/install.sh
This should install the required dependencies.
To run the pipelines CLI you need to set up some environment variables.
If the default docker stack is started, you can run:
source .env.example
Otherwise, copy the file to .env and modify the environment variables as needed.
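The variables might look roughly like the following; the names and values here are illustrative assumptions - see your project's .env.example for the actual ones.

```bash
# Hypothetical .env contents; variable names and values are
# illustrative assumptions - check .env.example for the real ones.
export DPP_DB_ENGINE=postgresql://postgres:123456@localhost:15432/postgres
export DPP_ELASTICSEARCH=localhost:19200
```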
Once you have set the environment variables and ensured the services are running, you can run the dpp CLI directly:
dpp run ./pipeline/id
If your project's code is committed to git, you can re-run bin/init-pipelines-plus-plus.sh to update it.
(If there is an update to the init-pipelines script itself, run it twice.)
This will overwrite your files; you can then use git to inspect the changes and decide how to merge them.
The only file it will not overwrite is the docker-compose.override.yml file, which you must merge manually.