We have a CSV file containing movie metadata that we want to "import" into a web service using a supplied endpoint. This endpoint accepts only one object at a time. Since the data can be inaccurate, this code runs the main transformations needed to import as much data (movies) as possible.
In addition to cloning this repository, you will need network access to install the application dependencies, including Docker, as part of the setup process.
Our Python code movymporter reads a CSV file (declared in CSV_IN, see the section How to set up and run below), runs transformations to
increase the data accuracy (NOTE: if the original data cannot be properly transformed, it is replaced with None) and sends each movie to a web server through a POST call.
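The flow described above can be illustrated with a minimal sketch. It is not the movymporter implementation: the function names, the list of integer columns, and the assumption that the CSV headers match the JSON keys used by the endpoint are all hypothetical. It only shows the idea of replacing unusable values with None and sending one object per POST call.

```python
import csv
import io
import json
import urllib.request
from typing import Callable, Optional

# Assumption: these columns should hold integers (names taken from the
# JSON keys shown in the test POST call, not from the real CSV header).
INT_FIELDS = ("year", "length", "popularity")


def to_int(value: Optional[str]) -> Optional[int]:
    """Return value as an int, or None when it cannot be converted."""
    try:
        return int(value.strip())
    except (AttributeError, ValueError):
        return None


def transform(row: dict) -> dict:
    """Clean one CSV row; values that cannot be used become None."""
    movie = {}
    for key, value in row.items():
        if key in INT_FIELDS:
            movie[key] = to_int(value)
        else:
            text = (value or "").strip()
            movie[key] = text or None
    return movie


def post_movie(url: str, movie: dict) -> int:
    """POST a single movie as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(movie).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def import_movies(csv_text: str, send: Callable[[dict], int]) -> int:
    """Transform every row and send it, one object per call."""
    sent = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        send(transform(row))
        sent += 1
    return sent
```

With the mock server running, a call such as `import_movies(text, lambda m: post_movie("http://localhost:9009/movies", m))` would send the rows one by one. Passing the sender as a parameter keeps the transformation logic testable without a network connection.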
- Clone or download a ZIP of this project, e.g.:
$ git clone [email protected]:elminster-aom/movymporter.git
- Ensure that you have the right version of Python (v3.9+)
- Create and activate a Python virtual environment and install the required packages, e.g.:
$ python3 -m venv movymporter \
&& source movymporter/bin/activate \
&& python3 -m pip install --requirement movymporter/requirements.txt
- Move into the new environment:
$ cd movymporter
- Run `bin/setup`. This will run a web server at http://localhost:9009. Please leave this running! If you are using Windows, run all commands in the `bin` directory using PowerShell. For example, instead of `bin/setup` run `pwsh bin\setup`.
- In a new terminal window, make a test POST call to the server.
If you are using Linux or Mac OS the command is:
curl http://localhost:9009/movies -d '{"year":1997, "length": 123, "title": "Face Off", "subject": "action", "actor": "Cage, Nicholas", "actress": "Allen, Joan", "director": "Woo, John", "popularity": 82, "awards": "No", "image": "NicholasCage.png"}'
If you are using Windows the command is:
Invoke-RestMethod -Method POST -Uri http://localhost:9009/movies -Body '{"year":1997, "length": 123, "title": "Face Off", "subject": "action", "actor": "Cage, Nicholas", "actress": "Allen, Joan", "director": "Woo, John", "popularity": 82, "awards": "No", "image": "NicholasCage.png"}'
After you do this, you should see the request show up in the web server's output.
- (optional) Check the monitoring endpoint:
curl http://localhost:9009/metrics
(`Invoke-RestMethod -Method GET -Uri http://localhost:9009/metrics` for Windows)
- You should see 1 count for `sink_post_total`
- Reset the metrics count by calling `bin/reset` (`pwsh bin\reset` for Windows) in the same window as the log
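The metrics check can also be scripted. The sketch below assumes the endpoint returns the Prometheus text format with plain `name value` lines (no trailing timestamps); the function names are hypothetical and not part of this project.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text lines into {sample name: value}.

    Assumption: each sample line is "name value" or "name{labels} value"
    without a trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(None, 1)
        samples[name] = float(value)
    return samples


def fetch_metrics(url: str) -> Dict[str, float]:
    """Download the metrics page and parse it."""
    with urllib.request.urlopen(url) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

With the server running, `fetch_metrics("http://localhost:9009/metrics").get("sink_post_total")` would return the counter value, provided the metric is exposed without labels.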
- All available settings are read from an environment variables file (.env) in the home directory of our application. You can create it from this template:
$ nano .env
# copy-paste this content:
LOG_LEVEL = INFO
CSV_IN = backup.csv
URL_OUT = http://localhost:9009/movies
STOP_ON_ERRORS = 0
$ chmod 0600 .env
Where variables mean:
- `LOG_LEVEL`: Output detail level; possible values are (from more to less verbose): `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL` and `NOTSET`
- `CSV_IN`: CSV file (including relative or absolute path) with the source data
- `URL_OUT`: Full URL (including the POST hook) for registering data in our web server (DB)
- `STOP_ON_ERRORS`: Data import stops after the first registration error if the value is anything other than `0`
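As an illustration of how these variables could be consumed, here is a minimal sketch. It is not the project's actual loader: the function names, the default values, and the rule that real environment variables override the file are assumptions.

```python
import os
from typing import Dict, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Read KEY = VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(
    file_values: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Merge file values with the environment and type the settings."""
    # Assumption: variables already set in the environment win.
    merged = {**file_values, **environ}
    return {
        # Assumption: INFO is used when LOG_LEVEL is missing.
        "log_level": merged.get("LOG_LEVEL", "INFO").upper(),
        "csv_in": merged["CSV_IN"],
        "url_out": merged["URL_OUT"],
        # Any value other than "0" stops the import on the first error.
        "stop_on_errors": merged.get("STOP_ON_ERRORS", "0") != "0",
    }
```

A typical call would be `load_settings(parse_env_text(open(".env").read()))`. A missing `CSV_IN` or `URL_OUT` raises `KeyError` in this sketch, so a broken configuration fails early.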
- Start the import by running the main program `movymporter`:
$ python3 movymporter

| Command | Description |
|---|---|
| `bin/setup` | Builds and starts the service that you'll be writing a script against |
| `bin/reset` | Restarts the service to reset Prometheus metrics |
| `bin/log` | Displays logs from the server |
| `bin/clean` | Stops and removes Docker containers |
| `bin/destroy` | Destroys all local Docker artifacts. Use with caution |
- The code has been tested only on Unix-like systems
- The code has been tested with Python 3.9.4
- For a detailed list of Python modules check out the [requirements.txt]
- Concepts like tuning or replication are out of the scope of this exercise
- The web server is a mock server, so data is not persisted
- The Docker CLI is required for setting up and running the web server
- The code is not optimized for large CSV files
- `aiocsv` functionality needs to be provided by this code. Some additional transformations need to operate on the full row as a binary string before it is converted to a dictionary
- Take advantage of Prometheus availability to increase the visibility of the application, e.g.:
  - Ratio of movies registered to total movies provided
  - Processing time of the 3 phases: reading, transformation and posting
  - Ratio of movies with incomplete fields to total movies provided
  - ...
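The measurements suggested above could start from something like the following sketch. It uses plain Python counters and timers rather than a Prometheus client, and the class and method names are hypothetical; the collected values could later be exported as Prometheus metrics.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ImportStats:
    """Collects counts and per-phase timings for one import run."""

    def __init__(self) -> None:
        self.provided = 0      # movies read from the CSV
        self.registered = 0    # movies accepted by the web server
        self.incomplete = 0    # movies with at least one None field
        self.phase_seconds = defaultdict(float)

    @contextmanager
    def phase(self, name: str):
        """Accumulate the wall-clock time spent inside the block."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.phase_seconds[name] += time.perf_counter() - start

    def record(self, movie: dict, registered: bool) -> None:
        """Count one processed movie."""
        self.provided += 1
        if registered:
            self.registered += 1
        if any(value is None for value in movie.values()):
            self.incomplete += 1

    def ratio(self, count: int) -> float:
        """Return count divided by movies provided (0.0 if none)."""
        return count / self.provided if self.provided else 0.0
```

Each phase would be wrapped as `with stats.phase("reading"): ...`, and `stats.ratio(stats.registered)` would give the share of movies that were registered successfully.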