Overview • Containers • Quick start • Schema development • Test • Debug
Requirements:
Docker Compose environment (based on pycsw) for development and testing with CKAN Open Data portals. [1]
Tip
It can be easily tested with a CKAN-type Open Data portal deployment: mjanez/ckan-docker. [2]
Available components:
- pycsw: The pycsw app. An OARec and OGC CSW server implementation written in Python.
- ckan2pycsw: Software to achieve interoperability with Open Data portals based on CKAN. To do this, ckan2pycsw reads data from a CKAN instance using the CKAN API, generates INSPIRE ISO-19115/ISO-19139 [3] metadata using pygeometa (or another custom schema), and populates a pycsw instance that exposes the metadata using CSW and OAI-PMH.
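The first step of that flow, reading datasets through the CKAN API, can be sketched as follows. This is a minimal illustration that assumes the standard CKAN `package_search` action; the helper names `package_search_url` and `extract_datasets` and the CKAN address `http://localhost:5000` are hypothetical and are not taken from ckan2pycsw.

```python
import json
from urllib.parse import urlencode


def package_search_url(ckan_url: str, rows: int = 100, start: int = 0) -> str:
    """Build a CKAN Action API URL that pages through datasets."""
    query = urlencode({"rows": rows, "start": start})
    return f"{ckan_url.rstrip('/')}/api/3/action/package_search?{query}"


def extract_datasets(payload: str) -> list:
    """Return the dataset dicts from a package_search JSON response."""
    body = json.loads(payload)
    if not body.get("success"):
        raise ValueError("CKAN API call was not successful")
    return body["result"]["results"]


# Canned response, so the sketch runs without a CKAN instance.
sample = '{"success": true, "result": {"count": 1, "results": [{"name": "demo"}]}}'
print(package_search_url("http://localhost:5000/"))
print([dataset["name"] for dataset in extract_datasets(sample)])
```

In a real run the payload would come from an HTTP request to the generated URL, repeated with an increasing `start` until all datasets have been read.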
List of containers:
| Repository | pycsw version | Type | Docker tag | Size | Notes |
|---|---|---|---|---|---|
| ckan-pycsw | 3.0-dev | base image | mjanez/ckan-pycsw:latest | ~535 MB | Development & test latest version |
| ckan-pycsw | 3.0-dev | base image | mjanez/ckan-pycsw:3.0-dev | ~535 MB | Last stable release according to pycsw master (3.0-dev) |
| ckan-pycsw | 2.6.2 | base image | mjanez/ckan-pycsw:2.6.2 | ~346 MB | Last stable release according to pycsw 2.6.2 |
| ckan-pycsw | 2.6.1 | base image | mjanez/ckan-pycsw:2.6.1 | ~346 MB | Stable release according to pycsw 2.6.1 |
| ckan-pycsw | 2.6.1 | base image | mjanez/ckan-pycsw:main | ~442 MB | Deprecated and only maintained for legacy systems (pin to version ckan-pycsw:2.6.1). |

| Repository | Type | Docker tag | Size | Notes |
|---|---|---|---|---|
| Python | base image | python:3.11-slim-bullseye | ~45 MB | Slim variant for reduced footprint |
Note
The GHCR and dev Dockerfiles use the latest stable tagged images as their base.
| Ports | Container |
|---|---|
| 0.0.0.0:8000->8000/tcp | pycsw |
| 0.0.0.0:5678->5678/tcp | ckan-pycsw debug (debugpy) |
Copy the `.env.example` template and configure it by editing the resulting `.env` file:

    cp .env.example .env

Configure the following variables:
- `PYCSW_SERVER_URL`: Base server URL for pycsw configuration (e.g., `http://localhost:8000`)
- `CKAN_URL`: Your CKAN instance URL
- `PYCSW_PORT`: Published port for the pycsw service

Note
In pycsw 3.0, `PYCSW_SERVER_URL` is used for server configuration (`server.url` in `pycsw.yml`), while `PYCSW_URL` points to the CSW endpoint (`/csw`) for client requests.
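A small check of these variables before starting the services could look like the sketch below. It is only an illustration: the function `load_settings` is hypothetical, and deriving `PYCSW_URL` as the server base plus `/csw` is an assumption based on the note above, not the project's actual loading code.

```python
import os

REQUIRED_VARS = ("PYCSW_SERVER_URL", "CKAN_URL", "PYCSW_PORT")


def load_settings(env=os.environ) -> dict:
    """Collect the required variables and derive the CSW endpoint."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Assumption: PYCSW_URL defaults to the server base plus /csw.
    default_csw = settings["PYCSW_SERVER_URL"].rstrip("/") + "/csw"
    settings["PYCSW_URL"] = env.get("PYCSW_URL", default_csw)
    return settings


# Example values only; the CKAN address is a placeholder.
example = {
    "PYCSW_SERVER_URL": "http://localhost:8000",
    "CKAN_URL": "http://localhost:5000",
    "PYCSW_PORT": "8000",
}
print(load_settings(example)["PYCSW_URL"])
```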
Select the CKAN schema (`PYCSW_CKAN_SCHEMA`) and the pycsw output schema (`PYCSW_OUTPUT_SCHEMA`):
- Default:

      PYCSW_CKAN_SCHEMA=iso19139_geodcatap
      PYCSW_OUTPUT_SCHEMA=iso19139_inspire
      ...
      SSL_UNVERIFIED_MODE=True

- Available:
  - CKAN metadata schema (`PYCSW_CKAN_SCHEMA`):
    - `iso19139_geodcatap` (default): [WIP] Schema based on the GeoDCAT-AP custom dataset schema.
    - `iso19139_base`: [WIP] Base schema.
  - pycsw metadata schema (`PYCSW_OUTPUT_SCHEMA`):
    - `iso19139_inspire` (default): Customised schema based on the ISO 19139 INSPIRE metadata schema. [4]
    - `iso19139`: Standard pycsw schema based on ISO 19139.

Change `SSL_UNVERIFIED_MODE` to avoid SSL errors when using a self-signed certificate in CKAN development.
- Default:

      SSL_UNVERIFIED_MODE=True
Warning
Enabling SSL_UNVERIFIED_MODE can expose your application to security risks by allowing unverified SSL certificates. Use this setting only in a trusted development environment and never in production.
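To make the risk concrete, the sketch below shows one way such a flag could translate into a Python SSL context. It is an assumption about how the setting might be interpreted, not the project's implementation; `build_ssl_context` is a hypothetical helper.

```python
import ssl


def build_ssl_context(unverified_mode: str) -> ssl.SSLContext:
    """Return an SSL context; verification is skipped only when asked."""
    context = ssl.create_default_context()
    if unverified_mode.strip().lower() == "true":
        # Order matters: hostname checking must be disabled first.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


dev_context = build_ssl_context("True")    # accepts self-signed certificates
prod_context = build_ssl_context("False")  # keeps full verification
print(dev_context.verify_mode, prod_context.verify_mode)
```

With verification disabled, any certificate is accepted, which is why the setting belongs only in a trusted development environment.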
To deploy the environment, docker compose builds the latest source in the repo.
If you want a deployment that is ready in about 5 minutes, use the prebuilt stable image (ghcr.io/mjanez/ckan-pycsw:main) with docker-compose.ghcr.yml instead.

    git clone https://github.com/mjanez/ckan-pycsw
    cd ckan-pycsw
    docker compose up --build
    # GitHub main registry image
    docker compose -f docker-compose.ghcr.yml up --build
    # Or detached mode
    docker compose up -d --build

Tip
Deploy the dev (multistage build) docker-compose.dev.yml with:

    docker compose -f docker-compose.dev.yml up --build

If needed, to build a specific container simply run:

    docker build -t target_name xxxx/

Requirements:
- Python >= 3.9

Dependencies:

    python3 -m pip install --user pipx
    python3 -m pipx ensurepath --force
    # You will need to open a new terminal or re-login for the PATH changes to take effect.
    pipx install pdm
    pdm install --no-self

Configuration:

    # pycsw 3.0 uses YAML configuration
    # PYCSW_SERVER_URL is the server base (no /csw), PYCSW_URL is the CSW endpoint
    PYCSW_SERVER_URL=http://localhost:8000 PYCSW_URL=http://localhost:8000/csw envsubst < ckan-pycsw/conf/pycsw.yml.template > pycsw.yml
    # Or update pycsw.yml vars manually
    vi pycsw.yml

Generate database and add:

    rm -f cite.db
    # Remember to create and update the .env variables. Next, add them to the environment:
    bash doc/scripts/00_ennvars.sh

Run ckan2pycsw:

    PYCSW_CONFIG=pycsw.yml pdm run python3 ckan2pycsw/ckan2pycsw.py

User-defined metadata schemas can be added, both for CKAN metadata input (`ckan2pycsw/schemas/ckan/*`) and for pycsw output schemas (`ckan2pycsw/schemas/pygeometa/*`).
You can customise and extend the metadata schemas that serve as templates, so that as many metadata elements as possible are imported from a custom CKAN schema, e.g. one based on ckanext-scheming.
- Create a new folder in `schemas/ckan/` with the name intended for the schema, e.g. `iso19139_spain`.
- Create the `main.j2` with the Jinja template to render the metadata. Examples in: `schemas/ckan/iso19139_geodcatap`
- Add all needed mappings (`.yaml`) to a new folder in `ckan2pycsw/mappings/`, e.g. `iso19139_spain`
- Update `ckan2pycsw/mappings/ckan-pycsw_assigments.yaml` to include the pycsw and CKAN schema mapping, e.g.

      iso19139_geodcatap: ckan_geodcatap
      iso19139_base: ckan_base
      iso19139_inspire: inspire
      ...
      iso19139_spain: iso19139_spain

- Modify `.env` to select the new `PYCSW_CKAN_SCHEMA`:

      PYCSW_CKAN_SCHEMA=iso19139_spain
      PYCSW_OUTPUT_SCHEMA=iso19139
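How a schema name might resolve to its mappings folder can be sketched as below. The dict mirrors the example entries of `ckan-pycsw_assigments.yaml` so that no YAML parser is needed; treating each value as a folder name under `ckan2pycsw/mappings/` is an assumption, and `mapping_folder` is a hypothetical helper.

```python
from pathlib import Path

# Mirrors the example entries of ckan-pycsw_assigments.yaml as a plain dict.
ASSIGNMENTS = {
    "iso19139_geodcatap": "ckan_geodcatap",
    "iso19139_base": "ckan_base",
    "iso19139_inspire": "inspire",
    "iso19139_spain": "iso19139_spain",
}


def mapping_folder(schema: str, base: str = "ckan2pycsw/mappings") -> Path:
    """Resolve the mappings folder assigned to a schema name."""
    try:
        return Path(base) / ASSIGNMENTS[schema]
    except KeyError:
        raise ValueError(f"No mapping assigned to schema '{schema}'") from None


print(mapping_folder("iso19139_spain").as_posix())
```

A schema that is missing from the assignments raises a `ValueError`, which is one way to surface a forgotten entry early.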
New metadata schemas can be extended or added to convert elements extracted from CKAN into standard metadata profiles that can be exposed in the pycsw CSW Catalogue.
- Create a new folder in `schemas/pygeometa/` with the name intended for the schema, e.g. `iso19139_spain`.
- Add an `__init__.py` file with the extended pygeometa schema class, e.g.

      import ast
      import logging
      import os
      from typing import Union

      from lxml import etree
      from owslib.iso import CI_OnlineResource, CI_ResponsibleParty, MD_Metadata

      from pygeometa.schemas.base import BaseOutputSchema
      from model.template import render_j2_template

      LOGGER = logging.getLogger(__name__)
      THISDIR = os.path.dirname(os.path.realpath(__file__))


      class ISO19139_spainOutputSchema(BaseOutputSchema):
          """ISO 19139 - Spain output schema"""

          def __init__(self):
              """
              Initialize object

              :returns: pygeometa.schemas.base.BaseOutputSchema
              """

              super().__init__('iso19139_spain', 'xml', THISDIR)

          ...
- Create the `main.j2` with the Jinja template to render the metadata. Macros can be added for more specific templates, for example `iso19139_inspire-regulation.j2` or `contact.j2`; more examples in `schemas/pygeometa/iso19139_inspire`.
- Add the Python class and the schema identifier to `ckan2pycsw.py`, e.g.

      from schemas.pygeometa.iso19139_inspire import ISO19139_inspireOutputSchema
      from schemas.pygeometa.iso19139_spain import ISO19139_spainOutputSchema
      ...
      OUPUT_SCHEMA = {
          'iso19139_inspire': ISO19139_inspireOutputSchema,
          'iso19139': ISO19139OutputSchema,
          'iso19139_spain': ISO19139_spainOutputSchema
      }
- Add all mappings (`.yaml`) to a new folder in `ckan2pycsw/mappings/`, e.g. `iso19139_spain`
- Update `ckan2pycsw/mappings/ckan-pycsw_assigments.yaml` to include the pycsw and CKAN schema mapping, e.g.

      iso19139_geodcatap: ckan_geodcatap
      iso19139_base: ckan_base
      iso19139_inspire: inspire
      ...
      iso19139_spain: iso19139_spain

- Modify `.env` to select the new `PYCSW_OUTPUT_SCHEMA`:

      PYCSW_CKAN_SCHEMA=iso19139_geodcatap
      PYCSW_OUTPUT_SCHEMA=iso19139_spain
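The registry-plus-environment pattern used in these steps can be sketched in isolation as follows. The two classes are empty stand-ins for the real pygeometa schema classes, and `select_output_schema` is a hypothetical helper; how `ckan2pycsw.py` actually performs the lookup is not shown in this document.

```python
import os


class ISO19139_inspireOutputSchema:  # stand-in for the real class
    name = "iso19139_inspire"


class ISO19139_spainOutputSchema:  # stand-in for the new schema class
    name = "iso19139_spain"


# Same shape as the registry in the step above: identifier -> class.
OUPUT_SCHEMA = {
    "iso19139_inspire": ISO19139_inspireOutputSchema,
    "iso19139_spain": ISO19139_spainOutputSchema,
}


def select_output_schema(env=os.environ):
    """Instantiate the class registered for PYCSW_OUTPUT_SCHEMA."""
    key = env.get("PYCSW_OUTPUT_SCHEMA", "iso19139_inspire")
    if key not in OUPUT_SCHEMA:
        raise ValueError(f"Unknown output schema: {key}")
    return OUPUT_SCHEMA[key]()


schema = select_output_schema({"PYCSW_OUTPUT_SCHEMA": "iso19139_spain"})
print(type(schema).__name__)
```

If the identifier in `.env` has no entry in the registry, the lookup fails, so both places need to be updated together.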
The project includes a comprehensive test suite using pytest. Tests validate:
- CKAN to ISO19139 XML transformation
- pycsw 3.0 compatibility with OWSLib ≥0.29.0
- None value handling in Service datasets
- All DCAT types (Dataset, Series, Service)
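As an illustration of the kind of check listed above, the sketch below tests a small helper in pytest style. Both the helper `hierarchy_level` and its fallback for `None` are hypothetical and do not come from the project's test suite.

```python
from typing import Optional

# Hypothetical mapping from DCAT type to an ISO 19139 hierarchy level.
HIERARCHY_LEVELS = {"dataset": "dataset", "series": "series", "service": "service"}


def hierarchy_level(dcat_type: Optional[str]) -> str:
    """Map a DCAT type to a hierarchy level, tolerating None."""
    if dcat_type is None:
        return "dataset"  # assumed fallback, not taken from the project
    return HIERARCHY_LEVELS[dcat_type.strip().lower()]


def test_all_dcat_types():
    for dcat_type in ("Dataset", "Series", "Service"):
        assert hierarchy_level(dcat_type) == dcat_type.lower()


def test_none_value_falls_back():
    assert hierarchy_level(None) == "dataset"


# pytest would collect the test_* functions; they also run directly.
test_all_dcat_types()
test_none_value_falls_back()
```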

    # Run all tests in isolated environment
    docker compose -f docker-compose.test.yml up --abort-on-container-exit
    # Cleanup
    docker compose -f docker-compose.test.yml down -v

To run the tests locally instead:

    cd ckan-pycsw
    # Install dev dependencies
    pdm install -d
    # Run all tests
    pdm run pytest tests/ -v
    # Run with coverage
    pdm run pytest tests/ --cov=ckan2pycsw --cov-report=html

For detailed testing documentation, see tests/README.md.
pycsw 3.0 provides multiple API endpoints:
- OGC API - Records (default): `http://localhost:8000/`
- CSW 2.0/3.0: `http://localhost:8000/csw`
- OAI-PMH: `http://localhost:8000/oaipmh`
- OpenSearch: `http://localhost:8000/opensearch`
- SRU: `http://localhost:8000/sru`
Note
`PYCSW_URL` is configured to point to the CSW endpoint (`/csw`) by default, as it's the primary endpoint for catalog services.
Perform a GetRecords request and return all records:

    {PYCSW_URL}?request=GetRecords&service=CSW&version=3.0.0&typeNames=gmd:MD_Metadata&outputSchema=http://www.isotc211.org/2005/gmd&elementSetName=full

- The `ckan-pycsw` logs will be created in the `/log` folder.
- Metadata records in `XML` format (ISO 19139) are stored in the `/metadata` folder.

Note
The `GetRecords` operation allows clients to discover resources (datasets). The response is an `XML` document and the output schema can be specified.
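The same GetRecords request can be assembled programmatically, which avoids quoting mistakes in the query string. The sketch below uses only the standard library; `get_records_url` is a hypothetical helper and the endpoint value is the default from this document.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_records_url(pycsw_url: str, element_set: str = "full") -> str:
    """Build the GetRecords key-value request shown above."""
    params = {
        "request": "GetRecords",
        "service": "CSW",
        "version": "3.0.0",
        "typeNames": "gmd:MD_Metadata",
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": element_set,
    }
    # urlencode percent-encodes reserved characters in the values.
    return f"{pycsw_url}?{urlencode(params)}"


url = get_records_url("http://localhost:8000/csw")
print(url)
# Round-trip the query string to check that nothing was lost in encoding.
print(parse_qs(urlsplit(url).query)["outputSchema"])
```

The generated URL percent-encodes the namespace and schema values, which is equivalent to the literal form shown above once decoded by the server.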
The development environment uses debugpy (Microsoft's Python debugger) for remote debugging.
- Build and run dev container:

      docker compose -f docker-compose.dev.yml up -d --build

- In VS Code, use the "Python: Remote Attach (debugpy)" configuration (`.vscode/launch.json`):
  - Connects to `localhost:5678`
  - Path mappings configured for `/srv/app/ckan2pycsw`
- Set breakpoints in your code and start debugging
- The container will wait for the debugger to attach before running
Note
We upgraded from deprecated ptvsd to debugpy for better compatibility and performance.
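The "wait for the debugger to attach" behaviour described above typically comes down to two debugpy calls, `debugpy.listen` and `debugpy.wait_for_client`. The sketch below is an assumption about how an entry point could do this, not the project's code; the variable names `DEBUG_HOST` and `DEBUG_PORT` are hypothetical, and `start_debugger` requires debugpy to be installed.

```python
import os


def debug_address(env=os.environ) -> tuple:
    """Return the (host, port) pair the debugger should listen on."""
    # DEBUG_HOST and DEBUG_PORT are hypothetical variable names.
    host = env.get("DEBUG_HOST", "0.0.0.0")
    port = int(env.get("DEBUG_PORT", "5678"))
    return host, port


def start_debugger(env=os.environ) -> None:
    """Listen for a debugger and block until a client attaches."""
    import debugpy  # imported lazily so the module loads without it

    debugpy.listen(debug_address(env))
    debugpy.wait_for_client()


print(debug_address({}))
```

Calling `start_debugger()` at the top of the entry point would block until VS Code attaches on the published port.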
- Install dev dependencies:

      cd ckan-pycsw
      pdm install -d

- Use one of these VS Code debug configurations:
  - "Python: Current File": Debug the active Python file
  - "Python: Pytest Current File": Debug tests in active file
  - "Python: Pytest All Tests": Debug all tests
- Set breakpoints and press F5 to start debugging
Note
By default, the Python extension looks for and loads a file named `.env` in the current workspace folder. For more information, see the VS Code documentation on the Python debugger and on the use of environment variables.
VS Code launch configurations are provided in .vscode/launch.json:
- Remote Attach: Attach to Docker container debugger (port 5678)
- Current File: Debug any Python file locally
- Pytest Current File: Debug tests in active file
- Pytest All Tests: Debug entire test suite
For detailed debugging information, see tests/README.md.
Footnotes
1. Extends the @frafra coat2pycsw package.
2. A custom installation of Docker Compose with specific extensions for spatial data and GeoDCAT-AP/INSPIRE metadata profiles.
3. INSPIRE dataset and service metadata based on ISO/TS 19139:2007.
4. The pycsw output schema (`iso19139_inspire`) is still a work in progress (WIP) towards full compliance with INSPIRE ISO 19139. Validation of datasets and dataset series is complete and conforms to the INSPIRE reference validator for datasets and dataset series (Conformance Class 1, 2, 2b and 2c). In contrast, spatial data services still fail in only 1 dimension [WIP].