CMeta is a compact metadata format for describing data lakes, tables, and columns in a way that is:
- 🪶 Token-efficient: compresses schema/metadata for LLMs with narrow context windows
- 🧑💻 Human-readable: easy to read and edit manually
- 🔄 Reversible: can be converted to and from JSON (compact or verbose)
This package provides Python utilities to:
- Parse CMeta text ⇄ JSON
- Convert to/from a compact JSON structure
- Convert to/from a flat, extended JSON structure (very verbose for some specific Nexus use cases)
uv add nuklai-cmeta
Uses uv. You can also install with
pip install nuklai-cmeta
if you prefer.
-
Hierarchy:
- Lake → Tables → Columns
:
denotes containers*
denotes columns
-
Descriptions: in
[ ... ]
-
Types: in
< ... >
using full SQL types (string
,int
,boolean
,date
,timestamp
, etc.) -
Nested fields: use dot notation (
car.engine.horsepower
) -
Escape rules:
]
→\]
<
→\<
>
→\>
Example:
Webshop[Contains all webshop data]:
users[Contains all registered users]:
* user_id<int>[Unique ID of a user]
* name<string>[Full name of a user]
* email<string>[Email address]
orders[Customer orders]:
* id<int>[Order id]
* total<double>[Total amount]
* created_at<timestamp>[When created]
from cmeta import parse_cmeta
text = """
Webshop[Contains all webshop data]:
users[Contains all registered users]:
* user_id<int>[Unique ID]
* name<string>[Full name]
"""
model = parse_cmeta(text)
print(model.lakes[0].tables[0].columns[0].name)
# "user_id"
from cmeta import to_cmeta
cmeta_str = to_cmeta(model)
print(cmeta_str)
from cmeta import model_to_compact_json, compact_json_to_model
cj = model_to_compact_json(model)
print(cj[0]["tables"][0]["columns"][0])
# {'name': 'user_id', 'type': 'int', 'description': 'Unique ID'}
m2 = compact_json_to_model(cj)
assert to_cmeta(m2) == to_cmeta(model)
Compact JSON format:
[
{
"name": "Webshop",
"description": "Contains all webshop data",
"tables": [
{
"name": "users",
"description": "Contains all registered users",
"columns": [
{"name": "user_id", "type": "int", "description": "Unique ID"}
]
}
]
}
]
from cmeta import model_to_extended_json, extended_json_to_model
ej = model_to_extended_json(model)
print(ej[0])
# {
# 'columnName': 'user_id',
# 'columnDescription': 'Unique ID',
# 'dataType': 'int',
# 'sourceDescription': 'Contains all webshop data',
# 'sourceName': 'Webshop',
# 'tableDescription': 'Contains all registered users',
# 'tableName': 'users'
# }
m3 = extended_json_to_model(ej)
Extended JSON format:
[
{
"columnName": "user_id",
"columnDescription": "Unique ID",
"dataType": "int",
"sourceDescription": "Contains all webshop data",
"sourceName": "Webshop",
"tableDescription": "Contains all registered users",
"tableName": "users"
}
]
CMeta supports common datatypes:
string
int
,bigint
float
,double
,decimal
boolean
date
,timestamp
json
,array
,map
Clone and set up:
uv venv && source .venv/bin/activate
make install
Run checks:
make ci # lint + typecheck + test
make lint # ruff
make format # autoformat
make test # pytest
Dev workflow for trusted contributors:
-
Bump the version in
pyproject.toml
. -
Commit & push to
main
. -
Tag the release:
git tag v0.1.0 && git push origin v0.1.0
-
The GitHub Actions workflow will:
- run tests
- build wheels/sdist
- publish to PyPI
MIT — free for personal and commercial use.