Cutana - Astronomical Cutout Pipeline


Cutana is a high-performance Python pipeline for creating astronomical image cutouts from large FITS datasets. It provides both an interactive Jupyter-based UI and a programmatic API for efficient processing of astronomical survey data like ESA Euclid observations.

Cutana Demo

Note: Cutana is currently optimised for Euclid Q1/IDR1 data. Some defaults and assumptions are Euclid-specific:

  • Flux conversion expects the MAGZERO header keyword (configurable via config.flux_conversion_keywords.AB_zeropoint)
  • Filter detection patterns are tuned for Euclid bands (VIS, NIR-Y, NIR-H, NIR-J)
  • FITS structure assumes one file per channel/filter

For other surveys, you may need to adjust these settings or disable flux conversion (config.apply_flux_conversion = False).
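As a sketch, adjusting the two survey-specific settings for non-Euclid data could look like this (the keyword `"ZEROPT"` is a hypothetical example; substitute whatever your survey's FITS headers actually use):

```python
from cutana import get_default_config

config = get_default_config()

# Either point the zeropoint lookup at your survey's header keyword...
config.flux_conversion_keywords.AB_zeropoint = "ZEROPT"  # hypothetical keyword

# ...or disable flux conversion entirely
config.apply_flux_conversion = False
```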

Support for Datalabs Users

For users experiencing problems with Cutana in the ESA Datalabs environment, please open a service desk ticket at: https://support.cosmos.esa.int/situ-service-desk/servicedesk/customer/portal/5

Quick Start

Installation

pip install cutana

Or for development:

# Create conda environment
conda env create -f environment.yml
conda activate cutana

Interactive UI (Recommended)

For most users, the interactive interface provides the easiest way to process astronomical data:

import cutana_ui
cutana_ui.start()  # optionally pass e.g. ui_scale=0.6 for a smaller UI

This launches a step-by-step interface where you can:

  1. Select your source catalogue (CSV format)
  2. Configure processing parameters (image extensions, output format, resolution)
  3. Monitor progress with live previews and status updates

Programmatic API

For automated workflows or integration into larger systems:

import pandas as pd
from cutana import get_default_config, Orchestrator

# Configure processing
config = get_default_config()
config.source_catalogue = "sources.csv" # See format below
config.output_dir = "cutouts_output/"
config.output_format = "zarr"  # or "fits"
config.target_resolution = 256
config.selected_extensions = ["VIS"]  # Extensions to process
# 1 output channel for VIS, details explained below
config.channel_weights = {
    "VIS": [1.0],
}
config.console_log_level = "INFO" # Show INFO logs in console

# Process cutouts
orchestrator = Orchestrator(config)
results = orchestrator.run()

Input Data Format

Your source catalogue must be a CSV file containing these columns:

SourceID,RA,Dec,diameter_pixel,fits_file_paths
TILE_102018666_12345,45.123,12.456,128,"['/path/to/tile_vis.fits', '/path/to/tile_nir.fits']"
TILE_102018666_12346,45.124,12.457,256,"['/path/to/tile_vis.fits','/path/to/tile_nir.fits']"

Required Columns:

  • SourceID: Unique identifier for each astronomical object
  • RA: Right Ascension in degrees (0-360°, ICRS coordinate system)
  • Dec: Declination in degrees (-90 to +90°, ICRS coordinate system)
  • diameter_pixel: Cutout size in pixels (creates square cutouts). Alternatively, diameter_arcsec can be given to specify the size in arcseconds.
  • fits_file_paths: JSON-formatted list of FITS file paths containing the source; use a consistent order across all rows
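The catalogue can also be generated programmatically. The following sketch uses pandas and json.dumps to produce the expected format, including the JSON-formatted fits_file_paths column (paths are placeholders):

```python
import json
import pandas as pd

# Build a minimal source catalogue in the expected format
sources = pd.DataFrame({
    "SourceID": ["TILE_102018666_12345"],
    "RA": [45.123],
    "Dec": [12.456],
    "diameter_pixel": [128],
    # JSON-formatted list of FITS paths; keep the same order in every row
    "fits_file_paths": [json.dumps(["/path/to/tile_vis.fits", "/path/to/tile_nir.fits"])],
})
sources.to_csv("sources.csv", index=False)
```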

Output Formats

ZARR Format (recommended): All cutouts are stored in efficient archives, ideal for large datasets and analysis workflows. Cutana uses the Zarr format for high-performance storage and the images_to_zarr library for conversion. (See the Output section below for sample access code.)

FITS Format: Individual FITS files per source, best for compatibility with existing astronomical software.

Multi-Channel Processing

Cutana automatically handles sources with multiple FITS files and allows channel mixing through configurable weights.

Channel Weights Configuration

The channel_weights parameter controls how multiple FITS files are combined into output channels. Each key represents a FITS extension name, and the corresponding value is a list of weights, one per output image channel.

Important: The FITS files listed in your source catalogue's fits_file_paths column must be ordered to match the extensions defined in channel_weights. The weights are not normalised, so careful consideration should be given when choosing them.

# Configure channel weights (ordered dictionary format)
config.channel_weights = {
    "VIS": [1.0, 0.0, 0.5],    # RGB weights for VIS band
    "NIR_H": [0.0, 1.0, 0.3],  # RGB weights for NIR H-band
    "NIR_J": [0.0, 0.0, 0.8]   # RGB weights for NIR J-band
}

Example: If your source catalogue contains:

SourceID,RA,Dec,diameter_pixel,fits_file_paths
TILE_123_456,45.1,12.4,128,"['/path/to/vis.fits', '/path/to/nir_h.fits', '/path/to/nir_j.fits']"

The order of files in fits_file_paths must correspond to the order of keys in channel_weights:

  1. /path/to/vis.fits → VIS extension
  2. /path/to/nir_h.fits → NIR_H extension
  3. /path/to/nir_j.fits → NIR_J extension
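Conceptually, each output channel is then a weighted sum over the input extensions. The following numpy sketch illustrates this mixing with the weights above (it is not Cutana's actual implementation):

```python
import numpy as np

# Three single-channel input images (H, W), one per extension, in catalogue order
vis = np.ones((4, 4))
nir_h = np.full((4, 4), 2.0)
nir_j = np.full((4, 4), 3.0)

weights = {  # same structure as config.channel_weights
    "VIS": [1.0, 0.0, 0.5],
    "NIR_H": [0.0, 1.0, 0.3],
    "NIR_J": [0.0, 0.0, 0.8],
}
images = {"VIS": vis, "NIR_H": nir_h, "NIR_J": nir_j}

# Output channel c = sum over extensions of weight[c] * image
rgb = np.stack(
    [sum(weights[ext][c] * images[ext] for ext in weights) for c in range(3)],
    axis=-1,
)
# rgb has shape (4, 4, 3); each pixel holds channel values 1.0, 2.0, 3.5
```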

Image Normalisation

Cutana supports multiple stretch algorithms with unified parameter configuration:

config = get_default_config()

# Set normalisation method
config.normalisation_method = "asinh"  # "linear", "log", "asinh", "zscale"

# Configure normalisation parameters (method-specific defaults applied automatically)
config.normalisation.percentile = 99.8  # Data clipping percentile
config.normalisation.a = 0.7            # Transition parameter (asinh/log)
config.normalisation.n_samples = 1000   # ZScale samples  
config.normalisation.contrast = 0.25    # ZScale contrast

Image stretching is powered by fitsbolt for consistent processing.

The configuration can be saved alongside the outputs for reproducibility:

from cutana import save_config_toml

config_path = save_config_toml(config, f"{config.output_dir}/cutana_config.toml")

Output

For zarr output, results are organised in batches: one folder is created per batch, each containing an images.zarr and an images_metadata.parquet. The folders are named using the format batch_cutout_process_{index}_{unique_id}, where each process gets a unique identifier.

Metadata

Alongside the output zarr files, a metadata parquet file is created containing the following columns: source_id, ra, dec, idx_in_zarr, diameter_arcsec, diameter_pixel, processing_timestamp.

This can be read with:

import glob
import pandas as pd

# pd.read_parquet does not expand wildcards, so glob the batch folders and concatenate
paths = glob.glob("output_path/batch_cutout_process_*/images_metadata.parquet")
metadata = pd.concat([pd.read_parquet(p, engine='pyarrow') for p in paths], ignore_index=True)

This parquet provides a direct mapping between the individual images within the .zarr files and their Source IDs. Note: no Source IDs are stored in the .zarr files themselves!

idx_in_zarr is the index position of the source cutout within the .zarr file.

Images

To open the .zarr files and look at example images, the following code can be used:

# Open the images
import zarr

# File: a dictionary that contains "images" of shape (n_images, H, W, C).
# zarr.open does not expand wildcards, so point it at one concrete batch folder.
file = zarr.open("output_path/batch_cutout_process_{index}_{unique_id}/images.zarr", mode='r')
print(list(file.keys()))

# Example plots
from matplotlib import pyplot as plt
import numpy as np

fig, axes = plt.subplots(4, 4, figsize=(8, 8))
for ax in axes.flatten():
    index = np.random.randint(0, file['images'].shape[0])
    image = file['images'][index]
    ax.imshow(image, cmap='gray', origin="lower")
    ax.axis('off')
plt.tight_layout()
plt.show()

If FITS was selected as the output format, each FITS file stores this information in the header of its PRIMARY extension.

To select a specific image from the .zarr files, use the metadata parquet file as described in the Metadata section above.
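To make that lookup concrete, a small helper along these lines (hypothetical, not part of Cutana) can map a SourceID to its batch folder and zarr index using the metadata columns described above:

```python
import pandas as pd

def locate_cutout(meta: pd.DataFrame, source_id: str):
    """Return (batch_dir, idx_in_zarr) for a SourceID.

    Assumes meta concatenates the per-batch images_metadata.parquet files,
    with a batch_dir column added while reading them.
    """
    row = meta.loc[meta["source_id"] == source_id].iloc[0]
    return row["batch_dir"], int(row["idx_in_zarr"])

# The image itself would then be:
#   zarr.open(f"{batch_dir}/images.zarr")["images"][idx_in_zarr]
```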

Cutout Extraction Behavior

Padding Factor

The padding_factor parameter (labelled zoom-out in the UI) controls the extraction area relative to the source size from your catalogue:

  • padding_factor = 1.0 (default): Extracts exactly the source size (diameter_pixel or diameter_arcsec)
  • padding_factor < 1.0 (zoom-in): Extracts a smaller area (source_size × padding_factor pixels)
    • Minimum value: 0.25 (extracts 1/4 of source_size)
    • Example: padding_factor = 0.5 with 10px source extracts 5×5 pixels
  • padding_factor > 1.0 (zoom-out): Extracts a larger area (source_size × padding_factor pixels)
    • Maximum value: 10.0 (extracts 10× source_size)
    • Example: padding_factor = 2.0 with 10px source extracts 20×20 pixels

All extracted cutouts are then resized to the final target_resolution (e.g., 256×256) in the processing pipeline.
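The extraction-size arithmetic above can be sketched as a small helper (illustrative only; the clamp range matches the documented 0.25–10.0 limits):

```python
def extracted_size(source_size_px: float, padding_factor: float) -> int:
    """Pixel size of the extracted area before resizing to target_resolution."""
    # Clamp to the documented range, then scale the source size
    pf = min(max(padding_factor, 0.25), 10.0)
    return round(source_size_px * pf)

print(extracted_size(10, 0.5))  # 5
print(extracted_size(10, 2.0))  # 20
```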

Stretch/Normalisation Configuration

Cutana supports multiple image stretching methods with unified parameter naming for optimal visualisation and analysis. The normalisation_method parameter controls the stretch algorithm, with a single set of parameters that automatically apply appropriate defaults based on the chosen method:

Unified Parameter Details: All normalisation parameters are stored in the config.normalisation DotMap object:

  • config.normalisation.percentile: Percentile for data clipping, applied to all stretch methods (default: 99.8, range: 0-100)
  • config.normalisation.a: Unified transition parameter with method-specific defaults:
    • ASINH: 0.7 (controls linear-to-logarithmic transition, range: 0.001-3.0)
    • Log: 1000.0 (scale factor for transition point, range: 0.01-10000.0)
  • config.normalisation.n_samples: Number of samples for ZScale algorithm (default: 1000, range: 100-10000)
  • config.normalisation.contrast: Contrast adjustment for ZScale (default: 0.25, range: 0.01-1.0)

Note: The unified a parameter automatically applies the appropriate default value based on the selected normalisation method, eliminating the need for method-specific parameter names.

Linear Stretch

from cutana import get_default_config

config = get_default_config()
config.normalisation_method = "linear"
config.normalisation.percentile = 99.8             # Percentile clipping (default)

ASINH Stretch (Recommended)

from cutana import get_default_config

config = get_default_config()
config.normalisation_method = 'asinh'
config.normalisation.percentile = 99.8             # Percentile clipping (default)
config.normalisation.a = 0.7                       # Transition parameter (default for asinh)

Log Stretch

from cutana import get_default_config

config = get_default_config()
config.normalisation_method = 'log'
config.normalisation.percentile = 99.8             # Percentile clipping (default)
config.normalisation.a = 1000.0                    # Scale factor (default for log)

ZScale Stretch

from cutana import get_default_config

config = get_default_config()
config.normalisation_method = 'zscale'
config.normalisation.percentile = 99.8             # Percentile clipping (default)
config.normalisation.n_samples = 1000              # Number of samples (default)
config.normalisation.contrast = 0.25               # Contrast parameter (default)

Performance Considerations

Memory and CPU Optimization

Cutana is designed to optimally utilize available system memory and CPU cores while maintaining stability and preventing system overload. The pipeline includes intelligent load balancing that automatically adjusts worker processes and memory allocation based on real-time system monitoring.

Memory Constraints

In most deployment scenarios, memory will be the limiting factor rather than CPU cores. Cutana's load balancer continuously monitors memory usage and dynamically adjusts the number of worker processes to prevent memory exhaustion.

Critical Usage Guidelines

IMPORTANT: Users SHALL NOT run other memory-intensive activities in parallel with Cutana processing.

Running competing memory-intensive processes will interfere with Cutana's load balancing algorithms and may lead to:

  • System crashes due to memory exhaustion
  • Degraded performance and increased processing time
  • Inconsistent or failed processing results
  • Potential data corruption in extreme cases

Optimal Usage

For best performance:

  1. Dedicated Resources: Run Cutana on dedicated compute resources when possible
  2. Monitor System: Use system monitoring tools to track memory usage during processing
  3. Batch Size Tuning: Adjust N_batch_cutout_process based on available memory
  4. Worker Configuration: Let the load balancer automatically determine optimal worker count, or manually set max_workers conservatively

The load balancer will automatically scale down processing if memory pressure increases, but prevention through proper resource management is always preferable.
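As a sketch, a conservative configuration for a memory-constrained machine might combine the parameters above (all appear in the configuration table below; the values here are illustrative, not recommendations):

```python
from cutana import get_default_config

config = get_default_config()
config.max_workers = 4                                    # conservative upper bound
config.N_batch_cutout_process = 500                       # smaller batches per process
config.loadbalancer.memory_safety_margin = 0.25           # keep 25% headroom
config.loadbalancer.main_process_memory_reserve_gb = 2.0  # reserve for main process
```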

Advanced features

To avoid bright objects at the border of a cutout making the target too faint, the user can set:

x = 64  # example value: larger than 1, smaller than config.target_resolution
config.normalisation.crop_enable = True
config.normalisation.crop_height = x
config.normalisation.crop_width = x

Then, after resizing, the image is cropped during normalisation to crop_height × crop_width around the centre, and the maximum value is determined inside this cropped region.
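A numpy sketch of the described behaviour (a hypothetical helper, not Cutana's internal code) shows why a border object no longer dominates the maximum:

```python
import numpy as np

def centre_crop_max(image: np.ndarray, crop_height: int, crop_width: int) -> float:
    """Maximum value inside a crop_height x crop_width window around the centre."""
    h, w = image.shape[:2]
    top = (h - crop_height) // 2
    left = (w - crop_width) // 2
    return float(image[top:top + crop_height, left:left + crop_width].max())

img = np.zeros((8, 8))
img[0, 0] = 100.0  # bright object at the border
img[4, 4] = 5.0    # target near the centre
print(centre_crop_max(img, 4, 4))  # 5.0, the border object is ignored
```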

Configuration Parameters

The following table describes all configuration parameters available in Cutana:

Parameter Type Default Range/Allowed Values Description
General Settings
name str "cutana_run" - Run identifier
log_level str "INFO" DEBUG, INFO, WARNING, ERROR, CRITICAL, TRACE Logging level for files
console_log_level str "WARNING" DEBUG, INFO, WARNING, ERROR, CRITICAL, TRACE Console/notebook logging level
Input/Output Configuration
source_catalogue str None File path Path to source catalogue CSV file (required)
output_dir str "cutana_output" Directory path Output directory for results
output_format str "zarr" zarr, fits Output format
data_type str "float32" float32, uint8 Output data type
Processing Configuration
max_workers int 16 1-1024 Maximum number of worker processes
N_batch_cutout_process int 1000 10-10000 Batch size within each process
max_workflow_time_seconds int 1354571 600-5000000 Maximum total workflow time (~2 weeks default)
Cutout Processing Parameters
target_resolution int 256 16-2048 Target cutout size in pixels (square cutouts)
padding_factor float 1.0 0.25-10.0 Padding factor for cutout extraction (1.0 = no padding)
normalisation_method str "linear" linear, log, asinh, zscale, none Normalisation method
interpolation str "bilinear" bilinear, nearest, cubic, lanczos Interpolation method
FITS File Handling
fits_extensions list ["PRIMARY"] List of str/int Default FITS extensions to process
selected_extensions list [] List of str/int/dict Extensions selected by user (set by UI)
available_extensions list [] List Available extensions (discovered during analysis)
Flux Conversion Settings
apply_flux_conversion bool True - Whether to apply flux conversion (for Euclid data)
flux_conversion_keywords.AB_zeropoint str "MAGZERO" - Header keyword for AB magnitude zeropoint
user_flux_conversion_function callable None - Custom flux conversion function (deprecated)
Image Normalization Parameters
normalisation.percentile float 99.8 0-100 Percentile for data clipping
normalisation.a float 0.7 (asinh), 1000.0 (log) 0.001-10000.0 Unified transition parameter
normalisation.n_samples int 1000 100-10000 Number of samples for ZScale algorithm
normalisation.contrast float 0.25 0.01-1.0 Contrast adjustment for ZScale
normalisation.crop_enable bool False - Enable cropping during normalization
normalisation.crop_width int - 0-5000 Crop width in pixels
normalisation.crop_height int - 0-5000 Crop height in pixels
Advanced Processing Settings
channel_weights dict {"PRIMARY": [1.0]} Dict of str: list[float] Channel weights for multi-channel processing
File Management
tracking_file str "workflow_tracking.json" - Job tracking file
config_file str None File path Path to saved configuration file
Load Balancer Configuration
loadbalancer.memory_safety_margin float 0.15 0.01-0.5 Safety margin for memory allocation (15%)
loadbalancer.memory_poll_interval int 3 1-60 Poll memory every N seconds
loadbalancer.memory_peak_window int 30 10-300 Track peak memory over N second windows
loadbalancer.main_process_memory_reserve_gb float 4.0 0.5-10.0 Reserved memory for main process
loadbalancer.initial_workers int 1 1-8 Start with N workers until memory usage is known
loadbalancer.max_sources_per_process int 150000 1+ Maximum sources per job/process
loadbalancer.log_interval int 30 5-300 Log memory estimates every N seconds
loadbalancer.event_log_file str None File path Optional file path for LoadBalancer event logging
UI Configuration
ui.preview_samples int 10 1-50 Number of preview samples to generate
ui.preview_size int 256 16-512 Size of preview cutouts
ui.auto_regenerate_preview bool True - Auto-regenerate preview on config change

Backend API Reference

Orchestrator Class

The Orchestrator class is the main entry point for programmatic access to Cutana's cutout processing capabilities.

Constructor

from cutana import Orchestrator
from cutana import get_default_config

orchestrator = Orchestrator(config, status_panel=None)

Parameters:

  • config (DotMap): Configuration object created with get_default_config()
  • status_panel (optional): UI status panel reference for direct updates

Main Processing Methods

start_processing(catalogue_data)

Start the main cutout processing workflow.

import pandas as pd
from cutana import Orchestrator, get_default_config

# Load your source catalogue
catalogue_df = pd.read_csv("sources.csv")

# Configure processing
config = get_default_config()
config.output_dir = "cutouts_output/"
config.output_format = "zarr"
config.target_resolution = 256
config.selected_extensions = ["VIS", "NIR_H", "NIR_J"]
config.channel_weights = {
    "VIS": [1.0, 0.0, 0.5],
    "NIR_H": [0.0, 1.0, 0.3],
    "NIR_J": [0.0, 0.0, 0.8]
}

# Process cutouts
orchestrator = Orchestrator(config)
result = orchestrator.start_processing(catalogue_df)

Parameters:

  • catalogue_data (pandas.DataFrame): DataFrame containing source catalogue with required columns

Returns:

  • dict: Result dictionary containing:
    • status (str): "completed", "failed", or "stopped"
    • total_sources (int): Number of sources processed
    • completed_batches (int): Number of completed processing batches
    • mapping_csv (str): Path to source-to-zarr mapping CSV file
    • error (str): Error message if status is "failed"

run()

Simplified entry point that loads catalogue from config and runs processing.

config = get_default_config()
config.source_catalogue = "sources.csv"
config.output_dir = "cutouts_output/"

orchestrator = Orchestrator(config)
result = orchestrator.run()

Returns:

  • dict: Same format as start_processing()

Progress and Status Methods

get_progress()

Get current progress and status information.

progress = orchestrator.get_progress()
print(f"Completed: {progress['completed_sources']}/{progress['total_sources']}")

Returns:

  • dict: Progress information including completed/total sources, runtime, errors

get_progress_for_ui(completed_sources=None)

Get progress information optimized for UI display.

from cutana.progress_report import ProgressReport

progress_report = orchestrator.get_progress_for_ui()
print(f"Progress: {progress_report.progress_percent:.1f}%")
print(f"Memory: {progress_report.memory_used_gb:.1f}/{progress_report.memory_total_gb:.1f} GB")

Parameters:

  • completed_sources (int, optional): Override completed sources count

Returns:

  • ProgressReport: Dataclass with UI-relevant progress information including system resources

Control Methods

stop_processing()

Stop all active subprocesses gracefully.

result = orchestrator.stop_processing()
print(f"Stopped {len(result['stopped_processes'])} processes")

Returns:

  • dict: Stop operation results with list of stopped process IDs

can_resume()

Check if a workflow can be resumed from saved state.

if orchestrator.can_resume():
    print("Previous workflow can be resumed")

Returns:

  • bool: True if resumption is possible

Configuration Functions

get_default_config()

Get the default configuration object.

from cutana import get_default_config

config = get_default_config()
config.output_dir = "my_cutouts/"
config.target_resolution = 512

Returns:

  • DotMap: Configuration object with all default parameters

save_config_toml(config, filepath)

Save configuration to TOML file.

from cutana import save_config_toml

config_path = save_config_toml(config, "cutana_config.toml")

Parameters:

  • config (DotMap): Configuration to save
  • filepath (str): Path to save TOML file

Returns:

  • str: Path to saved file

load_config_toml(filepath)

Load configuration from TOML file.

from cutana import load_config_toml

config = load_config_toml("cutana_config.toml")

Parameters:

  • filepath (str): Path to TOML configuration file

Returns:

  • DotMap: Loaded configuration merged with defaults

Catalogue Functions

load_and_validate_catalogue(filepath)

Load and validate source catalogue from CSV file using catalogue_preprocessor.

from cutana.catalogue_preprocessor import load_and_validate_catalogue

try:
    catalogue_df = load_and_validate_catalogue("sources.csv")
    print(f"Loaded {len(catalogue_df)} sources")
except CatalogueValidationError as e:
    print(f"Validation error: {e}")

Parameters:

  • filepath (str): Path to CSV catalogue file

Returns:

  • pandas.DataFrame: Validated catalogue DataFrame

Raises:

  • CatalogueValidationError: If catalogue format is invalid
