Skip to content

Conversation

hsteptoe
Copy link
Contributor

🚀 Pull Request

Description

Fixes #6126 by exposing the shape geometry to the user. Adds to docs and docstrings to make it clearer how to use this.


Consult Iris pull request check list


Add any of the below labels to trigger actions on this PR:

  • benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts

@pp-mo
Copy link
Member

pp-mo commented Aug 28, 2024

From @SciTools/peloton : discussed what we are seeing here and in #6126.
We think this is looking useful, but the two of you seem to have it covered @hsteptoe @acchamber ?
Please shout if you need help.

@hsteptoe
Copy link
Contributor Author

Discussion with @acchamber notes:

  • Don't move imports out of functions as this creates circular import errors
  • Check target_crs assignment for occasion when cube doesn't have a CRS
  • Re-write conversion of 0-360 -> -180-180 coords in case of degree based CRSs

@hsteptoe
Copy link
Contributor Author

What are @SciTools/peloton (@pp-mo) thoughts about adding rasterio as a Iris dependency?

This would facilitate better handling of shapes to rasters (which is essentially the problem we're solving) with the added bonus that we could also mask to other shape types, like lines, to extract a trajectory from a Cube, for example.

@acchamber
Copy link
Contributor

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Dec 2, 2024

I've had a few other discussions with other users now where they can't use this function with OSGB shapefiles which this PR would fix. Where are we on being able to include it in the next release?

@hsteptoe @pp-mo

I was planning on going to the AVD surgery this week to discuss some changes and/or make the case for rasterio

@trexfeathers
Copy link
Contributor

Thanks for your patience everyone. It's sometimes a struggle to balance everything and we just haven't had an opportunity to discuss this. Coming to the Surgery (UK Met Office) is an ideal next step.

@stephenworsley
Copy link
Contributor

@pp-mo, @trexfeathers and @stephenworsley have discussed this at the Surgery and agree in principle that we could consider adding rasterio as an optional dependency.

@trexfeathers
Copy link
Contributor

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 5, 2025

From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point?

I just need to find more time, which is in short supply as we get near the end of FY...

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Feb 14, 2025

This is now ready for some initial (alpha?) testing 🎉. @acchamber are you available to try out the new features? (Updates to docs etc. will follow if this is successful) Tests are incomplete are should not be review yet.

This is a fairly substantial re-write. New/improved features include:

  • Automatic CRS transforms. Will handle situation where the cube and geometry have different CRSs by transforming the geometry.
  • Handles all shape types, including Points, Lines and MultiPolygons.

There are some breaking changes, principally moving away from having a minimum_weight keyword, replacing it with an all_touched keyword. The rational here is for consistency across shape types. Minimum weight doesn't make sense for Points and Lines. Instead, Line pixilation is based on Bresenham's line algorithm. Polygon pixilation is based on either: (a) pixel centres that intersect the polygon, or (b) all pixels touched by the polygon irrespective of the area of intersection.

My informal tests so far show working behaviour for iris test data from rotated_pole.nc, toa_brightness_stereographic.nc, E1_north_america.nc, air_temp.pp. Know limitation (not-working) for semi-structured model grids, such as orca2_votemper.nc. Some test shapefiles 👉test_shapefiles.zip

@bjlittle bjlittle added this to the v3.13 milestone Feb 17, 2025
Comment on lines +55 to +57
[project.optional-dependencies]
RASTERIO = ["rasterio"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bjlittle I remember that we stopped doing PyPI optional dependencies during your tenure with Iris. Can you remember why? Would you anticipate problems if we started again?

Maybe it's just a problem with a single-source-of-truth?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trexfeathers @bjlittle Did you come to a conclusion about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't. However we are attempting to standardise across our repositories and we don't have an agreed way to handle optional dependencies, so please remove this entry.

@CLAassistant
Copy link

CLAassistant commented Oct 6, 2025

CLA assistant check
All committers have signed the CLA.

@hsteptoe
Copy link
Contributor Author

hsteptoe commented Oct 7, 2025

@stephenworsley @trexfeathers Can you advise on apparently nox related errors that I'm seeing in the current CI failures, giving an error message: CondaHTTPError: HTTP 401 CONNECTION FAILED for url...

And, do you know why my local pre-commit version of numpydoc-validation might pass, but the github action one fails?

@trexfeathers
Copy link
Contributor

@stephenworsley @trexfeathers Can you advise on apparently nox related errors that I'm seeing in the current CI failures, giving an error message: CondaHTTPError: HTTP 401 CONNECTION FAILED for url...

And, do you know why my local pre-commit version of numpydoc-validation might pass, but the github action one fails?

@hsteptoe really sorry for missing this. There is a MO-specific step you will need to do for the lockfiles, will message you offline.

I'm less sure about pre-commit. There is some nuance about only testing files which have been edited; perhaps the merge from main means that your PR is changing more than your local branch was.

@trexfeathers trexfeathers moved this to 👀 In Review in 🦌 Iris 3.14 Oct 15, 2025
@trexfeathers trexfeathers changed the title Bugfix shape masking Refactor shape masking Oct 15, 2025
@hsteptoe
Copy link
Contributor Author

@stephenworsley @trexfeathers Can you advise on apparently nox related errors that I'm seeing in the current CI failures, giving an error message: CondaHTTPError: HTTP 401 CONNECTION FAILED for url...
And, do you know why my local pre-commit version of numpydoc-validation might pass, but the github action one fails?

@hsteptoe really sorry for missing this. There is a MO-specific step you will need to do for the lockfiles, will message you offline.

I'm less sure about pre-commit. There is some nuance about only testing files which have been edited; perhaps the merge from main means that your PR is changing more than your local branch was.

pre-commit related errors may have been something wrong going on with my pre-commit cache. Did a pre-commit clean and re-ran and that seemed to resolve the inconsistency issues.

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still left to review:

  • _shapefiles.py
  • tests
  • lock files

Comment on lines +88 to +90
#. `@hsteptoe <https://github.com/hsteptoe>`_ added `rasterio <https://rasterio.readthedocs.io/en/stable/index.html>`_
and `affiine <https://affine.readthedocs.io/en/latest/>`_ as a dependency for :func:`iris.util.mask_cube_from_shapefile`.
This is to support the new functionality of handling additional shapefiles and projections. (:issue:`6126`, :pull:`6129`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very important to emphasise that this is an optional dependency.

Comment on lines +55 to +57
[project.optional-dependencies]
RASTERIO = ["rasterio"]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't. However we are attempting to standardise across our repositories and we don't have an agreed way to handle optional dependencies, so please remove this entry.

horizontal grid.
(:issue:`5770`, :pull:`6581`)

#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@...`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@...`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile`
#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@trexfeathers`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile`

Comment on lines +2229 to +2234
cube: iris.cube.Cube,
shape: shapely.Geometry,
shape_crs: cartopy.crs | pyproj.CRS,
in_place: bool = False,
minimum_weight: float = 0.0,
**kwargs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've changed the order of parameters. Anyone calling this without using keywords e.g. mask_cube_from_shapefile(my_cube, my_shape, 0.5, True) will get a failure.

A number between 0-1 describing what % of a cube cell area must the shape overlap to be masked.
Only applied to polygon shapes. If the shape is a line or point then this is ignored.
Other Parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide which ones were kwargs and which ones were named parameters?

Comment on lines +2371 to +2372
For best masking results, both the cube _and_ masking geometry should have a
coordinate reference system (CRS) defined. Masking results will be most reliable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify that the masking geometry CRS comes from the function parameter (as opposed to being part of the shape object itself)?

Mask a :class:`~iris.cube.Cube` with any shape object, (e.g. Natural Earth Shapefiles via cartopy).
Finds the overlap between the `shape` and the :class:`~iris.cube.Cube` and
masks out any cells that __do not__ intersect the shape.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
masks out any cells that __do not__ intersect the shape.
masks out any cells that _do not_ intersect the shape.

This is rendering weird.

Comment on lines +2278 to +2282
By default, all cells touched by geometries are kept (equivalent to `minimum_weight=0`). This behaviour
can be changed by increasing the `minimum_weight` keyword argument or setting `all_touched=False`,
then only the only cells whose center is within the polygon or that are selected by Bresenham’s line algorithm
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point
is kept.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
By default, all cells touched by geometries are kept (equivalent to `minimum_weight=0`). This behaviour
can be changed by increasing the `minimum_weight` keyword argument or setting `all_touched=False`,
then only the only cells whose center is within the polygon or that are selected by Bresenhams line algorithm
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point
is kept.
By default, all cells touched by geometries are kept (equivalent to ``minimum_weight=0``). This behaviour
can be changed by increasing the ``minimum_weight`` keyword argument or setting ``all_touched=False``,
then only the only cells whose center is within the polygon or that are selected by Bresenhams line algorithm
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point
is kept.

If you want monospace: it's double-backtick.

If you want to refer to a parameter: I believe the standard recommends single-backtick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few more examples elsewhere in this docstring.

Comment on lines +2352 to +2354
>>> admin1 = shpreader.natural_earth(resolution='110m',
category='cultural',
name='admin_1_states_provinces_lakes')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
>>> admin1 = shpreader.natural_earth(resolution='110m',
category='cultural',
name='admin_1_states_provinces_lakes')
>>> admin1 = shpreader.natural_earth(resolution='110m',
... category='cultural',
... name='admin_1_states_provinces_lakes')

Extract a rectangular region covering the UK from a stereographic projection cube:
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to print some one line Cube summaries to demonstrate the effect that masking is having.

Example:

Suggested change
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc"))
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc"))
>>> print(cube.summary(shorten=True))
toa_brightness_temperature / (K) (projection_y_coordinate: 160; projection_x_coordinate: 256)

Comment on lines +21 to +27
import cartopy
import cf_units
from dask import array as da
import numpy as np
import numpy.ma as ma
import pyproj
import shapely
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imports that are exclusively for type-hinting should be moved into the TYPE_CHECKING code block:

iris/lib/iris/util.py

Lines 38 to 39 in 8333991

if TYPE_CHECKING:
from iris.cube import Cube, CubeList

from iris._deprecation import warn_deprecated
from iris._lazy_data import is_lazy_data, is_lazy_masked_data
from iris._shapefiles import create_shapefile_mask
from iris._shapefiles import create_shape_mask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be moved within mask_cube_from_shape(), since it invokes an optional dependency. Otherwise anyone who wants to use ANYTHING in iris.util needs to have Rasterio and Affine installed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this would fix the benchmarking CI by the way)

Copy link
Contributor

@trexfeathers trexfeathers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still left to review:

  • tests
  • lock files
  • confirm stable behaviour given the heavy refactor

Comment on lines +35 to +36
# to "OFF" in the environment. Having PROJ_NETWORK = "ON"
# can lead to some coordinate transformations resulting in Inf values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a pointless feature! I'd be interested to know why.

Comment on lines +27 to +30
if "iris.cube" in sys.modules:
import iris.cube
if "iris.analysis.cartography" in sys.modules:
import iris.analysis.cartography
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unprecedented. Something larger is wrong if these modules are not available.

Comment on lines +66 to +76
minimum_weight : float, optional
The minimum weight of the geometry to be included in the mask.
If the weight is less than this value, the geometry will not be
included in the mask. Defaults to 0.0.
all_touched : bool, optional
If True, all pixels touched by the geometry will be included in the mask.
If False, only pixels fully covered by the geometry will be included in the mask.
Defaults to True.
invert : bool, optional
If True, the mask will be inverted, so that pixels not covered by the geometry
will be included in the mask. Defaults to False.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about the defaults.

Stating optional is the correct way to say that something defaults to None. So that's correct for geometry_crs/all_touched/invert, but minimum_weight should say: float, default=0.0.

However: all_touched and invert incorrectly state that they have other default values. If this should be the case, then some changes are needed and optional needs to be replaced with default=. If this should NOT be the case, then the docstring needs correcting.

Comment on lines +268 to +290
Examples
--------
>>> from shapely.geometry import box
>>> from pyproj import CRS
>>> from iris._shapefiles import is_geometry_valid
Create a valid geometry covering Canada, and check
its validity for the WGS84 coordinate system:
>>> canada = box(-143.5,42.6,-37.8,84.0)
>>> wgs84 = CRS.from_epsg(4326)
>>> is_geometry_valid(canada, wgs84)
The function returns silently if the geometry is valid.
Now create an invalid geometry covering the Bering Sea,
and check its validity for the WGS84 coordinate system.
>>> bering_sea = box(148.42,49.1,-138.74,73.12)
>>> wgs84 = CRS.from_epsg(4326)
>>> is_geometry_valid(bering_sea, wgs84) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last)
ValueError: Geometry crossing the 180th meridian is not supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is admirable, but I should make you aware that by default no private modules - those named beginning with _ - are rendered in the documentation.

Still useful for future developers, or if we ever choose to make this public.

Comment on lines +311 to +314
# Make pyproj transformer
# Transforms the input geometry to the WGS84 coordinate system
t = Transformer.from_crs(geometry_crs, WGS84_crs, always_xy=True).transform
geometry = shapely.ops.transform(t, geometry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be calling _transform_geometry() here - an ideal way to ensure D.R.Y.

Comment on lines +422 to +423
for xy in mask_xy:
weighted_mask_template[xy] = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're concerned about performance, I believe there are ways to refactor this to assign False in a single NumPy operation instead of a loop.

If you're not worried then I'm not worried.

Comment on lines +390 to +401
if not cube.coord(axis="X", dim_coords=True).has_bounds():
cube.coord(axis="X", dim_coords=True).guess_bounds()
if not cube.coord(axis="Y", dim_coords=True).has_bounds():
cube.coord(axis="Y", dim_coords=True).guess_bounds()
# Get the shape of the cube
w = len(cube.coord(axis="X", dim_coords=True).points)
h = len(cube.coord(axis="Y", dim_coords=True).points)
# Get the bounds of the cube
# x_bounds = _get_mod_rebased_coord_bounds(cube.coord(x_name))
# y_bounds = _get_mod_rebased_coord_bounds(cube.coord(y_name))
x_bounds = cube.coord(axis="X", dim_coords=True).bounds
y_bounds = cube.coord(axis="Y", dim_coords=True).bounds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had trouble reading this section. Also calling cube.coord() is expensive.

Suggested change
if not cube.coord(axis="X", dim_coords=True).has_bounds():
cube.coord(axis="X", dim_coords=True).guess_bounds()
if not cube.coord(axis="Y", dim_coords=True).has_bounds():
cube.coord(axis="Y", dim_coords=True).guess_bounds()
# Get the shape of the cube
w = len(cube.coord(axis="X", dim_coords=True).points)
h = len(cube.coord(axis="Y", dim_coords=True).points)
# Get the bounds of the cube
# x_bounds = _get_mod_rebased_coord_bounds(cube.coord(x_name))
# y_bounds = _get_mod_rebased_coord_bounds(cube.coord(y_name))
x_bounds = cube.coord(axis="X", dim_coords=True).bounds
y_bounds = cube.coord(axis="Y", dim_coords=True).bounds
x_coord, y_coord = [cube.coord(axis=a, dim_coords=True) for a in ("X", "Y")]
w, h = [len(c.points) for c in (x_coord, y_coord)]
for coord in (x_coord, y_coord):
if not coord.has_bounds():
coord.guess_bounds()
x_bounds, y_bounds = [c.bounds() for c in (x_coord, y_coord)]

w = len(x_coord.points)
h = len(y_coord.points)
# Mask by weight if minimum_weight > 0.0
if minimum_weight > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed preserving existing behaviour with minimum_weight - #6129 (comment) - to avoid breaking changes. I'm having a hard time confirming this:

  • The code for this behaviour has been heavily refactored
  • The existing tests have been rewritten and can no longer be run in their original form due to API changes

Can you help? Do you have anything 'ready-made' that I can use to confirm stable behaviour? Otherwise I'll write something myself. Thanks

Comment on lines +441 to +442
x_points = cube.coord(axis="X", dim_coords=True).points
y_points = cube.coord(axis="Y", dim_coords=True).points
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assign the coords to variables to avoid running Cube.coord() more than you need to (it's slow)

Comment on lines +205 to +214
# Define raster transform based on cube
# This maps the geometry domain onto the cube domain
tr = _make_raster_cube_transform(cube)
# Generate mask from geometry
mask_template = rfeatures.geometry_mask(
geometries=shapely.get_parts(geometry),
out_shape=(h, w),
transform=tr,
all_touched=all_touched,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be up for giving me a 5 minute lesson about this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark_this Request that this pull request be benchmarked to check if it introduces performance shifts Type: Bug

Projects

Status: No status
Status: 👀 In Review

Development

Successfully merging this pull request may close these issues.

_transform_coord_system geometry is always assume to be None

9 participants