-
Notifications
You must be signed in to change notification settings - Fork 296
Refactor shape masking #6129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Refactor shape masking #6129
Conversation
From @SciTools/peloton : discussed what we are seeing here and in #6126. |
Discussion with @acchamber notes:
|
What are @SciTools/peloton (@pp-mo) thoughts about adding This would facilitate better handling of shapes to rasters (which is essentially the problem we're solving) with the added bonus that we could also mask to other shape types, like lines, to extract a trajectory from a Cube, for example. |
I was planning on going to the AVD surgery this week to discuss some changes and/or make the case for rasterio |
Thanks for your patience everyone. It's sometimes a struggle to balance everything and we just haven't had an opportunity to discuss this. Coming to the Surgery (UK Met Office) is an ideal next step. |
@pp-mo, @trexfeathers and @stephenworsley have discussed this at the Surgery and agree in principle that we could consider adding rasterio as an optional dependency. |
for more information, see https://pre-commit.ci
From @SciTools/peloton: would @hsteptoe and @acchamber appreciate any more input from core devs at this point? |
I just need to find more time, which is in short supply as we get near the end of FY... |
This is now ready for some initial (alpha?) testing 🎉. @acchamber are you available to try out the new features? (Updates to docs etc. will follow if this is successful) Tests are incomplete are should not be review yet. This is a fairly substantial re-write. New/improved features include:
There are some breaking changes, principally moving away from having a My informal tests so far show working behaviour for iris test data from |
…bugfix-shape-masking
for more information, see https://pre-commit.ci
[project.optional-dependencies] | ||
RASTERIO = ["rasterio"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjlittle I remember that we stopped doing PyPI optional dependencies during your tenure with Iris. Can you remember why? Would you anticipate problems if we started again?
Maybe it's just a problem with a single-source-of-truth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trexfeathers @bjlittle Did you come to a conclusion about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't. However we are attempting to standardise across our repositories and we don't have an agreed way to handle optional dependencies, so please remove this entry.
@stephenworsley @trexfeathers Can you advise on apparently And, do you know why my local pre-commit version of |
@hsteptoe really sorry for missing this. There is a MO-specific step you will need to do for the lockfiles, will message you offline. I'm less sure about pre-commit. There is some nuance about only testing files which have been edited; perhaps the merge from |
pre-commit related errors may have been something wrong going on with my pre-commit cache. Did a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still left to review:
-
_shapefiles.py
- tests
- lock files
#. `@hsteptoe <https://github.com/hsteptoe>`_ added `rasterio <https://rasterio.readthedocs.io/en/stable/index.html>`_ | ||
and `affiine <https://affine.readthedocs.io/en/latest/>`_ as a dependency for :func:`iris.util.mask_cube_from_shapefile`. | ||
This is to support the new functionality of handling additional shapefiles and projections. (:issue:`6126`, :pull:`6129`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very important to emphasise that this is an optional dependency.
[project.optional-dependencies] | ||
RASTERIO = ["rasterio"] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't. However we are attempting to standardise across our repositories and we don't have an agreed way to handle optional dependencies, so please remove this entry.
horizontal grid. | ||
(:issue:`5770`, :pull:`6581`) | ||
|
||
#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@...`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@...`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile` | |
#. `@hsteptoe <https://github.com/hsteptoe>`_ and `@trexfeathers`_ (reviewer) extended :func:`iris.util.mask_cube_from_shapefile` |
cube: iris.cube.Cube, | ||
shape: shapely.Geometry, | ||
shape_crs: cartopy.crs | pyproj.CRS, | ||
in_place: bool = False, | ||
minimum_weight: float = 0.0, | ||
**kwargs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've changed the order of parameters. Anyone calling this without using keywords e.g. mask_cube_from_shapefile(my_cube, my_shape, 0.5, True)
will get a failure.
A number between 0-1 describing what % of a cube cell area must the shape overlap to be masked. | ||
Only applied to polygon shapes. If the shape is a line or point then this is ignored. | ||
Other Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you decide which ones were kwargs
and which ones were named parameters?
For best masking results, both the cube _and_ masking geometry should have a | ||
coordinate reference system (CRS) defined. Masking results will be most reliable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify that the masking geometry CRS comes from the function parameter (as opposed to being part of the shape object itself)?
Mask a :class:`~iris.cube.Cube` with any shape object, (e.g. Natural Earth Shapefiles via cartopy). | ||
Finds the overlap between the `shape` and the :class:`~iris.cube.Cube` and | ||
masks out any cells that __do not__ intersect the shape. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
masks out any cells that __do not__ intersect the shape. | |
masks out any cells that _do not_ intersect the shape. |
This is rendering weird.
By default, all cells touched by geometries are kept (equivalent to `minimum_weight=0`). This behaviour | ||
can be changed by increasing the `minimum_weight` keyword argument or setting `all_touched=False`, | ||
then only the only cells whose center is within the polygon or that are selected by Bresenham’s line algorithm | ||
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point | ||
is kept. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default, all cells touched by geometries are kept (equivalent to `minimum_weight=0`). This behaviour | |
can be changed by increasing the `minimum_weight` keyword argument or setting `all_touched=False`, | |
then only the only cells whose center is within the polygon or that are selected by Bresenham’s line algorithm | |
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point | |
is kept. | |
By default, all cells touched by geometries are kept (equivalent to ``minimum_weight=0``). This behaviour | |
can be changed by increasing the ``minimum_weight`` keyword argument or setting ``all_touched=False``, | |
then only the only cells whose center is within the polygon or that are selected by Bresenham’s line algorithm | |
(for line type shapes) are kept. For points, the `minimum_weight` is ignored, and the cell that intersects the point | |
is kept. |
If you want monospace: it's double-backtick.
If you want to refer to a parameter: I believe the standard recommends single-backtick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few more examples elsewhere in this docstring.
>>> admin1 = shpreader.natural_earth(resolution='110m', | ||
category='cultural', | ||
name='admin_1_states_provinces_lakes') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
>>> admin1 = shpreader.natural_earth(resolution='110m', | |
category='cultural', | |
name='admin_1_states_provinces_lakes') | |
>>> admin1 = shpreader.natural_earth(resolution='110m', | |
... category='cultural', | |
... name='admin_1_states_provinces_lakes') |
Extract a rectangular region covering the UK from a stereographic projection cube: | ||
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to print some one line Cube
summaries to demonstrate the effect that masking is having.
Example:
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc")) | |
>>> cube = iris.load_cube(iris.sample_data_path("toa_brightness_stereographic.nc")) | |
>>> print(cube.summary(shorten=True)) | |
toa_brightness_temperature / (K) (projection_y_coordinate: 160; projection_x_coordinate: 256) |
import cartopy | ||
import cf_units | ||
from dask import array as da | ||
import numpy as np | ||
import numpy.ma as ma | ||
import pyproj | ||
import shapely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imports that are exclusively for type-hinting should be moved into the TYPE_CHECKING
code block:
Lines 38 to 39 in 8333991
if TYPE_CHECKING: | |
from iris.cube import Cube, CubeList |
from iris._deprecation import warn_deprecated | ||
from iris._lazy_data import is_lazy_data, is_lazy_masked_data | ||
from iris._shapefiles import create_shapefile_mask | ||
from iris._shapefiles import create_shape_mask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be moved within mask_cube_from_shape()
, since it invokes an optional dependency. Otherwise anyone who wants to use ANYTHING in iris.util
needs to have Rasterio and Affine installed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this would fix the benchmarking CI by the way)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still left to review:
- tests
- lock files
- confirm stable behaviour given the heavy refactor
# to "OFF" in the environment. Having PROJ_NETWORK = "ON" | ||
# can lead to some coordinate transformations resulting in Inf values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a pointless feature! I'd be interested to know why.
if "iris.cube" in sys.modules: | ||
import iris.cube | ||
if "iris.analysis.cartography" in sys.modules: | ||
import iris.analysis.cartography |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unprecedented. Something larger is wrong if these modules are not available.
minimum_weight : float, optional | ||
The minimum weight of the geometry to be included in the mask. | ||
If the weight is less than this value, the geometry will not be | ||
included in the mask. Defaults to 0.0. | ||
all_touched : bool, optional | ||
If True, all pixels touched by the geometry will be included in the mask. | ||
If False, only pixels fully covered by the geometry will be included in the mask. | ||
Defaults to True. | ||
invert : bool, optional | ||
If True, the mask will be inverted, so that pixels not covered by the geometry | ||
will be included in the mask. Defaults to False. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused about the defaults.
Stating optional
is the correct way to say that something defaults to None
. So that's correct for geometry_crs
/all_touched
/invert
, but minimum_weight
should say: float, default=0.0
.
However: all_touched
and invert
incorrectly state that they have other default values. If this should be the case, then some changes are needed and optional
needs to be replaced with default=
. If this should NOT be the case, then the docstring needs correcting.
Examples | ||
-------- | ||
>>> from shapely.geometry import box | ||
>>> from pyproj import CRS | ||
>>> from iris._shapefiles import is_geometry_valid | ||
Create a valid geometry covering Canada, and check | ||
its validity for the WGS84 coordinate system: | ||
>>> canada = box(-143.5,42.6,-37.8,84.0) | ||
>>> wgs84 = CRS.from_epsg(4326) | ||
>>> is_geometry_valid(canada, wgs84) | ||
The function returns silently if the geometry is valid. | ||
Now create an invalid geometry covering the Bering Sea, | ||
and check its validity for the WGS84 coordinate system. | ||
>>> bering_sea = box(148.42,49.1,-138.74,73.12) | ||
>>> wgs84 = CRS.from_epsg(4326) | ||
>>> is_geometry_valid(bering_sea, wgs84) # doctest: +IGNORE_EXCEPTION_DETAIL | ||
Traceback (most recent call last) | ||
ValueError: Geometry crossing the 180th meridian is not supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is admirable, but I should make you aware that by default no private modules - those named beginning with _
- are rendered in the documentation.
Still useful for future developers, or if we ever choose to make this public.
# Make pyproj transformer | ||
# Transforms the input geometry to the WGS84 coordinate system | ||
t = Transformer.from_crs(geometry_crs, WGS84_crs, always_xy=True).transform | ||
geometry = shapely.ops.transform(t, geometry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be calling _transform_geometry()
here - an ideal way to ensure D.R.Y.
for xy in mask_xy: | ||
weighted_mask_template[xy] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're concerned about performance, I believe there are ways to refactor this to assign False
in a single NumPy operation instead of a loop.
If you're not worried then I'm not worried.
if not cube.coord(axis="X", dim_coords=True).has_bounds(): | ||
cube.coord(axis="X", dim_coords=True).guess_bounds() | ||
if not cube.coord(axis="Y", dim_coords=True).has_bounds(): | ||
cube.coord(axis="Y", dim_coords=True).guess_bounds() | ||
# Get the shape of the cube | ||
w = len(cube.coord(axis="X", dim_coords=True).points) | ||
h = len(cube.coord(axis="Y", dim_coords=True).points) | ||
# Get the bounds of the cube | ||
# x_bounds = _get_mod_rebased_coord_bounds(cube.coord(x_name)) | ||
# y_bounds = _get_mod_rebased_coord_bounds(cube.coord(y_name)) | ||
x_bounds = cube.coord(axis="X", dim_coords=True).bounds | ||
y_bounds = cube.coord(axis="Y", dim_coords=True).bounds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had trouble reading this section. Also calling cube.coord()
is expensive.
if not cube.coord(axis="X", dim_coords=True).has_bounds(): | |
cube.coord(axis="X", dim_coords=True).guess_bounds() | |
if not cube.coord(axis="Y", dim_coords=True).has_bounds(): | |
cube.coord(axis="Y", dim_coords=True).guess_bounds() | |
# Get the shape of the cube | |
w = len(cube.coord(axis="X", dim_coords=True).points) | |
h = len(cube.coord(axis="Y", dim_coords=True).points) | |
# Get the bounds of the cube | |
# x_bounds = _get_mod_rebased_coord_bounds(cube.coord(x_name)) | |
# y_bounds = _get_mod_rebased_coord_bounds(cube.coord(y_name)) | |
x_bounds = cube.coord(axis="X", dim_coords=True).bounds | |
y_bounds = cube.coord(axis="Y", dim_coords=True).bounds | |
x_coord, y_coord = [cube.coord(axis=a, dim_coords=True) for a in ("X", "Y")] | |
w, h = [len(c.points) for c in (x_coord, y_coord)] | |
for coord in (x_coord, y_coord): | |
if not coord.has_bounds(): | |
coord.guess_bounds() | |
x_bounds, y_bounds = [c.bounds() for c in (x_coord, y_coord)] |
w = len(x_coord.points) | ||
h = len(y_coord.points) | ||
# Mask by weight if minimum_weight > 0.0 | ||
if minimum_weight > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed preserving existing behaviour with minimum_weight
- #6129 (comment) - to avoid breaking changes. I'm having a hard time confirming this:
- The code for this behaviour has been heavily refactored
- The existing tests have been rewritten and can no longer be run in their original form due to API changes
Can you help? Do you have anything 'ready-made' that I can use to confirm stable behaviour? Otherwise I'll write something myself. Thanks
x_points = cube.coord(axis="X", dim_coords=True).points | ||
y_points = cube.coord(axis="Y", dim_coords=True).points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please assign the coords to variables to avoid running Cube.coord()
more than you need to (it's slow)
# Define raster transform based on cube | ||
# This maps the geometry domain onto the cube domain | ||
tr = _make_raster_cube_transform(cube) | ||
# Generate mask from geometry | ||
mask_template = rfeatures.geometry_mask( | ||
geometries=shapely.get_parts(geometry), | ||
out_shape=(h, w), | ||
transform=tr, | ||
all_touched=all_touched, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be up for giving me a 5 minute lesson about this?
🚀 Pull Request
Description
Fixes #6126 by exposing the shape geometry to the user. Adds to docs and docstrings to make it clearer how to use this.
Consult Iris pull request check list
Add any of the below labels to trigger actions on this PR: