Commit 59252b3

Merge pull request zarrs#221 from zarrs/chunk_cache_refactor
refactor!: `ChunkCache` impls now borrow an `Array` + remove `ArrayChunkCacheExt`
2 parents f2dc2a1 + 8291f3d commit 59252b3

6 files changed: +710 -813 lines changed

CHANGELOG.md

Lines changed: 36 additions & 23 deletions
@@ -12,36 +12,36 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Add `VlenCodec::with_index_location()`
 - Add `numcodecs.adler32` codec
 - Add `ChunkCacheTypePartialDecoder`, `ChunkCachePartialDecoderLru{Chunk,Size}Limit[ThreadLocal]`
-- Add `Array::storage()`
+- Add `Array::storage()` and `Array::with_storage()`
 - Add `Array<T>::[async_]readable()` where `T: [Async]ReadableWritableStorageTraits`
 
 ### Changed
 - **Breaking**: Refactor `ArrayBuilder`
   - All fields are now private
-  - Add `ArrayBuilder::{new_with_chunk_grid,chunk_grid_metadata,build_metadata}()`
+  - Add `ArrayBuilder::{new_with_chunk_grid,chunk_grid_metadata,build_metadata,attributes_mut}()`
   - Add `ArrayBuilder{ChunkGrid,DataType,FillValue}`
   - Change `ArrayBuilder::new()` to take a broader range of types for each parameter, and swap order of `chunk_grid`/`data_type`. See below
-```diff
-ArrayBuilder::new(
-// array shape
-vec![8, 8], // or [8, 8], &[8, 8], etc.
-- DataType::Float32,
-- vec![4, 4].try_into()?, // no longer valid
-- f32::NAN.into(), // no longer valid
-+ // regular chunk shape or chunk grid metadata
-+ vec![4, 4], // or [4, 4], &[4, 4], "{"name":"regular",...}", MetadataV3::new_with_configuration(...)
-+ // data type or data type metadata
-+ DataType::Float32, // or "float32", "{"name":"float32"}", MetadataV3::new("float32").
-+ // fill value or fill value metadata
-+ f32::NAN, // or "NaN", FillValue, FillValueMetadataV3
-)
-.build()
-```
+```diff
+ArrayBuilder::new(
+// array shape
+vec![8, 8], // or [8, 8], &[8, 8], etc.
+- DataType::Float32,
+- vec![4, 4].try_into()?, // no longer valid
+- f32::NAN.into(), // no longer valid
++ // regular chunk shape or chunk grid metadata
++ vec![4, 4], // or [4, 4], &[4, 4], "{"name":"regular",...}", MetadataV3::new_with_configuration(...)
++ // data type or data type metadata
++ DataType::Float32, // or "float32", "{"name":"float32"}", MetadataV3::new("float32").
++ // fill value or fill value metadata
++ f32::NAN, // or "NaN", FillValue, FillValueMetadataV3
+)
+.build()
+```
 - **Breaking**: change the `{Array,Chunk}Representation::new[_unchecked]` `fill_value` parameter to take `impl Into<FillValue>` instead of `FillValue`
-```diff
-- ChunkRepresentation::new(chunk_shape(), DataType::Float32, 0.0f32.into())?,
-+ ChunkRepresentation::new(chunk_shape(), DataType::Float32, 0.0f32)?,
-```
+```diff
+- ChunkRepresentation::new(chunk_shape(), DataType::Float32, 0.0f32.into())?,
++ ChunkRepresentation::new(chunk_shape(), DataType::Float32, 0.0f32)?,
+```
 - **Breaking**: `Array::set_shape()` now returns a `Result`
   - Previously it was possible to resize an array to a shape incompatible with a `rectangular` chunk grid
 - **Breaking**: Refactor `ChunkGridTraits` and `ChunkGridPlugin`, chunk grids are initialised with the array shape
@@ -52,11 +52,24 @@ ArrayBuilder::new(
 - **Breaking**: `ArrayShardedExt::inner_chunk_grid_shape()` no longer returns an `Option`
 - **Breaking**: change `array::codecs()` to return an `Arc`d instead of borrowed `CodecChain`
 - **Breaking**: Add `size()` method to `{Array,Bytes}PartialDecoderTraits`
-- **Breaking**: Add `retrieve_chunk_subset` method to the `ChunkCache` trait
+- **Breaking**: Refactor the `ChunkCache` trait
+  - The previous API supported misuse (e.g. using a chunk cache with different arrays)
+  - **Breaking**: Add `retrieve_chunk_subset()` and `array()` methods (required)
+  - Add `retrieve_{array_subset,chunks}()` methods with `_elements()` and `_ndarray()` variants (provided)
+  - Remove `array` from method parameters, the `ChunkCache` must borrow/own an `Array` instead. See below
+```diff
+- let cache = ChunkCacheEncodedLruChunkLimit::new(50);
+- array.retrieve_chunk_opt_cached(&cache, &[0, 1], &CodecOptions::default()),
++ let cache = ChunkCacheEncodedLruChunkLimit::new(&array, 50);
++ cache.retrieve_chunk(&[0, 1], &CodecOptions::default()),
+```
 - Bump `zarrs_metadata_ext` to 0.2.0
 - Bump `zarrs_storage` to 0.4.0
 - Bump `blosc-src` to 0.3.6
 
+### Removed
+- Remove `ArrayChunkCacheExt`. Use the `ChunkCache` methods instead
+
 ### Fixed
 - Permit data types with empty configurations that do not require one
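
The changelog entry for the `ChunkCache` refactor says the previous API permitted using one cache with different arrays, and that a cache must now borrow or own its `Array`. The following is a minimal, std-only sketch of that design idea, not the `zarrs` implementation. `Array`, `ChunkCache`, and `LruChunkLimit` are hypothetical stand-ins with one-dimensional chunk indices, and `retrieve_chunk` takes `&mut self` for brevity, whereas the documentation changed in this commit states that the provided caches use internal locking.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for an array: chunks are addressed by a single index.
pub struct Array {
    name: String,
    chunks: Vec<Vec<u8>>,
}

impl Array {
    pub fn new(name: &str, chunks: Vec<Vec<u8>>) -> Self {
        Self { name: name.to_string(), chunks }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    /// Uncached retrieval; returns `None` if the chunk does not exist.
    pub fn retrieve_chunk(&self, index: usize) -> Option<Vec<u8>> {
        self.chunks.get(index).cloned()
    }
}

/// Hypothetical, simplified cache trait: retrieval takes no `array` parameter.
pub trait ChunkCache {
    fn array(&self) -> &Array;
    fn retrieve_chunk(&mut self, index: usize) -> Option<Vec<u8>>;
}

/// LRU cache limited by chunk count that borrows the single array it serves.
pub struct LruChunkLimit<'a> {
    array: &'a Array,
    capacity: usize,
    entries: HashMap<usize, Vec<u8>>,
    /// Chunk indices ordered from least to most recently used.
    order: VecDeque<usize>,
    /// Number of retrievals that were not served from the cache.
    pub misses: usize,
}

impl<'a> LruChunkLimit<'a> {
    pub fn new(array: &'a Array, capacity: usize) -> Self {
        Self { array, capacity, entries: HashMap::new(), order: VecDeque::new(), misses: 0 }
    }

    /// Mark `index` as the most recently used entry.
    fn touch(&mut self, index: usize) {
        self.order.retain(|&i| i != index);
        self.order.push_back(index);
    }
}

impl ChunkCache for LruChunkLimit<'_> {
    fn array(&self) -> &Array {
        self.array
    }

    fn retrieve_chunk(&mut self, index: usize) -> Option<Vec<u8>> {
        if let Some(chunk) = self.entries.get(&index).cloned() {
            self.touch(index);
            return Some(chunk);
        }
        self.misses += 1;
        let chunk = self.array.retrieve_chunk(index)?;
        if self.capacity == 0 {
            return Some(chunk);
        }
        // Evict the least recently used chunk once the cache is full.
        if self.entries.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.entries.remove(&evicted);
            }
        }
        self.entries.insert(index, chunk.clone());
        self.touch(index);
        Some(chunk)
    }
}
```

Because the cache holds `&'a Array`, every retrieval is answered from the same array, and the borrow checker prevents the array from being dropped while the cache is alive. With the array passed as a parameter on each call, nothing would stop a caller from mixing chunks of two arrays in one cache.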

zarrs/src/array.rs

Lines changed: 25 additions & 24 deletions
@@ -85,7 +85,6 @@ pub use crate::metadata::{
 };
 use zarrs_metadata_ext::v2_to_v3::ArrayMetadataV2ToV3Error;
 
-pub use chunk_cache::array_chunk_cache_ext_sync::ArrayChunkCacheExt;
 pub use chunk_cache::{
     chunk_cache_lru::*, ChunkCache, ChunkCacheType, ChunkCacheTypeDecoded, ChunkCacheTypeEncoded,
     ChunkCacheTypePartialDecoder,
@@ -198,11 +197,12 @@ pub fn chunk_shape_to_array_shape(chunk_shape: &[std::num::NonZeroU64]) -> Array
 /// - Variants without the `_opt` suffix use default [`CodecOptions`](crate::array::codec::CodecOptions).
 /// - **Experimental**: `async_` prefix variants can be used with async stores (requires `async` feature).
 ///
-/// Additional methods are offered by extension traits:
+/// Additional [`Array`] methods are offered by extension traits:
 /// - [`ArrayShardedExt`] and [`ArrayShardedReadableExt`]: see [Reading Sharded Arrays](#reading-sharded-arrays).
-/// - [`ArrayChunkCacheExt`]: see [Chunk Caching](#chunk-caching).
 /// - [`[Async]ArrayDlPackExt`](ArrayDlPackExt): methods for [`DLPack`](https://arrow.apache.org/docs/python/dlpack.html) tensor interop.
 ///
+/// [`ChunkCache`] implementations offer a similar API to [`Array::ReadableStorageTraits`](crate::storage::ReadableStorageTraits), except with [Chunk Caching](#chunk-caching) support.
+///
 /// ### Chunks and Array Subsets
 /// Several convenience methods are available for querying the underlying chunk grid:
 /// - [`chunk_origin`](Array::chunk_origin)
@@ -293,16 +293,19 @@ pub fn chunk_shape_to_array_shape(chunk_shape: &[std::num::NonZeroU64]) -> Array
 /// Another alternative is to use [Chunk Caching](#chunk-caching).
 ///
 /// ### Chunk Caching
-/// The [`ArrayChunkCacheExt`] trait adds [`Array`] retrieve methods that utilise chunk caching:
-/// - [`retrieve_chunk_opt_cached`](ArrayChunkCacheExt::retrieve_chunk_opt_cached)
-/// - [`retrieve_chunks_opt_cached`](ArrayChunkCacheExt::retrieve_chunks_opt_cached)
-/// - [`retrieve_chunk_subset_opt_cached`](ArrayChunkCacheExt::retrieve_chunk_subset_opt_cached)
-/// - [`retrieve_array_subset_opt_cached`](ArrayChunkCacheExt::retrieve_array_subset_opt_cached)
-///
-/// `_elements` and `_ndarray` variants are also available.
-/// Each method has a `cache` parameter that implements the [`ChunkCache`] trait.
+/// `zarrs` supports three types of chunk caches:
+/// - [`ChunkCacheTypeDecoded`]: caches decoded chunks.
+///   - Preferred where decoding is expensive and memory is abundant.
+/// - [`ChunkCacheTypeEncoded`]: caches encoded chunks.
+///   - Preferred where decoding is cheap and memory is scarce, provided that data is well compressed/sparse.
+/// - [`ChunkCacheTypePartialDecoder`]: caches partial decoders.
+///   - Preferred where chunks are repeatedly *partially retrieved*.
+///   - Useful for retrieval of inner chunks from sharded arrays, as the partial decoder caches shard indexes (but **not** inner chunks).
+///   - Memory usage of this cache is highly dependent on the array codecs and whether the codec chain ([`Array::codecs`]) ends up decoding entire chunks or caching inputs. See:
+///     - [`CodecTraits::partial_decoder_decodes_all`](crate::array::codec::CodecTraits::partial_decoder_decodes_all), and
+///     - [`CodecTraits::partial_decoder_should_cache_input`](crate::array::codec::CodecTraits::partial_decoder_should_cache_input).
 ///
-/// Several Least Recently Used (LRU) chunk caches are provided by `zarrs`:
+/// `zarrs` implements the following Least Recently Used (LRU) chunk caches:
 /// - [`ChunkCacheDecodedLruChunkLimit`]: a decoded chunk cache with a fixed chunk capacity.
 /// - [`ChunkCacheDecodedLruSizeLimit`]: a decoded chunk cache with a fixed size in bytes.
 /// - [`ChunkCacheEncodedLruChunkLimit`]: an encoded chunk cache with a fixed chunk capacity.
@@ -311,16 +314,21 @@ pub fn chunk_shape_to_array_shape(chunk_shape: &[std::num::NonZeroU64]) -> Array
 /// - [`ChunkCachePartialDecoderLruSizeLimit`]: a partial decoder chunk cache with a fixed size in bytes.
 ///
 /// There are also `ThreadLocal` suffixed variants of all of these caches that have a per-thread cache.
-///
 /// `zarrs` consumers can create custom caches by implementing the [`ChunkCache`] trait.
 ///
+/// Chunk caches implement the [`ChunkCache`] trait which has cached versions of the equivalent [`Array`] methods:
+/// - [`retrieve_chunk`](ChunkCache::retrieve_chunk)
+/// - [`retrieve_chunks`](ChunkCache::retrieve_chunks)
+/// - [`retrieve_chunk_subset`](ChunkCache::retrieve_chunk_subset)
+/// - [`retrieve_array_subset`](ChunkCache::retrieve_array_subset)
+///
+/// `_elements` and `_ndarray` variants are also available.
+///
 /// Chunk caching is likely to be effective for remote stores where redundant retrievals are costly.
 /// Chunk caching may not outperform disk caching with a filesystem store.
 /// The above caches use internal locking to support multithreading, which has a performance overhead.
 /// **Prefer not to use a chunk cache if chunks are not accessed repeatedly**.
-/// Cached retrieve methods do not use partial decoders, and any intersected chunk is fully decoded if not present in the cache.
-/// The encoded chunk caches may be optimal if dealing with highly compressed/sparse data with a fast codec.
-/// However, the decoded chunk caches are likely to be more performant in most cases.
+/// Aside from [`ChunkCacheTypePartialDecoder`]-based caches, caches do not use partial decoders and any intersected chunk is fully retrieved if not present in the cache.
 ///
 /// For many access patterns, chunk caching may reduce performance.
 /// **Benchmark your algorithm/data.**
@@ -374,24 +382,17 @@ pub struct Array<TStorage: ?Sized> {
     fill_value: FillValue,
     /// Specifies a list of codecs to be used for encoding and decoding chunks.
     codecs: Arc<CodecChain>,
-    // /// Optional user defined attributes.
-    // attributes: serde_json::Map<String, serde_json::Value>,
     /// An optional list of storage transformers.
     storage_transformers: StorageTransformerChain,
     /// An optional list of dimension names.
     dimension_names: Option<Vec<DimensionName>>,
-    // /// Additional fields annotated with `"must_understand": false`.
-    // additional_fields: AdditionalFields,
     /// Metadata used to create the array
     metadata: ArrayMetadata,
 }
 
 impl<TStorage: ?Sized> Array<TStorage> {
     /// Replace the storage backing an array.
-    pub(crate) fn with_storage<TStorage2: ?Sized>(
-        &self,
-        storage: Arc<TStorage2>,
-    ) -> Array<TStorage2> {
+    pub fn with_storage<TStorage2: ?Sized>(&self, storage: Arc<TStorage2>) -> Array<TStorage2> {
         Array {
             storage,
             path: self.path.clone(),
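
The hunk above makes `with_storage` public: it builds a new `Array` around a different storage handle while cloning the remaining fields. A minimal, std-only sketch of that pattern follows. It is an illustration under stated assumptions, not the `zarrs` code: `ReadableStorage`, `MemoryStore`, the `shape` field, and `read` are hypothetical and exist only to make the example self-contained.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical minimal storage trait (not one of the zarrs storage traits).
pub trait ReadableStorage {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// Hypothetical in-memory store keyed by path.
pub struct MemoryStore {
    entries: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    pub fn with_entry(key: &str, value: Vec<u8>) -> Self {
        let mut entries = HashMap::new();
        entries.insert(key.to_string(), value);
        Self { entries }
    }
}

impl ReadableStorage for MemoryStore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }
}

/// Simplified array: a storage handle plus metadata that does not depend on the storage type.
pub struct Array<TStorage: ?Sized> {
    storage: Arc<TStorage>,
    path: String,
    shape: Vec<u64>,
}

impl<TStorage: ?Sized> Array<TStorage> {
    pub fn new(storage: Arc<TStorage>, path: &str, shape: Vec<u64>) -> Self {
        Self { storage, path: path.to_string(), shape }
    }

    pub fn path(&self) -> &str {
        &self.path
    }

    pub fn shape(&self) -> &[u64] {
        &self.shape
    }

    /// Return a copy of this array backed by `storage`; the metadata is cloned unchanged.
    pub fn with_storage<TStorage2: ?Sized>(&self, storage: Arc<TStorage2>) -> Array<TStorage2> {
        Array {
            storage,
            path: self.path.clone(),
            shape: self.shape.clone(),
        }
    }
}

impl<TStorage: ?Sized + ReadableStorage> Array<TStorage> {
    /// Hypothetical read of the value stored at the array path.
    pub fn read(&self) -> Option<Vec<u8>> {
        self.storage.get(&self.path)
    }
}
```

Because both type parameters are `?Sized`, the new storage can be a trait object such as `Arc<dyn ReadableStorage>`, so the same array description can be pointed at a store of a different concrete type without rebuilding its metadata.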

zarrs/src/array/array_builder.rs

Lines changed: 5 additions & 0 deletions
@@ -356,6 +356,11 @@ impl ArrayBuilder {
         self
     }
 
+    /// Return a mutable reference to the attributes.
+    pub fn attributes_mut(&mut self) -> &mut serde_json::Map<String, serde_json::Value> {
+        &mut self.attributes
+    }
+
     /// Set the additional fields.
     ///
     /// Set additional fields not defined in the Zarr specification.
