@tfds-copybara

Update TFDS to 4.2.0

API:

  • Add tfds build to the CLI. See documentation.
  • DownloadManager now returns Pathlib-like objects
  • Datasets returned by tfds.as_numpy are compatible with len(ds)
  • New tfds.features.Dataset to represent nested datasets
  • Add tfds.ReadConfig(add_tfds_id=True) to add a unique identifier to each example, available as ex['tfds_id'] (e.g. b'train.tfrecord-00012-of-01024__123')
  • Add num_parallel_calls option to tfds.ReadConfig to override the default AUTOTUNE option
  • tfds.ImageFolder now supports tfds.decode.SkipDecoder
  • Add multichannel audio support to tfds.features.Audio
  • Better tfds.as_dataframe visualization (ffmpeg video if installed, bounding boxes,...)
  • Add try_gcs to tfds.builder(..., try_gcs=True)
  • Simpler BuilderConfig definition: global VERSION and RELEASE_NOTES are applied to all BuilderConfig. Config description is now optional.

Breaking compatibility changes:

  • To guarantee better determinism, new validations are performed on the keys when creating a dataset (to avoid filenames as keys, which are non-deterministic, and to restrict keys to str, bytes and int). New errors likely indicate an issue in the dataset implementation.
  • tfds.core.benchmark now returns a pd.DataFrame (instead of a dict)
  • tfds.units is not visible anymore from the public API

Bug fixes:

  • Support 0-length sequences with images of dynamic shape (fixes #2616, Support sequence length 0 for images with unknown shape)
  • Progress bar is now correctly updated when copying files.
  • Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums are updated,...)
  • Better debugging and error messages (e.g. human-readable sizes,...)
  • Allow max_examples_per_splits=0 in tfds build --max_examples_per_splits=0 to test _split_generators only (without _generate_examples).

@google-cla google-cla bot added the cla: yes Author has signed CLA label Jan 5, 2021
@tfds-copybara tfds-copybara force-pushed the cl_350137917 branch 2 times, most recently from ae3c56a to c05b831 January 6, 2021 14:37
API:

 * Add `tfds build` to the CLI. See [documentation](https://www.tensorflow.org/datasets/cli#tfds_build_download_and_prepare_a_dataset).
 * DownloadManager now returns [Pathlib-like](https://docs.python.org/3/library/pathlib.html#basic-use) objects
 * Datasets returned by `tfds.as_numpy` are compatible with `len(ds)`
 * New `tfds.features.Dataset` to represent nested datasets
 * Add `tfds.ReadConfig(add_tfds_id=True)` to add a unique identifier to each example, available as `ex['tfds_id']` (e.g. `b'train.tfrecord-00012-of-01024__123'`)
 * Add `num_parallel_calls` option to `tfds.ReadConfig` to override the default `AUTOTUNE` option
 * `tfds.ImageFolder` now supports `tfds.decode.SkipDecoder`
 * Add multichannel audio support to `tfds.features.Audio`
 * Better `tfds.as_dataframe` visualization (ffmpeg video if installed, bounding boxes,...)
 * Add `try_gcs` to `tfds.builder(..., try_gcs=True)`
 * Simpler `BuilderConfig` definition: global `VERSION` and `RELEASE_NOTES` are applied to all `BuilderConfig`. Config description is now optional.
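
To make the `tfds_id` format above more concrete, here is a minimal sketch that splits such an id into its parts. It assumes every id follows the `<filename>-<shard>-of-<num_shards>__<index>` shape of the example in these notes; `TfdsId` and `parse_tfds_id` are hypothetical helper names for illustration, not part of the TFDS API.

```python
import re
from typing import NamedTuple, Union


class TfdsId(NamedTuple):
    """Parts of a tfds_id (hypothetical container, not a TFDS class)."""
    filename: str       # e.g. 'train.tfrecord-00012-of-01024'
    shard_index: int    # e.g. 12
    num_shards: int     # e.g. 1024
    example_index: int  # e.g. 123


# Assumption: the id looks like '<filename>-<shard>-of-<total>__<index>',
# as in the example b'train.tfrecord-00012-of-01024__123'.
_TFDS_ID_RE = re.compile(
    r"^(?P<filename>.+-(?P<shard>\d+)-of-(?P<total>\d+))__(?P<index>\d+)$"
)


def parse_tfds_id(tfds_id: Union[bytes, str]) -> TfdsId:
    """Splits a tfds_id into filename, shard index, shard count and example index."""
    if isinstance(tfds_id, bytes):
        tfds_id = tfds_id.decode("utf-8")
    match = _TFDS_ID_RE.match(tfds_id)
    if match is None:
        raise ValueError(f"Unexpected tfds_id format: {tfds_id!r}")
    return TfdsId(
        filename=match["filename"],
        shard_index=int(match["shard"]),
        num_shards=int(match["total"]),
        example_index=int(match["index"]),
    )


parsed = parse_tfds_id(b"train.tfrecord-00012-of-01024__123")
print(parsed.shard_index, parsed.example_index)
```

The id itself would come from a dataset read with `tfds.ReadConfig(add_tfds_id=True)`; the helper only post-processes the value found in `ex['tfds_id']`.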

Breaking compatibility changes:

* Removed the non-plain-text configs of text datasets and dropped the config name: `multi_nli/plain_text` -> `multi_nli`
* To guarantee better determinism, new validations are performed on the keys when creating a dataset (to avoid filenames as keys, which are non-deterministic, and to restrict keys to `str`, `bytes` and `int`). New errors likely indicate an issue in the dataset implementation.
* `tfds.core.benchmark` now returns a `pd.DataFrame` (instead of a `dict`)
* `tfds.units` is not visible anymore from the public API
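
As a rough illustration of the key rule described above, the sketch below accepts only `str`, `bytes` and `int` keys and rejects path-like objects. It is an assumption-laden approximation for dataset authors who want to check their keys up front, not the validation code TFDS actually runs; `check_example_key` is a hypothetical name.

```python
import os
import pathlib


def check_example_key(key):
    """Returns `key` unchanged if it is a str, bytes or int; raises TypeError otherwise.

    Hypothetical helper mirroring the rule in the release notes, not TFDS code.
    """
    # Path-like keys are rejected first so the error message can point at the
    # likely cause: using a file path as the example key.
    if isinstance(key, os.PathLike):
        raise TypeError(
            f"Path-like key {key!r} is not allowed; use a stable str, bytes or int id."
        )
    if not isinstance(key, (str, bytes, int)):
        raise TypeError(
            f"Key {key!r} has type {type(key).__name__}; expected str, bytes or int."
        )
    return key


# Keys of the three accepted types pass through unchanged.
for candidate in ("img_0001", b"img_0001", 42):
    check_example_key(candidate)

# A path object is rejected.
try:
    check_example_key(pathlib.PurePosixPath("/tmp/downloads/img_0001.png"))
except TypeError as err:
    print(err)
```

If such a check fails, yielding a stable identifier (for instance a record id taken from the data itself) instead of the file path is one way to satisfy the rule.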

Bug fixes:

* Support 0-length sequences with images of dynamic shape (Fix #2616)
* Progress bar is now correctly updated when copying files.
* Many bug fixes (GPath consistency with pathlib, S3 compatibility, TQDM visual artifacts, GCS crash on Windows, re-download when checksums are updated,...)
* Better debugging and error messages (e.g. human-readable sizes,...)
* Allow `max_examples_per_splits=0` in `tfds build --max_examples_per_splits=0` to test `_split_generators` only (without `_generate_examples`).

And of course, new datasets and many dataset updates.

Thank you to the community for the many valuable contributions and for supporting us in this project!

PiperOrigin-RevId: 350344016
google-cla bot commented Jan 6, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no Author has not signed CLA and removed cla: yes Author has signed CLA labels Jan 6, 2021
@tfds-copybara tfds-copybara merged commit ccb1bbc into master Jan 6, 2021
@tfds-copybara tfds-copybara deleted the cl_350137917 branch January 6, 2021 15:21