These release notes apply to the core Dataproc service, and include:
- Announcements of the latest Dataproc image versions installed on the Compute Engine VMs used in Dataproc clusters. See the Dataproc version list for a list of supported Dataproc images, with links to pages that list the software components installed on current and recently released Dataproc images.
- Announcements of new and updated Dataproc and Serverless for Apache Spark features, bug fixes, known issues, and deprecated functionality.
Release schedule: The release of the latest Dataproc images can take up to one week to roll out to all regions. Until the rollout is complete, the latest Dataproc images may not be available in your region.
You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in the Google Cloud console, or programmatically access release notes in BigQuery.
To get the latest product updates delivered to you, add the URL of this page to your feed reader, or add the feed URL directly.
October 06, 2025
Dataproc on Compute Engine: The following diagnostic properties are now enabled by default for new Dataproc clusters created with 2.0+ image versions:
- dataproc:diagnostic.capture.enabled: Collects checkpoint diagnostic data in the cluster temp bucket.
- dataproc:dataproc.logging.extended.enabled: Collects logs for the Knox, Zeppelin, Ranger-usersync, Jupyter_notebook, and Jupyter_kernel_gateway components and the Spark History Server in Cloud Logging.
- dataproc:dataproc.logging.syslog.enabled: Collects VM syslogs in Cloud Logging.
Note: To disable any of these features, set the corresponding property to false during cluster creation.
To continue using the Ops Agent initialization action opsagent.sh to ingest syslogs from Dataproc cluster nodes, do one of the following:
- Recommended: Use opsagent_nosyslog.sh, since VM syslogs are emitted by default from Dataproc clusters.
- Set dataproc:dataproc.logging.syslog.enabled=false and continue using opsagent.sh to ingest syslogs.
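The properties above are set at cluster creation. As an illustration, the gcloud CLI accepts them through the --properties flag of gcloud dataproc clusters create as comma-separated prefix:property=value pairs. The Python sketch below only assembles that flag value; the helper name build_properties_flag is hypothetical, and the sketch does not call any Dataproc API.

```python
# Diagnostic properties that are now enabled by default (from this release note).
DIAGNOSTIC_DEFAULTS = (
    "dataproc:diagnostic.capture.enabled",
    "dataproc:dataproc.logging.extended.enabled",
    "dataproc:dataproc.logging.syslog.enabled",
)


def build_properties_flag(disable: list[str]) -> str:
    """Return a --properties value that sets each listed diagnostic property to false.

    Hypothetical helper: it only formats strings and calls no API.
    """
    unknown = [prop for prop in disable if prop not in DIAGNOSTIC_DEFAULTS]
    if unknown:
        raise ValueError(f"not a default diagnostic property: {unknown}")
    return ",".join(f"{prop}=false" for prop in disable)


# Second option above: keep opsagent.sh by turning off the built-in syslog export.
print("--properties=" + build_properties_flag(["dataproc:dataproc.logging.syslog.enabled"]))
# prints "--properties=dataproc:dataproc.logging.syslog.enabled=false"
```

The printed value could then be passed as gcloud dataproc clusters create CLUSTER_NAME --region=REGION --properties=dataproc:dataproc.logging.syslog.enabled=false, where CLUSTER_NAME and REGION are placeholders.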
New Serverless for Apache Spark runtime versions:
- 2.3.13
- 3.0.0-RC5
Serverless for Apache Spark: Upgraded Apache Spark to version 3.5.3 in the latest 2.3 Serverless for Apache Spark runtime versions.
October 03, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.150-debian10, 2.0.150-ubuntu18, 2.0.150-rocky8
- 2.1.99-debian11, 2.1.99-ubuntu20, 2.1.99-ubuntu20-arm, 2.1.99-rocky8
- 2.2.67-debian12, 2.2.67-ubuntu22, 2.2.67-ubuntu22-arm, 2.2.67-rocky9
- 2.3.14-debian12, 2.3.14-ubuntu22, 2.3.14-ubuntu22-arm, 2.3.14-ml-ubuntu22, 2.3.14-rocky9
September 15, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.149-debian10, 2.0.149-ubuntu18, 2.0.149-rocky8
- 2.1.98-debian11, 2.1.98-ubuntu20, 2.1.98-ubuntu20-arm, 2.1.98-rocky8
- 2.2.66-debian12, 2.2.66-ubuntu22, 2.2.66-ubuntu22-arm, 2.2.66-rocky9
- 2.3.13-debian12, 2.3.13-ubuntu22, 2.3.13-ubuntu22-arm, 2.3.13-ml-ubuntu22, 2.3.13-rocky9
September 11, 2025
New Serverless for Apache Spark runtime versions:
- 1.2.61
- 2.2.61
- 2.3.12
- 3.0.0-RC4
September 08, 2025
Announcing the Preview release of Dataproc on Compute Engine image version 3.0.0-RC1:
- Spark 4.0.0
- Hadoop 3.4.1
- Hive 4.1.0
- Tez 0.10.5
- Cloud Storage Connector 3.1.4
- Conda 24.11
- Java 17
- Python 3.11
- R 4.3
- Scala 2.13
Announcing the Preview release of Serverless for Apache Spark 3.0.0-RC3 runtime:
- Spark 4.0.0
- BigQuery Spark Connector 0.42.3
- Cloud Storage Connector 3.1.5
- Conda 25.3.0
- Java 21
- Python 3.12
- R 4.4
- Scala 2.13
New Dataproc on Compute Engine subminor image versions:
- 2.3.11-debian12, 2.3.11-ubuntu22, 2.3.11-ubuntu22-arm, 2.3.11-ml-ubuntu22, 2.3.11-rocky9
September 05, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.2.60
- 2.2.60
- 2.3.11
September 02, 2025
Multi-tenant clusters are now available in Preview. Multiple data engineers and data scientists can share a multi-tenant cluster and run their workloads in isolation from one another.
August 29, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.147-debian10, 2.0.147-ubuntu18, 2.0.147-rocky8
- 2.1.96-debian11, 2.1.96-ubuntu20, 2.1.96-ubuntu20-arm, 2.1.96-rocky8
- 2.2.64-debian12, 2.2.64-ubuntu22, 2.2.64-ubuntu22-arm, 2.2.64-rocky9
- 2.3.10-debian12, 2.3.10-ubuntu22, 2.3.10-ubuntu22-arm, 2.3.10-ml-ubuntu22, 2.3.10-rocky9
New Dataproc Serverless for Spark runtime versions:
- 1.2.59
- 2.2.59
- 2.3.10
August 22, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.2.58
- 2.2.58
- 2.3.9
August 21, 2025
Serverless for Apache Spark: Fixed a bug in Dataproc Batches that occasionally caused higher latency before an application was started.
August 19, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.146-debian10, 2.0.146-ubuntu18, 2.0.146-rocky8
- 2.1.95-debian11, 2.1.95-ubuntu20, 2.1.95-ubuntu20-arm, 2.1.95-rocky8
- 2.2.63-debian12, 2.2.63-ubuntu22, 2.2.63-ubuntu22-arm, 2.2.63-rocky9
- 2.3.9-debian12, 2.3.9-ubuntu22, 2.3.9-ubuntu22-arm, 2.3.9-ml-ubuntu22, 2.3.9-rocky9
August 14, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.2.57
- 2.2.57
- 2.3.8
August 12, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.145-debian10, 2.0.145-ubuntu18, 2.0.145-rocky8
- 2.1.94-debian11, 2.1.94-ubuntu20, 2.1.94-ubuntu20-arm, 2.1.94-rocky8
- 2.2.62-debian12, 2.2.62-ubuntu22, 2.2.62-ubuntu22-arm, 2.2.62-rocky9
- 2.3.8-debian12, 2.3.8-ubuntu22, 2.3.8-ubuntu22-arm, 2.3.8-ml-ubuntu22, 2.3.8-rocky9
New Dataproc Serverless for Spark runtime versions:
- 1.2.56
- 2.2.56
- 2.3.7
Dataproc on Compute Engine: In image versions 2.2 and 2.3, the Iceberg optional component supports the BigLake Iceberg REST catalog.
Dataproc on Compute Engine: Sharing checkpoint diagnostic data: Setting the dataproc:diagnostic.capture.access=GOOGLE_DATAPROC_DIAGNOSE property during cluster creation shares all of the temp bucket contents with Google Cloud support if uniform bucket-level access is enabled on the temp bucket. If object-level access control is in effect on the temp bucket, only the checkpoint diagnostic data folder in Cloud Storage that corresponds to the cluster is shared.
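The sharing rule above depends only on how access control is configured on the temp bucket. A minimal Python sketch of that rule follows; the enum, the function names, and the folder name in the example are hypothetical placeholders, since this note does not give the actual path of the checkpoint diagnostic folder.

```python
from enum import Enum


class TempBucketAccess(Enum):
    """Access-control mode configured on the cluster temp bucket."""
    UNIFORM_BUCKET_LEVEL = "uniform"
    OBJECT_LEVEL = "fine-grained"


def shared_prefix(access: TempBucketAccess, cluster_diagnostic_folder: str) -> str:
    """Object-name prefix shared with Google Cloud support when the cluster is
    created with dataproc:diagnostic.capture.access=GOOGLE_DATAPROC_DIAGNOSE.

    An empty prefix matches every object, that is, the whole temp bucket.
    """
    if access is TempBucketAccess.UNIFORM_BUCKET_LEVEL:
        # Per the release note, uniform bucket-level access shares all contents.
        return ""
    # With object-level access control, only the cluster's checkpoint folder is shared.
    return cluster_diagnostic_folder


def is_shared(object_name: str, access: TempBucketAccess, cluster_diagnostic_folder: str) -> bool:
    """Return True if the object would be visible to Google Cloud support."""
    return object_name.startswith(shared_prefix(access, cluster_diagnostic_folder))


# "my-cluster-diagnostics/" is a hypothetical folder name, not the real layout.
folder = "my-cluster-diagnostics/"
print(is_shared("other-data/file.txt", TempBucketAccess.UNIFORM_BUCKET_LEVEL, folder))  # prints "True"
print(is_shared("other-data/file.txt", TempBucketAccess.OBJECT_LEVEL, folder))  # prints "False"
```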
August 11, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.1.93-debian11, 2.1.93-rocky8, 2.1.93-ubuntu20, 2.1.93-ubuntu20-arm
- 2.2.61-debian12, 2.2.61-rocky9, 2.2.61-ubuntu22, 2.2.61-ubuntu22-arm
July 31, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.111
- 1.2.55
- 2.2.55
- 2.3.6
Dataproc Serverless for Spark: Subminor version 1.1.111 is the last release of runtime version 1.1, which will no longer be supported and will not receive new releases.
July 25, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.3.7-debian12, 2.3.7-ubuntu22, 2.3.7-ubuntu22-arm, 2.3.7-ml-ubuntu22, 2.3.7-rocky9
The 2.3.7-ml-ubuntu22 image extends the 2.3 base image with ML-specific libraries.
July 15, 2025
Dataproc on Compute Engine: Starting August 18, 2025, the following diagnostic properties will be enabled by default for newly created Dataproc clusters:
- dataproc:diagnostic.capture.enabled: Enables the collection of checkpoint data in the cluster temp bucket.
- dataproc:dataproc.logging.extended.enabled: Enables the collection of logs for the Knox, Zeppelin, Ranger-usersync, Jupyter_notebook, and Jupyter_kernel_gateway components and the Spark History Server in Cloud Logging.
- dataproc:dataproc.logging.syslog.enabled: Enables the collection of VM syslogs in Cloud Logging.
To continue using the Ops Agent initialization action opsagent.sh to ingest syslogs from Dataproc cluster nodes, do one of the following:
- Recommended: Use opsagent_nosyslog.sh, since VM syslogs will be emitted by default from Dataproc clusters.
- Set dataproc:dataproc.logging.syslog.enabled=false and continue using opsagent.sh to ingest syslogs.
Note: To disable any of these features, set the corresponding property to false during cluster creation.
New Dataproc on Compute Engine subminor image versions:
- 2.3.6-debian12, 2.3.6-ubuntu22, 2.3.6-ml-ubuntu22, 2.3.6-rocky9
The 2.3.6-ml-ubuntu22 image extends the 2.3 base image with ML-specific libraries.
Dataproc now supports dynamic updates of multi-tenant clusters.
July 07, 2025
The Cluster Scheduled Stop feature is available in preview. You can use this feature to stop clusters after a specified idle period, at a specified future time, or after a specified period from the cluster creation or update request.
July 04, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.3.5-debian12, 2.3.5-ubuntu22, 2.3.5-ml-ubuntu22, 2.3.5-rocky9
The 2.3.5-ml-ubuntu22 image extends the 2.3 base image with ML-specific libraries.
Serverless for Apache Spark (formerly known as Dataproc Serverless for Spark) now supports OS Login organization policy. Organizations, folders, and projects that enforce the OS Login policy can now use Serverless for Apache Spark.
July 01, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.110
- 1.2.54
- 2.2.54
- 2.3.5
June 20, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.109
- 1.2.53
- 2.2.53
- 2.3.4
Dataproc Serverless for Spark: Upgraded the Cloud Storage connector to version 2.2.28 in the 1.1 runtime.
Dataproc Serverless for Spark: The built-in Iceberg library now supports the BigLake Iceberg REST catalog in the 2.2 runtime.
New Dataproc on Compute Engine subminor image versions:
- 2.0.144-debian10, 2.0.144-rocky8, 2.0.144-ubuntu18
- 2.1.92-debian11, 2.1.92-rocky8, 2.1.92-ubuntu20, 2.1.92-ubuntu20-arm
- 2.2.60-debian12, 2.2.60-rocky9, 2.2.60-ubuntu22
- 2.3.4-debian12, 2.3.4-rocky9, 2.3.4-ubuntu22, 2.3.4-ml-ubuntu22
The 2.3.4-ml-ubuntu22 image extends the 2.3 base image with ML-specific libraries.
Dataproc on Compute Engine: Upgraded the Cloud Storage connector to version 2.2.28 in the latest 2.0 and 2.1 images.
Dataproc on Compute Engine: Dataproc now automatically configures the Knox Gateway properties gateway.dispatch.whitelist.services and gateway.dispatch.whitelist for component web UIs within the cluster.
Dataproc on Compute Engine: Fixed a bug in trino-jvm cluster properties. To configure Trino JVM options with trino-jvm-prefixed properties, follow these guidelines:
- Configure JVM options that start with -XX: without the colon (:). For JVM flags without a value, add = at the end. For example, add trino-jvm:-XX+HeapDumpOnOutOfMemoryError= to set -XX:+HeapDumpOnOutOfMemoryError in jvm.config.
- Specify JVM system properties with a -D prefix the same way. For example, trino-jvm:-Dsystem.property.name=value.
- Any value containing a colon (:) cannot be provided as a cluster property.
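The trino-jvm guidelines above amount to a mechanical rewrite of each jvm.config entry. The following Python sketch applies them; to_trino_jvm_property is a hypothetical helper, and it rejects any option that is not an -XX: or -D option because the note only describes those two forms.

```python
def to_trino_jvm_property(jvm_option: str) -> tuple[str, str]:
    """Convert a jvm.config option into a (key, value) trino-jvm cluster property.

    Hypothetical helper that follows the guidelines in this release note.
    """
    if jvm_option.startswith("-XX:"):
        # Drop the colon that follows -XX.
        body = "-XX" + jvm_option[len("-XX:"):]
    elif jvm_option.startswith("-D"):
        # System properties keep their -D prefix unchanged.
        body = jvm_option
    else:
        raise ValueError(f"only -XX: and -D options are covered: {jvm_option!r}")
    if ":" in body:
        raise ValueError(f"options containing ':' cannot be set: {jvm_option!r}")
    name, _, value = body.partition("=")
    # A flag without a value yields an empty value, i.e. a trailing '='.
    return f"trino-jvm:{name}", value


for option in ["-XX:+HeapDumpOnOutOfMemoryError", "-Dsystem.property.name=value"]:
    key, value = to_trino_jvm_property(option)
    print(f"{key}={value}")
# prints:
# trino-jvm:-XX+HeapDumpOnOutOfMemoryError=
# trino-jvm:-Dsystem.property.name=value
```

For a flag that carries a value, such as -XX:MaxRAMPercentage=80, the sketch applies the same drop-the-colon rule and produces trino-jvm:-XXMaxRAMPercentage=80. The release note gives no example of that case, so check the resulting jvm.config on your cluster.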
Dataproc on Compute Engine & Dataproc Serverless: Backported GH-3198 in Parquet addressing CVE-2025-46762.
June 10, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.108
- 1.2.52
- 2.2.52
- 2.3.3
June 09, 2025
Announcing the GA release of Dataproc on Compute Engine image version 2.3:
Image version 2.3 is a lightweight image that contains only core components, reducing exposure to Common Vulnerabilities and Exposures (CVEs). For higher security compliance requirements, use image version 2.3 or later when creating a Dataproc cluster. Optional components can still be deployed on demand.
The following images are the latest available 2.3 subminor image versions: 2.3.3-debian12, 2.3.3-rocky9, 2.3.3-ubuntu22, and 2.3.3-ml-ubuntu22.
The 2.3.3-ml-ubuntu22 image extends the 2.3 base image with ML-specific libraries.
June 06, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.107
- 1.2.51
- 2.2.51
- 2.3.2
Dataproc Serverless for Spark: Fixed a bug that prevented the spark.executorEnv property from correctly setting specific executor environment variables across all runtimes.
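Spark sets an executor environment variable from a property of the form spark.executorEnv.[EnvironmentVariableName]. As a sketch, assuming you pass properties as comma-separated key=value pairs (for example, to the --properties flag when submitting a batch), the hypothetical helpers below build those entries from a dictionary.

```python
def executor_env_properties(env: dict[str, str]) -> dict[str, str]:
    """Map environment variables to spark.executorEnv.<NAME> properties."""
    props = {}
    for name, value in env.items():
        if not name or "=" in name:
            raise ValueError(f"invalid environment variable name: {name!r}")
        props[f"spark.executorEnv.{name}"] = value
    return props


def to_properties_flag(props: dict[str, str]) -> str:
    """Join properties into comma-separated key=value pairs."""
    for key, value in props.items():
        if "," in key or "," in value:
            # A comma would be read as a separator between properties.
            raise ValueError(f"comma in property {key!r}")
    return ",".join(f"{key}={value}" for key, value in props.items())


# Hypothetical variables used only for illustration.
flag = to_properties_flag(executor_env_properties({"APP_MODE": "batch", "LOG_LEVEL": "INFO"}))
print(flag)  # prints "spark.executorEnv.APP_MODE=batch,spark.executorEnv.LOG_LEVEL=INFO"
```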
June 01, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.143-debian10, 2.0.143-rocky8, 2.0.143-ubuntu18
- 2.1.91-debian11, 2.1.90-rocky8, 2.1.91-ubuntu20, 2.1.91-ubuntu20-arm
- 2.2.59-debian12, 2.2.59-rocky9, 2.2.59-ubuntu22
Dataproc on Compute Engine: Fixed the ordering of log entries generated from clusters created with 2.2+ image versions by assigning timestamps closer to the log generation time.
May 30, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.106
- 1.2.50
- 2.2.50
- 2.3.1
The support dates for Dataproc on Compute Engine image versions 2.0, 2.1, and 2.2 have been extended, as follows:
- Image version 2.2: Supported until 03/31/2027
- Image version 2.1: Supported until 03/31/2026
- Image version 2.0: Supported until 09/30/2025
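The support windows above can be checked mechanically. This Python sketch hard-codes only the three dates listed in this note and treats each "supported until" date as inclusive, which is an assumption; the function name is hypothetical.

```python
from datetime import date

# End-of-support dates listed in the May 30, 2025 release note.
SUPPORT_END = {
    "2.2": date(2027, 3, 31),
    "2.1": date(2026, 3, 31),
    "2.0": date(2025, 9, 30),
}


def is_supported(image_version: str, on: date) -> bool:
    """Return True if the image version's line is supported on the given date.

    Accepts versions such as "2.1", "2.1-debian11", or "2.1.99-debian11";
    only the major.minor part is used. The end date is treated as inclusive.
    """
    major_minor = ".".join(image_version.split("-")[0].split(".")[:2])
    end = SUPPORT_END.get(major_minor)
    if end is None:
        raise ValueError(f"no support date listed for image version {major_minor}")
    return on <= end


print(is_supported("2.1.99-debian11", date(2025, 10, 6)))  # prints "True"
print(is_supported("2.2.67-debian12", date(2027, 4, 1)))  # prints "False"
```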
May 28, 2025
Announcing the General Availability release of Spark on BigQuery, which lets you create a serverless Spark session in a BigQuery Studio notebook. Use this feature to create, run, and test Spark jobs quickly and easily. For more information, see Run PySpark code in BigQuery Studio notebooks.
New Dataproc Serverless for Spark runtime versions:
- 1.1.105
- 1.2.49
- 2.2.49
- 2.3.0
Announcing the General Availability (GA) release of Dataproc Serverless for Spark runtime version 2.3, which includes the following components:
- Spark 3.5.1
- BigQuery Spark Connector 0.42.3
- Cloud Storage Connector 3.1.2
- Java 17
- Python 3.11
- R 4.3
- Scala 2.13
May 23, 2025
Dataproc now supports creating zero-scale clusters, available in Preview. Zero-scale clusters use only secondary workers, which can be scaled down to zero when not in use, making them a cost-effective way to use Dataproc.
New Dataproc on Compute Engine subminor image versions:
- 2.0.142-debian10, 2.0.142-rocky8, 2.0.142-ubuntu18
- 2.1.90-debian11, 2.1.90-rocky8, 2.1.90-ubuntu20, 2.1.90-ubuntu20-arm
- 2.2.58-debian12, 2.2.58-rocky9, 2.2.58-ubuntu22
May 22, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.104
- 1.2.48
- 2.2.48
May 15, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.103
- 1.2.47
- 2.2.47
New Dataproc on Compute Engine subminor image versions:
- 2.0.141-debian10, 2.0.141-rocky8, 2.0.141-ubuntu18
- 2.1.89-debian11, 2.1.89-rocky8, 2.1.89-ubuntu20, 2.1.89-ubuntu20-arm
- 2.2.57-debian12, 2.2.57-rocky9, 2.2.57-ubuntu22
May 12, 2025
Dataproc Serverless for Spark: Spark UI for Dataproc Serverless batches and interactive sessions, which lets you monitor and debug your serverless Spark workloads, now features Event Timeline and Task Quantile views for enhanced troubleshooting.
May 09, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.140-debian10, 2.0.140-rocky8, 2.0.140-ubuntu18
- 2.1.88-debian11, 2.1.88-rocky8, 2.1.88-ubuntu20, 2.1.88-ubuntu20-arm
- 2.2.56-debian12, 2.2.56-rocky9, 2.2.56-ubuntu22
May 08, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.102
- 1.2.46
- 2.2.46
May 07, 2025
Dataproc on Compute Engine: The default enabling of the following cluster properties previously announced to occur on May 10, 2025 (see the February 10, 2025 release note) has been postponed to a future date. The future date will be announced in a release note at least one month in advance of the change. Until then, these diagnostic properties will continue to be set to false by default unless set to true by the user.
dataproc:diagnostic.capture.enabled
dataproc:dataproc.logging.extended.enabled
dataproc:dataproc.logging.syslog.enabled
May 02, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.139-debian10, 2.0.139-rocky8, 2.0.139-ubuntu18
- 2.1.87-debian11, 2.1.87-rocky8, 2.1.87-ubuntu20, 2.1.87-ubuntu20-arm
- 2.2.55-debian12, 2.2.55-rocky9, 2.2.55-ubuntu22
Dataproc on Compute Engine: Upgraded NodeProblemDetector to a version based on 0.8.20 in the 2.2 image.
Dataproc on Compute Engine: Upgraded oauth2l to v1.3.3 to address CVEs.
Dataproc on Compute Engine: Fixed an issue with Apache Hudi that caused failure in Hudi CLI.
May 01, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.101
- 1.2.45
- 2.2.45
Native Query Execution now supports reading Apache ORC complex types.
Dataproc Serverless: Backported GH-3168 in Parquet addressing CVE-2025-30065.
April 29, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.138-debian10, 2.0.138-rocky8, 2.0.138-ubuntu18
- 2.1.86-debian11, 2.1.86-rocky8, 2.1.86-ubuntu20, 2.1.86-ubuntu20-arm
- 2.2.54-debian12, 2.2.54-rocky9, 2.2.54-ubuntu22
Dataproc on Compute Engine: Fixed Job ID retrieval in Dataproc job logs for clusters created with 2.0 and 2.1 image versions by ignoring the timestamp prefix.
Dataproc on Compute Engine: Added a temporary object hold on the spark-job-history folder in Cloud Storage to prevent its deletion by Cloud Storage lifecycle management.
Dataproc on Compute Engine: Backported GH-3168 in Parquet addressing CVE-2025-30065.
April 18, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.99
- 1.2.43
- 2.2.43
April 17, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.137-debian10, 2.0.137-rocky8, 2.0.137-ubuntu18
- 2.1.85-debian11, 2.1.85-rocky8, 2.1.85-ubuntu20, 2.1.85-ubuntu20-arm
- 2.2.53-debian12, 2.2.53-rocky9, 2.2.53-ubuntu22
Dataproc on Compute Engine: The Spark BigQuery connector has been upgraded to version 0.34.1 in the latest 2.2 image version.
Fixed a bug in which Jupyter failed to restart after a cluster restart on Personal Authentication clusters.
April 09, 2025
Dataproc Serverless for Spark: Gemini Cloud Assist Investigations is available in Preview for the following runtimes:
- 1.1
- 1.2
- 2.2
April 08, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.2.52-debian12, 2.2.52-rocky9, 2.2.52-ubuntu22
Dataproc on Compute Engine: Fixed an issue with the retrieval of an access token when using the ranger-gcs-plugin with 2.2 images.
April 03, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.98
- 1.2.42
- 2.2.42
Dataproc Serverless for Spark: Installed CUDA, cuDNN and NCCL NVIDIA libraries in 1.2 and 2.2 runtimes.
April 01, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.2.51-debian12, 2.2.51-rocky9, 2.2.51-ubuntu22
Dataproc on Compute Engine: Hyperdisk-Balanced is now the default primary disk type when creating a cluster from the console.
Dataproc on Compute Engine: Fixed incorrectly attributed Dataproc job logs in Cloud Logging for clusters created with 2.2+ image versions. This happened when multiple Dataproc jobs were running concurrently on the same cluster.
March 31, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.97
- 1.2.41
- 2.2.41
March 28, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.96
- 1.2.40
- 2.2.40
Dataproc Serverless for Spark: Hadoop Native libraries are installed by default in all runtimes.
March 17, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.136-debian10, 2.0.136-rocky8, 2.0.136-ubuntu18
- 2.1.84-debian11, 2.1.84-rocky8, 2.1.84-ubuntu20, 2.1.84-ubuntu20-arm
- 2.2.50-debian12, 2.2.50-rocky9, 2.2.50-ubuntu22
Dataproc on Compute Engine: Spark upgraded to version 3.5.3 in the latest Dataproc image version 2.2.
Dataproc on Compute Engine: The latest Dataproc 2.2 image version now supports Spark data lineage.
Dataproc on Compute Engine: Added support for Enhanced Flexibility Mode (EFM) with primary worker shuffle mode on Spark for image version 2.2.50 and above.
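As an illustration of the version threshold in this note, the sketch below returns the EFM property for primary-worker shuffle on Spark only for image versions at or above 2.2.50. The property name dataproc:efm.spark.shuffle=primary-worker is taken from the Dataproc Enhanced Flexibility Mode documentation rather than from this note. Versions are compared as numeric tuples, so later lines such as 2.3 also pass; the sketch deliberately ignores EFM support in older image lines, which this note does not cover, and the function names are hypothetical.

```python
# Earliest image version named in this note for EFM primary-worker shuffle on Spark.
MIN_EFM_SPARK_PRIMARY_WORKER = (2, 2, 50)


def parse_image_version(image_version: str) -> tuple[int, int, int]:
    """Parse a subminor image version such as '2.2.50-debian12' into (2, 2, 50)."""
    parts = image_version.split("-")[0].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.subminor, got {image_version!r}")
    major, minor, subminor = (int(part) for part in parts)
    return major, minor, subminor


def efm_spark_properties(image_version: str) -> dict[str, str]:
    """Return cluster properties enabling EFM primary-worker shuffle for Spark.

    Assumes the property name documented for Dataproc EFM; only the 2.2.50
    threshold from this release note is checked.
    """
    if parse_image_version(image_version) < MIN_EFM_SPARK_PRIMARY_WORKER:
        raise ValueError(f"{image_version} is below the 2.2.50 threshold in this note")
    return {"dataproc:efm.spark.shuffle": "primary-worker"}


print(efm_spark_properties("2.2.50-debian12"))
# prints "{'dataproc:efm.spark.shuffle': 'primary-worker'}"
```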
March 14, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.95
- 1.2.39
- 2.2.39
March 10, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.135-debian10, 2.0.135-rocky8, 2.0.135-ubuntu18
- 2.1.83-debian11, 2.1.83-rocky8, 2.1.83-ubuntu20, 2.1.83-ubuntu20-arm
- 2.2.49-debian12, 2.2.49-rocky9, 2.2.49-ubuntu22
March 04, 2025
Dataproc is now available in the europe-north2 region (Stockholm, Sweden).
March 03, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.94
- 1.2.38
- 2.2.38
March 01, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.134-debian10, 2.0.134-rocky8, 2.0.134-ubuntu18
- 2.1.82-debian11, 2.1.82-rocky8, 2.1.82-ubuntu20, 2.1.82-ubuntu20-arm
- 2.2.48-debian12, 2.2.48-rocky9, 2.2.48-ubuntu22
Dataproc on Compute Engine: Explicitly disabled the sha1 and md5 algorithms for use with the kex and kex-gss sshd features.
February 24, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.133-debian10, 2.0.133-rocky8, 2.0.133-ubuntu18
- 2.1.81-debian11, 2.1.81-rocky8, 2.1.81-ubuntu20, 2.1.81-ubuntu20-arm
- 2.2.47-debian12, 2.2.47-rocky9, 2.2.47-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.93
- 1.2.37
- 2.2.37
February 17, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.132-debian10, 2.0.132-rocky8, 2.0.132-ubuntu18
- 2.1.80-debian11, 2.1.80-rocky8, 2.1.80-ubuntu20, 2.1.80-ubuntu20-arm
- 2.2.46-debian12, 2.2.46-rocky9, 2.2.46-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.92
- 1.2.36
- 2.2.36
February 11, 2025
Data Lineage for Dataproc Hive is now available in Public Preview. You can enable it using the Hive Lineage initialization action.
February 10, 2025
Dataproc on Compute Engine: To help diagnose Dataproc clusters, you can set the following cluster properties to true when you create a cluster:
- dataproc:diagnostic.capture.enabled: When set to true, enables the collection of checkpoint data in the cluster temp bucket.
- dataproc:dataproc.logging.extended.enabled: When set to true, enables the collection of logs for the Knox, Zeppelin, Solr, Trino, Presto, Ranger-usersync, and Jupyter_notebook components and the Spark History Server in Cloud Logging.
- dataproc:dataproc.logging.syslog.enabled: When set to true, enables the collection of VM syslogs in Cloud Logging.
Note: Starting May 10, 2025, these properties will be set to true by default.
February 09, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.91
- 1.2.35
- 2.2.35
February 07, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.131-debian10, 2.0.131-rocky8, 2.0.131-ubuntu18
- 2.1.79-debian11, 2.1.79-rocky8, 2.1.79-ubuntu20, 2.1.79-ubuntu20-arm
- 2.2.45-debian12, 2.2.45-rocky9, 2.2.45-ubuntu22
Spark UI for Dataproc Serverless batches and interactive sessions, which lets you monitor and debug your serverless Spark workloads, is now available for CMEK (Customer-Managed Encryption Keys) and Assured Workloads. The Spark UI is available by default and free of cost.
February 02, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.90
- 1.2.34
- 2.2.34
January 31, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.130-debian10, 2.0.130-rocky8, 2.0.130-ubuntu18
- 2.1.78-debian11, 2.1.78-rocky8, 2.1.78-ubuntu20, 2.1.78-ubuntu20-arm
- 2.2.44-debian12, 2.2.44-rocky9, 2.2.44-ubuntu22
- New Hyperdisk Balanced primary disk type available on Dataproc clusters.
- New machine types available for the Hyperdisk Balanced disk type on clusters: C4, C4A, and N4.
January 30, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.89
- 1.2.33
- 2.2.33
Dataproc on Compute Engine: Private Google Access is now automatically enabled in the configured subnetwork when creating clusters with internal IP addresses.
Dataproc Serverless for Spark: Private Google Access is now automatically enabled in the configured subnetwork when running batch workloads and interactive sessions.
January 24, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.129-debian10, 2.0.129-rocky8, 2.0.129-ubuntu18
- 2.1.77-debian11, 2.1.77-rocky8, 2.1.77-ubuntu20, 2.1.77-ubuntu20-arm
- 2.2.43-debian12, 2.2.43-rocky9, 2.2.43-ubuntu22
Dataproc cluster caching now supports ARM images.
The Zeppelin component has been added to 2.1-ubuntu20-arm images.
January 23, 2025
New Dataproc Serverless for Spark runtime versions:
- 1.1.88
- 1.2.32
- 2.2.32
January 17, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.128-debian10, 2.0.128-rocky8, 2.0.128-ubuntu18
- 2.1.76-debian11, 2.1.76-rocky8, 2.1.76-ubuntu20, 2.1.76-ubuntu20-arm
- 2.2.42-debian12, 2.2.42-rocky9, 2.2.42-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.87
- 1.2.31
- 2.2.31
Dataproc Serverless for Spark:
January 13, 2025
Dataproc Serverless for Spark: On March 10, 2025, the Dataproc Resource Manager API will be enabled as part of General Availability (GA) for Dataproc Serverless 3.0+ versions.
User action will not be required in response to this API enablement change.
The Dataproc Resource Manager will be implemented as a stand-alone Google Cloud API, dataprocrm.googleapis.com. It will allow Dataproc distributions of open source software, particularly Apache Spark, to communicate resource requirements directly.
January 10, 2025
New Dataproc on Compute Engine subminor image versions:
- 2.0.127-debian10, 2.0.127-rocky8, 2.0.127-ubuntu18
- 2.1.75-debian11, 2.1.75-rocky8, 2.1.75-ubuntu20, 2.1.75-ubuntu20-arm
- 2.2.41-debian12, 2.2.41-rocky9, 2.2.41-ubuntu22
December 12, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.126-debian10, 2.0.126-rocky8, 2.0.126-ubuntu18
- 2.1.74-debian11, 2.1.74-rocky8, 2.1.74-ubuntu20, 2.1.74-ubuntu20-arm
- 2.2.40-debian12, 2.2.40-rocky9, 2.2.40-ubuntu22
Dataproc on Compute Engine: Updated the Dataproc Metastore (DPMS) gRPC proxy image to version 0.0.70.
November 20, 2024
Dataproc Serverless for Spark: Spark Lineage is available for all supported Dataproc Serverless for Spark runtime versions.
November 18, 2024
Dataproc is now available in the northamerica-south1 region (Queretaro, Mexico).
November 11, 2024
Announcing the General Availability (GA) of flexible shapes for Dataproc secondary workers, which lets you provide a ranked selection of machine types to use when creating VMs.
Announcing the General Availability (GA) of Spot and non-preemptible VM mixing for Dataproc secondary workers, which lets you mix Spot and non-preemptible secondary workers when you create a Dataproc cluster.
October 31, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.86
- 1.2.30
- 2.2.30
New Dataproc on Compute Engine subminor image versions:
- 2.0.125-debian10, 2.0.125-rocky8, 2.0.125-ubuntu18
- 2.1.73-debian11, 2.1.73-rocky8, 2.1.73-ubuntu20, 2.1.73-ubuntu20-arm
- 2.2.39-debian12, 2.2.39-rocky9, 2.2.39-ubuntu22
Note: When using Dataproc version 2.0.125 with the ranger-gcs-plugin, please create a customer support request for your project to use the enhanced version of the plugin prior to its GA release. This note does not apply to Dataproc on Compute Engine image versions 2.1 and 2.2.
Disabled HiveServer2 Ranger policy synchronization in non-HA clusters in the latest 2.1 and later image versions. Policy synchronization caused instability in the HiveServer2 process when it tried to connect to ZooKeeper, which is not active by default in non-HA clusters.
October 25, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.124-debian10, 2.0.124-rocky8, 2.0.124-ubuntu18
- 2.1.72-debian11, 2.1.72-rocky8, 2.1.72-ubuntu20, 2.1.72-ubuntu20-arm
- 2.2.38-debian12, 2.2.38-rocky9, 2.2.38-ubuntu22
Dataproc Serverless for Spark: The Hadoop Google Secret Manager Credential Provider feature is now available in the Dataproc Serverless for Spark 1.2 and 2.2 runtimes.
New Dataproc Serverless for Spark runtime versions:
- 1.1.85
- 1.2.29
- 2.2.29
Dataproc Serverless for Spark: Added common AI/ML Python packages by default to Dataproc Serverless for Spark 1.2 and 2.2 runtimes.
Dataproc Serverless for Spark: Upgraded the Cloud Storage connector to version 3.0.3 in the latest 1.2 and 2.2 runtimes.
October 21, 2024
Announcing the General Availability (GA) release of Spark UI for Dataproc Serverless Batches and Interactive sessions which allows you to monitor and debug your serverless Spark workloads. Spark UI is available by default and free of cost for all Dataproc Serverless workloads.
October 18, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.123-debian10, 2.0.123-rocky8, 2.0.123-ubuntu18
- 2.1.71-debian11, 2.1.71-rocky8, 2.1.71-ubuntu20, 2.1.71-ubuntu20-arm
- 2.2.37-debian12, 2.2.37-rocky9, 2.2.37-ubuntu22
October 17, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.84
- 1.2.28
- 2.2.28
October 14, 2024
Dataproc clusters created with image versions 2.0.57+, 2.1.5+, or 2.2+: Secondary worker control plane operations are performed by the Dataproc Service Agent service account (service-<project-number>@dataproc-accounts.iam.gserviceaccount.com). They will no longer use the Google APIs Service Agent service account (<project-number>@cloudservices.gserviceaccount.com).
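Which account handles secondary worker control plane operations depends only on the image version and the project number. The following Python sketch encodes the thresholds from this note (2.0.57+, 2.1.5+, 2.2+). The function names and the project number in the example are hypothetical, the sketch requires a full major.minor.subminor version, and it assumes images below those thresholds keep using the Google APIs Service Agent.

```python
# Minimum subminor version per image line, from this release note.
# Any 2.2 or later image always uses the Dataproc Service Agent.
_MIN_SUBMINOR = {(2, 0): 57, (2, 1): 5}


def parse_image_version(image_version: str) -> tuple[int, int, int]:
    """Parse '2.0.57-debian10' into (2, 0, 57); requires a full subminor version."""
    parts = image_version.split("-")[0].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.subminor, got {image_version!r}")
    major, minor, subminor = (int(part) for part in parts)
    return major, minor, subminor


def secondary_worker_control_plane_account(project_number: int, image_version: str) -> str:
    """Return the service account used for secondary worker control plane operations."""
    major, minor, subminor = parse_image_version(image_version)
    threshold = _MIN_SUBMINOR.get((major, minor))
    if (major, minor) >= (2, 2) or (threshold is not None and subminor >= threshold):
        return f"service-{project_number}@dataproc-accounts.iam.gserviceaccount.com"
    # Assumption: images below the thresholds keep using the Google APIs Service Agent.
    return f"{project_number}@cloudservices.gserviceaccount.com"


# 123456789012 is a hypothetical project number.
print(secondary_worker_control_plane_account(123456789012, "2.1.5-debian11"))
# prints "service-123456789012@dataproc-accounts.iam.gserviceaccount.com"
```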
New Dataproc on Compute Engine subminor image versions:
- 2.0.122-debian10, 2.0.122-rocky8, 2.0.122-ubuntu18
- 2.1.70-debian11, 2.1.70-rocky8, 2.1.70-ubuntu20, 2.1.70-ubuntu20-arm
- 2.2.36-debian12, 2.2.36-rocky9, 2.2.36-ubuntu22
October 11, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.83
- 1.2.27
- 2.2.27
October 08, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.121-debian10, 2.0.121-rocky8, 2.0.121-ubuntu18
- 2.1.69-debian11, 2.1.69-rocky8, 2.1.69-ubuntu20, 2.1.69-ubuntu20-arm
- 2.2.35-debian12, 2.2.35-rocky9, 2.2.35-ubuntu22
October 04, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.82
- 1.2.26
- 2.2.26
September 30, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.81
- 1.2.25
- 2.2.25
Blocklisted the following Dataproc on Compute Engine subminor image versions:
- 2.0.120-debian10, 2.0.120-rocky8, 2.0.120-ubuntu18
- 2.1.68-debian11, 2.1.68-rocky8, 2.1.68-ubuntu20, 2.1.68-ubuntu20-arm
- 2.2.34-debian12, 2.2.34-rocky9, 2.2.34-ubuntu22
September 23, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.79
- 1.2.23
- 2.2.23
Dataproc Serverless for Spark: In runtime versions 1.2 and 2.2, minimized the dynamic memory footprint of Spark applications by setting -XX:MaxHeapFreeRatio to 30% and -XX:MinHeapFreeRatio to 10%.
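These two HotSpot flags bound the share of the committed heap that may remain free after a garbage collection: if free space falls below MinHeapFreeRatio the JVM grows the heap, and if it exceeds MaxHeapFreeRatio the JVM may shrink it. Writing the free ratio as (committed - used) / committed, the committed heap after a collection tends to stay between used / (1 - min) and used / (1 - max). The Python sketch below computes that range; it is a simplified model that ignores -Xms/-Xmx limits, collector-specific behavior, and resizing granularity, and the function name is hypothetical.

```python
def committed_heap_bounds(
    used_mb: float, min_free_pct: int = 10, max_free_pct: int = 30
) -> tuple[float, float]:
    """Approximate committed-heap range after a GC for the given used heap.

    min_free_pct and max_free_pct mirror -XX:MinHeapFreeRatio and
    -XX:MaxHeapFreeRatio (percent). With free ratio = (committed - used) / committed:
      - free ratio < min -> the heap is expanded, so committed >= used / (1 - min)
      - free ratio > max -> the heap may be shrunk, so committed <= used / (1 - max)
    """
    if not 0 <= min_free_pct <= max_free_pct < 100:
        raise ValueError("expected 0 <= min_free_pct <= max_free_pct < 100")
    lower = used_mb * 100 / (100 - min_free_pct)
    upper = used_mb * 100 / (100 - max_free_pct)
    return lower, upper


low, high = committed_heap_bounds(700)
print(f"{low:.0f} MB to {high:.0f} MB")  # prints "778 MB to 1000 MB"
```

With the runtime values (10 and 30) and 700 MB of live heap, the JVM tends to keep roughly 778 MB to 1000 MB committed, so a lower MaxHeapFreeRatio lets it return unused memory sooner.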
Dataproc Serverless for Spark: Added the google-cloud-dlp Python package by default to the Dataproc Serverless for Spark runtimes.
Dataproc Serverless for Spark: Fixed an issue that would cause some batches and sessions to fail to start when using the premium compute tier.
September 21, 2024
Blocklisted the following Dataproc on Compute Engine subminor image versions:
- 2.0.119-debian10, 2.0.103-rocky8, 2.0.103-ubuntu18
- 2.1.67-debian11, 2.1.51-rocky8, 2.1.51-ubuntu20, 2.1.51-ubuntu20-arm
- 2.2.33-debian12, 2.2.17-rocky9, 2.2.17-ubuntu22
September 16, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.118-debian10, 2.0.118-rocky8, 2.0.118-ubuntu18
- 2.1.66-debian11, 2.1.66-rocky8, 2.1.66-ubuntu20, 2.1.66-ubuntu20-arm
- 2.2.32-debian12, 2.2.32-rocky9, 2.2.32-ubuntu22
September 13, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.78
- 1.2.22
- 2.2.22
Dataproc Serverless for Spark: Fixed a bug that caused some batches and sessions to fail to start when using the premium compute tier.
September 06, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.117-debian10, 2.0.117-rocky8, 2.0.117-ubuntu18
- 2.1.65-debian11, 2.1.65-rocky8, 2.1.65-ubuntu20, 2.1.65-ubuntu20-arm
- 2.2.31-debian12, 2.2.31-rocky9, 2.2.31-ubuntu22
Dataproc on Compute Engine: The latest 2.2 image versions now support Hudi 0.15.0.
Dataproc on Compute Engine: The latest 2.2 image versions support Hudi Trino integration natively. If both components are selected when you create a Dataproc cluster, Trino will be configured to support Hudi automatically.
September 04, 2024
Dataproc on Compute Engine: Dataproc image version 2.2 will become the default Dataproc on Compute Engine image version on September 6, 2024.
September 03, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.116-debian10, 2.0.116-rocky8, 2.0.116-ubuntu18
- 2.1.64-debian11, 2.1.64-rocky8, 2.1.64-ubuntu20, 2.1.64-ubuntu20-arm
- 2.2.30-debian12, 2.2.30-rocky9, 2.2.30-ubuntu22
Dataproc on Compute Engine: Apache Spark has been upgraded to version 3.5.1 in image version 2.2, starting with image version 2.2.30.
Dataproc on GKE runtime version 2.0 (Spark 3.1) is deprecated.
August 26, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.115-debian10, 2.0.115-rocky8, 2.0.115-ubuntu18
- 2.1.63-debian11, 2.1.63-rocky8, 2.1.63-ubuntu20, 2.1.63-ubuntu20-arm
- 2.2.29-debian12, 2.2.29-rocky9, 2.2.29-ubuntu22
August 22, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.77
- 1.2.21
- 2.0.85
- 2.2.21
Dataproc Serverless for Spark: Subminor version 2.0.85 is the last release of runtime version 2.0, which will no longer be supported and will not receive new releases.
August 19, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.114-debian10, 2.0.114-rocky8, 2.0.114-ubuntu18
- 2.1.62-debian11, 2.1.62-rocky8, 2.1.62-ubuntu20, 2.1.62-ubuntu20-arm
- 2.2.28-debian12, 2.2.28-rocky9, 2.2.28-ubuntu22
syslog is now available for Dataproc cluster nodes in Cloud Logging. See Dataproc logs for cluster and job log information.
August 15, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.76
- 1.2.20
- 2.0.84
- 2.2.20
August 12, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.75
- 1.2.19
- 2.0.83
- 2.2.19
July 31, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.74
- 1.2.18
- 2.0.82
- 2.2.18
Dataproc Serverless for Spark: Upgraded Spark BigQuery connector to version 0.36.4 in the latest 1.2 and 2.2 Dataproc Serverless for Spark runtime versions.
July 26, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.73
- 1.2.17
- 2.0.81
- 2.2.17
July 25, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.113-debian10, 2.0.113-rocky8, 2.0.113-ubuntu18
- 2.1.61-debian11, 2.1.61-rocky8, 2.1.61-ubuntu20, 2.1.61-ubuntu20-arm
- 2.2.27-debian12, 2.2.27-rocky9, 2.2.27-ubuntu22
Enabled user sync by default for clusters using Ranger.
Replaced Spark external packages with connector folder on Dataproc 2.2 clusters.
Fixed a bug that caused intermittent delays and failures in clusters with 3 HDFS.
July 22, 2024
Hyperdisks for Dataproc clusters are now created with default throughput and IOPS. When this behavior becomes configurable, it will be announced in a future release note.
Added support for N4 and C4 machine types for Dataproc image versions 2.1 and above. The following default configurations are now applied to clusters created with N4 or C4 machine types:
bootdisktype = "hyperdisk-balanced"
nictype = "gvnic"
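A minimal Python sketch of the N4 and C4 defaults described above, for illustration only. The helper name default_disk_and_nic is hypothetical, and it assumes that the machine family is the text before the first hyphen of the machine type (for example, n4-standard-8) and that the image version string starts with major.minor.

```python
def default_disk_and_nic(machine_type: str, image_version: str) -> dict:
    """Return the boot disk and NIC defaults described in the release note.

    Hypothetical helper: assumes the machine family is the prefix before the
    first hyphen and that image_version begins with "major.minor".
    """
    family = machine_type.split("-", 1)[0].lower()
    major, minor = (int(part) for part in image_version.split(".")[:2])
    # The note applies only to N4/C4 machine types on image versions 2.1+.
    if family in ("n4", "c4") and (major, minor) >= (2, 1):
        return {"bootdisktype": "hyperdisk-balanced", "nictype": "gvnic"}
    # Other machine types or older images: no N4/C4-specific defaults apply.
    return {}
```

For example, default_disk_and_nic("n4-standard-8", "2.2.26-debian12") returns both defaults, while an n2 machine type or a 2.0 image returns an empty dict.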
When a Cluster, Job, AutoscalingPolicy, or WorkflowTemplate API resource does not exist and the requestor does not have access to the project, a 403 error code is now issued instead of a 404 error code.
July 19, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.72
- 1.2.16
- 2.0.80
- 2.2.16
Note: Dataproc Serverless for Spark runtime versions 1.1.71, 1.2.15, 2.0.79, and 2.2.15 were not released.
July 18, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.112-debian10, 2.0.112-rocky8, 2.0.112-ubuntu18
- 2.1.60-debian11, 2.1.60-rocky8, 2.1.60-ubuntu20, 2.1.60-ubuntu20-arm
- 2.2.26-debian12, 2.2.26-rocky9, 2.2.26-ubuntu22
July 17, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.70
- 1.2.14
- 2.0.78
- 2.2.14
July 12, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.111-debian10, 2.0.111-rocky8, 2.0.111-ubuntu18
- 2.1.59-debian11, 2.1.59-rocky8, 2.1.59-ubuntu20, 2.1.59-ubuntu20-arm
- 2.2.25-debian12, 2.2.25-rocky9, 2.2.25-ubuntu22
July 11, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.69
- 1.2.13
- 2.0.77
- 2.2.13
July 08, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.110-debian10, 2.0.110-rocky8, 2.0.110-ubuntu18
- 2.1.58-debian11, 2.1.58-rocky8, 2.1.58-ubuntu20, 2.1.58-ubuntu20-arm
- 2.2.24-debian12, 2.2.24-rocky9, 2.2.24-ubuntu22
July 05, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.68
- 1.2.12
- 2.0.76
- 2.2.12
July 03, 2024
Dataproc Serverless for Spark: Added Cloud Profiler support. Enable profiling with the dataproc.profiling.enabled=true property, and set the profile name with the dataproc.profiling.name=<PROFILE_NAME> property.
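As an illustration, the two profiling properties can be combined into the comma-separated KEY=VALUE string accepted by the --properties flag of gcloud dataproc batches submit. This is a minimal sketch; the helper name profiling_properties and the profile name my-profile are hypothetical.

```python
def profiling_properties(profile_name: str) -> str:
    """Build a KEY=VALUE,KEY=VALUE string that enables Cloud Profiler.

    Hypothetical helper. Commas are rejected because they separate
    entries in the --properties list.
    """
    if "," in profile_name:
        raise ValueError("profile name must not contain a comma")
    props = {
        "dataproc.profiling.enabled": "true",
        "dataproc.profiling.name": profile_name,
    }
    # Dicts preserve insertion order, so the output order is stable.
    return ",".join(f"{key}={value}" for key, value in props.items())
```

The returned string, for example dataproc.profiling.enabled=true,dataproc.profiling.name=my-profile, could then be passed as the --properties value when submitting a batch.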
New Dataproc on Compute Engine subminor image versions:
- 2.0.109-debian10, 2.0.109-rocky8, 2.0.109-ubuntu18
- 2.1.57-debian11, 2.1.57-rocky8, 2.1.57-ubuntu20, 2.1.57-ubuntu20-arm
- 2.2.23-debian12, 2.2.23-rocky9, 2.2.23-ubuntu22
Dataproc on Compute Engine: Apache Hadoop is upgraded to version 3.2.4 in image version 2.0, starting with image version 2.0.109.
June 28, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.108-debian10, 2.0.108-rocky8, 2.0.108-ubuntu18
- 2.1.56-debian11, 2.1.56-rocky8, 2.1.56-ubuntu20, 2.1.56-ubuntu20-arm
- 2.2.22-debian12, 2.2.22-rocky9, 2.2.22-ubuntu22
Backported fixes for HIVE-25958 and HIVE-20220, which add the new hive.groupby.enable.deterministic.distribution configuration property (set to true or false).
June 26, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.67
- 1.2.11
- 2.0.75
- 2.2.11
Dataproc Serverless for Spark: To fix compatibility with open table formats (Apache Iceberg, Apache Hudi, and Delta Lake), the ANTLR version is downgraded from 4.13.1 to 4.9.3 in Dataproc Serverless for Spark runtime versions 1.2 and 2.2.
June 25, 2024
The Dataproc Component Gateway is now activated by default when you create a Dataproc on Compute Engine cluster using the Google Cloud console.
June 24, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.107-debian10, 2.0.107-rocky8, 2.0.107-ubuntu18
- 2.1.55-debian11, 2.1.55-rocky8, 2.1.55-ubuntu20, 2.1.55-ubuntu20-arm
- 2.2.21-debian12, 2.2.21-rocky9, 2.2.21-ubuntu22
June 21, 2024
Dataproc Serverless for Spark: To fix compatibility with open table formats (Apache Iceberg, Apache Hudi and Delta Lake), the ANTLR version will be downgraded from 4.13.1 to 4.9.3 in Dataproc Serverless for Spark runtime versions 1.2 and 2.2 on June 26, 2024.
June 20, 2024
Dataproc Serverless for Spark: Spark runtime version 2.2 will become the default Dataproc Serverless for Spark runtime version on September 6, 2024.
New Dataproc Serverless for Spark runtime versions:
- 1.1.66
- 1.2.10
- 2.0.74
- 2.2.10
June 13, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.106-debian10, 2.0.106-rocky8, 2.0.106-ubuntu18
- 2.1.54-debian11, 2.1.54-rocky8, 2.1.54-ubuntu20, 2.1.54-ubuntu20-arm
- 2.2.20-debian12, 2.2.20-rocky9, 2.2.20-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.65
- 1.2.9
- 2.0.73
- 2.2.9
Dataproc Serverless for Spark: Upgraded Spark BigQuery connector to version 0.36.3 in the latest 1.2 and 2.2 Dataproc Serverless for Spark runtime versions.
Added a configuration option that prevents expensive database queries for HiveMetaStore metrics during HiveMetaStore startup. To disable these queries, set the Hive property metastore.initial.metadata.count.enabled to false.
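One way to picture this setting is as a standard Hadoop-style property entry in hive-site.xml. The sketch below assumes the property is supplied through hive-site.xml (for example, by using the hive: cluster property prefix at cluster creation); the helper name hive_site_property is hypothetical.

```python
import xml.etree.ElementTree as ET


def hive_site_property(name: str, value: str) -> str:
    """Render a single <property> entry in Hadoop configuration XML format.

    Hypothetical helper; the resulting element would sit inside the
    <configuration> root of hive-site.xml.
    """
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(prop, encoding="unicode")


# Disable the startup metadata count queries described above.
snippet = hive_site_property("metastore.initial.metadata.count.enabled", "false")
```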
June 11, 2024
The Apache Spark in BigQuery feature is available in Private Preview. This feature lets you create a Spark session in a BigQuery notebook that you can use to develop and submit PySpark code from BigQuery. To access this feature, fill in and submit the Dataproc Preview access request form.
June 06, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.105-debian10, 2.0.105-rocky8, 2.0.105-ubuntu18
- 2.1.53-debian11, 2.1.53-rocky8, 2.1.53-ubuntu20, 2.1.53-ubuntu20-arm
- 2.2.19-debian12, 2.2.19-rocky9, 2.2.19-ubuntu22
Dataproc on Compute Engine: When you create a cluster with the latest Dataproc on Compute Engine image versions, the secondary worker boot disk type now defaults to the primary worker boot disk type, which itself defaults to pd-standard if it is not specified.
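The defaulting chain above can be sketched in a few lines of Python. This is an illustration, not Dataproc's implementation; the helper name resolve_boot_disk_types is hypothetical, and it assumes that an explicitly set secondary worker boot disk type still takes precedence.

```python
from typing import Optional, Tuple


def resolve_boot_disk_types(
    primary: Optional[str] = None, secondary: Optional[str] = None
) -> Tuple[str, str]:
    """Return (primary, secondary) boot disk types after applying defaults.

    Hypothetical helper that mirrors the release note:
    - primary falls back to pd-standard when unset;
    - secondary falls back to the resolved primary type when unset.
    """
    primary_type = primary or "pd-standard"
    secondary_type = secondary or primary_type
    return primary_type, secondary_type
```

For example, setting only the primary type to pd-ssd yields ("pd-ssd", "pd-ssd"), and setting neither yields ("pd-standard", "pd-standard").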
June 05, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.64
- 1.2.8
- 2.0.72
- 2.2.8
June 03, 2024
Dataproc on Compute Engine: Updated restartable job error messages to include job IDs.
Dataproc Serverless for Spark: The goog-dataproc-session-id, goog-dataproc-session-uuid, and goog-dataproc-location labels are now automatically applied to session resources.
May 30, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.104-debian10, 2.0.104-rocky8, 2.0.104-ubuntu18
- 2.1.52-debian11, 2.1.52-rocky8, 2.1.52-ubuntu20, 2.1.52-ubuntu20-arm
- 2.2.18-debian12, 2.2.18-rocky9, 2.2.18-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.63
- 1.2.7
- 2.0.71
- 2.1.50
- 2.2.7
Dataproc Serverless for Spark: Subminor version 2.1.50 is the last release of runtime version 2.1, which will no longer be supported and will not receive new releases.
Dataproc Serverless for Spark: Removed Spark data lineage support for runtime version 1.2.
Dataproc Serverless for Spark: Enabled Spark checkpoint (spark.checkpoint.compress) and RDD (spark.rdd.compress) compression in the latest 1.2 and 2.2 runtime versions.
May 23, 2024
Blocklisted the following Dataproc on Compute Engine subminor image versions:
- 2.0.103-debian10, 2.0.103-rocky8, 2.0.103-ubuntu18
- 2.1.51-debian11, 2.1.51-rocky8, 2.1.51-ubuntu20, 2.1.51-ubuntu20-arm
- 2.2.17-debian12, 2.2.17-rocky9, 2.2.17-ubuntu22
May 22, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.62
- 1.2.6
- 2.0.70
- 2.1.49
- 2.2.6
Upgraded Spark BigQuery connector to version 0.36.2 in the latest 1.2 and 2.2 Dataproc Serverless for Spark runtime versions.
May 16, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.102-debian10, 2.0.102-rocky8, 2.0.102-ubuntu18
- 2.1.50-debian11, 2.1.50-rocky8, 2.1.50-ubuntu20, 2.1.50-ubuntu20-arm
- 2.2.16-debian12, 2.2.16-rocky9, 2.2.16-ubuntu22
Anaconda's default channel is disabled for package installations on Dataproc on Compute Engine.
May 09, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.101-debian10, 2.0.101-rocky8, 2.0.101-ubuntu18
- 2.1.49-debian11, 2.1.49-rocky8, 2.1.49-ubuntu20, 2.1.49-ubuntu20-arm
- 2.2.15-debian12, 2.2.15-rocky9, 2.2.15-ubuntu22
May 08, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.61
- 1.2.5
- 2.0.69
- 2.1.48
- 2.2.5
May 06, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.100-debian10, 2.0.100-rocky8, 2.0.100-ubuntu18
- 2.1.48-debian11, 2.1.48-rocky8, 2.1.48-ubuntu20, 2.1.48-ubuntu20-arm
- 2.2.14-debian12, 2.2.14-rocky9, 2.2.14-ubuntu22
Dataproc on Compute Engine:
- Backported patches for HIVE-14557, HIVE-19326, HIVE-20514, HIVE-21100, HIVE-22165, HIVE-22416, HIVE-24435.
- Hive: Improved ORC split generation.
May 01, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.60
- 1.2.4
- 2.0.68
- 2.1.47
- 2.2.4
Dataproc Serverless for Spark:
- Upgraded Spark RAPIDS to version 24.04.0 in 1.2 and 2.2 Dataproc Serverless for Spark runtimes.
When you submit a Dataproc Serverless Batch with a CMEK key:
- In addition to encrypting disk and Cloud Storage data, Dataproc Serverless will also use your CMEK to encrypt batch job arguments. This change requires you to do the following:
  - Assign the Cloud KMS CryptoKey Encrypter/Decrypter role to the Dataproc Service Agent service account.
  - Enable the Cloud KMS API on the project that runs Dataproc Batches resources.
  - If the Dataproc Service Agent role is not attached to the Dataproc Service Agent service account, add the serviceusage.services.use permission to the custom role attached to the Dataproc Service Agent service account.
- batches.list will return an unreachable field that lists any batches with job arguments that couldn't be decrypted. You can issue a batches.get request to obtain more information about an unreachable batch.
- Multi-regional and cross-regional CMEKs will no longer be permitted. The key (CMEK) must be located in the same location as the encrypted resource. For example, the CMEK used to encrypt a batch that runs in the us-central1 region must also be located in the us-central1 region.
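The location rule can be checked before submitting a batch by parsing the Cloud KMS key resource name, which has the form projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY. The following is a minimal sketch; the helper name cmek_location_matches and the sample project and key names are hypothetical.

```python
import re

# Cloud KMS CryptoKey resource name; the location segment is captured.
_KEY_PATTERN = re.compile(
    r"^projects/[^/]+/locations/(?P<location>[^/]+)"
    r"/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def cmek_location_matches(kms_key_name: str, batch_region: str) -> bool:
    """Return True if the key lives in the same location as the batch.

    Hypothetical pre-submission check: multi-regional (for example, "us")
    or other-region keys return False.
    """
    match = _KEY_PATTERN.match(kms_key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key resource name: {kms_key_name}")
    return match.group("location") == batch_region
```

With this check, a key under locations/us-central1 passes for a us-central1 batch, while a key under locations/us (multi-regional) or locations/us-east1 fails.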
April 29, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.99-debian10, 2.0.99-rocky8, 2.0.99-ubuntu18
- 2.1.47-debian11, 2.1.47-rocky8, 2.1.47-ubuntu20, 2.1.47-ubuntu20-arm
- 2.2.13-debian12, 2.2.13-rocky9, 2.2.13-ubuntu22
April 26, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.59
- 1.2.3
- 2.0.67
- 2.1.46
- 2.2.3
April 21, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.98-debian10, 2.0.98-rocky8, 2.0.98-ubuntu18
- 2.1.46-debian11, 2.1.46-rocky8, 2.1.46-ubuntu20, 2.1.46-ubuntu20-arm
- 2.2.12-debian12, 2.2.12-rocky9, 2.2.12-ubuntu22
April 20, 2024
Dataproc Workflow Templates now support the CMEK organization policy.
April 18, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.58
- 1.2.2
- 2.0.66
- 2.1.45
- 2.2.2
Set the soft delete policy of newly created Dataproc staging and temp Cloud Storage buckets to 0 days.
Updated the default autoscaling V2 cool-down time from 2m to 1m to reduce scaling latency.
Fixed a bug where Dataproc Serverless sessions that lived longer than 48 hours were underbilled.
April 09, 2024
Dataproc Serverless for Spark: The preview release of Advanced troubleshooting, including Gemini-assisted troubleshooting, is now available for Spark workloads submitted with the following or later-released runtime versions:
- 1.1.55
- 1.2.0-RC1
- 2.0.63
- 2.1.42
- 2.2.0-RC15
Dataproc Serverless for Spark: Announcing the preview release of Autotuning Spark workloads.
April 04, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.57
- 1.2.1
- 2.0.65
- 2.1.44
- 2.2.1
Added the bigframes Python package by default in the Dataproc Serverless for Spark runtime versions 1.2 and 2.2.
April 02, 2024
The following previously released sub-minor versions of Dataproc on Compute Engine images have been rolled back and can only be used when updating existing clusters that already use them:
- 2.0.97-debian10, 2.0.97-rocky8, 2.0.97-ubuntu18
- 2.1.45-debian11, 2.1.45-rocky8, 2.1.45-ubuntu20, 2.1.45-ubuntu20-arm
- 2.2.11-debian12, 2.2.11-rocky9, 2.2.11-ubuntu22
March 29, 2024
Dataproc Serverless for Spark: runtime version 2.2 will become the default Dataproc Serverless for Spark runtime version on May 3, 2024.
Note: This announcement was updated in the April 19, 2024 release note.
March 28, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.97-debian10, 2.0.97-rocky8, 2.0.97-ubuntu18
- 2.1.45-debian11, 2.1.45-rocky8, 2.1.45-ubuntu20, 2.1.45-ubuntu20-arm
- 2.2.11-debian12, 2.2.11-rocky9, 2.2.11-ubuntu22
Note: The above subminor image versions were rolled back on April 2, 2024.
Dataproc on Compute Engine: New Hadoop Google Secret Manager Credential Provider feature introduced in latest Dataproc on Compute Engine 2.0 image versions.
March 27, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.56
- 1.2.0
- 2.0.64
- 2.1.43
- 2.2.0
Announcing the General Availability (GA) release of Dataproc Serverless for Spark runtime versions 1.2 and 2.2, which include the following components:
- Spark 3.5.1
- BigQuery Spark Connector 0.36.1
- Cloud Storage Connector 3.0.0
- Conda 24.1
- Java 17
- Python 3.12
- R 4.3
- Scala 2.12 (1.2 runtime) and Scala 2.13 (2.2 runtime)
March 21, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.96-debian10, 2.0.96-rocky8, 2.0.96-ubuntu18
- 2.1.44-debian11, 2.1.44-rocky8, 2.1.44-ubuntu20, 2.1.44-ubuntu20-arm
- 2.2.10-debian12, 2.2.10-rocky9, 2.2.10-ubuntu22
March 20, 2024
Announcing the Preview release of Dataproc Serverless for Spark 1.2 runtime:
- Spark 3.5.0
- BigQuery Spark Connector 0.35.1
- Cloud Storage Connector 3.0.0
- Conda 23.11
- Java 17
- Python 3.12
- R 4.3
- Scala 2.12
New Dataproc Serverless for Spark runtime versions:
- 1.1.55
- 1.2.0-RC1
- 2.0.63
- 2.1.42
- 2.2.0-RC15
March 14, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.95-debian10, 2.0.95-rocky8, 2.0.95-ubuntu18
- 2.1.43-debian11, 2.1.43-rocky8, 2.1.43-ubuntu20, 2.1.43-ubuntu20-arm
- 2.2.9-debian12, 2.2.9-rocky9, 2.2.9-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.54
- 2.0.62
- 2.1.41
- 2.2.0-RC14
Added the bigframes (BigQuery DataFrames) Python package in the Dataproc Serverless for Spark 2.1 runtime.
March 07, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.53
- 2.0.61
- 2.1.40
- 2.2.0-RC13
Dataproc Serverless for Spark: Upgraded the Cloud Storage connector to version 2.2.20 in the latest 1.1, 2.0, and 2.1 runtimes.
March 06, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.94-debian10, 2.0.94-rocky8, 2.0.94-ubuntu18
- 2.1.42-debian11, 2.1.42-rocky8, 2.1.42-ubuntu20, 2.1.42-ubuntu20-arm
- 2.2.8-debian12, 2.2.8-rocky9, 2.2.8-ubuntu22
Dataproc on Compute Engine: Upgraded Cloud Storage connector version to 2.2.20 for 2.0 and 2.1 images.
Dataproc on Compute Engine: Mounted Java cacerts into containers by default when the Docker-on-YARN feature is enabled.
March 04, 2024
Dataproc Serverless for Spark: Extended Spark metrics collected for a batch now include executor:resultSize, executor:shuffleBytesWritten, and executor:shuffleTotalBytesRead.
February 29, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.52
- 2.0.60
- 2.1.39
- 2.2.0-RC12
February 28, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.93-debian10, 2.0.93-rocky8, 2.0.93-ubuntu18
- 2.1.41-debian11, 2.1.41-rocky8, 2.1.41-ubuntu20, 2.1.41-ubuntu20-arm
- 2.2.7-debian12, 2.2.7-rocky9, 2.2.7-ubuntu22
Dataproc on Compute Engine: The new Secret Manager credential provider feature is available in the latest 2.1 image versions.
Dataproc on Compute Engine:
- Upgraded Zookeeper to 3.8.3 for Dataproc 2.2.
- Upgraded ORC for Hive to 1.5.13 for Dataproc 2.1.
- Upgraded ORC for Spark to 1.7.10 for Dataproc 2.1.
- Extended expiry for the internal Knox Gateway certificate from one year to five years from cluster creation for Dataproc images 2.0, 2.1, and 2.2.
Dataproc on Compute Engine: Fixed ZooKeeper startup failures in image 2.2 HA (High Availability) clusters that use fully qualified hostnames.
February 22, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.51
- 2.0.59
- 2.1.38
- 2.2.0-RC11
February 16, 2024
Dataproc on Compute Engine: The internalIpOnly cluster configuration setting now defaults to true for clusters created with 2.2 image versions. Also see Create a Dataproc cluster with internal IP addresses only.
February 15, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.50
- 2.0.58
- 2.1.37
- 2.2.0-RC10
Dataproc Serverless for Spark: Spark Lineage is available for Dataproc Serverless for Spark 1.1 runtime.
February 08, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.92-debian10, 2.0.92-rocky8, 2.0.92-ubuntu18
- 2.1.40-debian11, 2.1.40-rocky8, 2.1.40-ubuntu20, 2.1.40-ubuntu20-arm
- 2.2.6-debian12, 2.2.6-rocky9, 2.2.6-ubuntu22
Dataproc on Compute Engine Ranger Cloud Storage enhancement:
- Enabled downscoping
- Added caching of tokens in local cache
Both settings are configurable and can be enabled by customers; see Use Ranger with caching and downscoping.
Dataproc on Compute Engine: The new Secret Manager credential provider feature is available in the latest 2.2 image versions.
Dataproc on Compute Engine: Backported patch for HADOOP-18652.
New Dataproc Serverless for Spark runtime versions:
- 1.1.49
- 2.0.57
- 2.1.36
- 2.2.0-RC9
Dataproc Serverless for Spark: Backported patch for HADOOP-18652.
February 02, 2024
Dataproc on Compute Engine: Bucket TTL validation now also runs for buckets created by Dataproc.
Dataproc on Compute Engine: Added a warning during cluster creation if the cluster Cloud Storage staging bucket is using the legacy fine-grained/ACL IAM configuration instead of the recommended Uniform bucket-level access controls.
Dataproc Serverless for Spark: When dynamic allocation is enabled, the initial number of executors is the maximum of spark.dynamicAllocation.initialExecutors and spark.executor.instances.
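The rule above amounts to taking the larger of the two settings. This minimal Python sketch illustrates it; the helper name initial_executors is hypothetical, and treating an unset property as 0 is an assumption made for illustration, not documented Dataproc Serverless behavior.

```python
def initial_executors(conf: dict) -> int:
    """Initial executor count under dynamic allocation, per the note above.

    Hypothetical helper. Unset properties are treated as 0 here purely
    for illustration; real defaults are not stated in the release note.
    """
    initial = int(conf.get("spark.dynamicAllocation.initialExecutors", 0))
    instances = int(conf.get("spark.executor.instances", 0))
    return max(initial, instances)
```

For example, with spark.dynamicAllocation.initialExecutors=2 and spark.executor.instances=10, the batch would start with 10 executors.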
February 01, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.91-debian10, 2.0.91-rocky8, 2.0.91-ubuntu18
- 2.1.39-debian11, 2.1.39-rocky8, 2.1.39-ubuntu20, 2.1.39-ubuntu20-arm
- 2.2.5-debian12, 2.2.5-rocky9, 2.2.5-ubuntu22
New Dataproc Serverless for Spark runtime versions:
- 1.1.48
- 2.0.56
- 2.1.35
- 2.2.0-RC8
Dataproc on Compute Engine: Backported patches for HIVE-21214, HIVE-23154, HIVE-23354 and HIVE-23614.
January 31, 2024
Dataproc is now available in the africa-south1 region (Johannesburg, South Africa).
The GitHub Ops Agent initialization action installs the Ops Agent on a Dataproc cluster and provides metrics similar to the metrics that were enabled with the --metric-sources=monitoring-agent-defaults setting available for Dataproc image versions prior to version 2.2.
January 25, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.47
- 2.0.55
- 2.1.34
- 2.2.0-RC7
January 24, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.90-debian10, 2.0.90-rocky8, 2.0.90-ubuntu18
- 2.1.38-debian11, 2.1.38-rocky8, 2.1.38-ubuntu20, 2.1.38-ubuntu20-arm
- 2.2.4-debian12, 2.2.4-rocky9, 2.2.4-ubuntu22
Backported HIVE-19568: Active/Passive HiveServer2 HA: Disallow direct connection to passive instance.
Backported HIVE-27715: Remove ThreadPoolExecutorWithOomHook.
January 19, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.89-debian10, 2.0.89-rocky8, 2.0.89-ubuntu18
- 2.1.37-debian11, 2.1.37-rocky8, 2.1.37-ubuntu20, 2.1.37-ubuntu20-arm
- 2.2.3-debian12, 2.2.3-rocky9, 2.2.3-ubuntu22
Dataproc on Compute Engine: The default yarn.nm.liveness-monitor.expiry-interval-ms Hadoop YARN setting has been changed in the latest image versions from 15000 (15 seconds) to 120000 (2 minutes).
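The setting is expressed in milliseconds. As an illustration, the following sketch builds a cluster property string that sets the interval from a value in seconds, using the yarn: prefix that Dataproc cluster properties use for yarn-site.xml settings. The helper name liveness_expiry_property is hypothetical; the string could be passed to the --properties flag of gcloud dataproc clusters create, for example to restore the previous 15-second value.

```python
# yarn: prefix targets yarn-site.xml in Dataproc cluster properties.
PROPERTY = "yarn:yarn.nm.liveness-monitor.expiry-interval-ms"


def liveness_expiry_property(seconds: int) -> str:
    """Return a cluster property that sets the liveness expiry interval.

    Hypothetical helper; converts seconds to the milliseconds YARN expects.
    """
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return f"{PROPERTY}={seconds * 1000}"
```

For example, liveness_expiry_property(15) produces the previous default of 15000 ms, and liveness_expiry_property(120) matches the new default of 120000 ms.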
Dataproc on Compute Engine: Upgraded Cloud Storage connector version to 2.2.19 in the latest 2.0 and 2.1 images.
Dataproc on Compute Engine: Upgraded Miniconda to 23.11, Python to 3.11, and curl to 8.5 to fix CVE-2023-38545 in the latest 2.2 images.
Dataproc on Compute Engine: Fixed the gsutil: command not found error in the latest Ubuntu images.
Dataproc on Compute Engine: Fixed Trino startup issue in the latest 2.2 images.
New Dataproc Serverless for Spark runtime versions:
- 1.1.46
- 2.0.54
- 2.1.33
- 2.2.0-RC6
Dataproc Serverless for Spark: Upgraded the Cloud Storage connector to version 2.2.19 in the latest 1.1, 2.0, and 2.1 runtimes.
January 17, 2024
Beginning March 31, 2024, when you submit a Dataproc Serverless Batch with a CMEK key:
- In addition to encrypting disk and Cloud Storage data, Dataproc Serverless will also use your CMEK to encrypt batch job arguments. This change will require that you assign the Cloud KMS CryptoKey Encrypter/Decrypter and Service Usage Consumer roles to the Dataproc Service Agent service account.
- batches.list will return an unreachable field that lists any batches with job arguments that couldn't be decrypted. You can issue a batches.get request to obtain more information about an unreachable batch.
- Multi-regional and cross-regional CMEKs will no longer be permitted. The key (CMEK) must be located in the same location as the encrypted resource. For example, the CMEK used to encrypt a batch that runs in the us-central1 region must also be located in the us-central1 region.
January 15, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.45
- 2.0.53
- 2.1.32
- 2.2.0-RC5
Dataproc Serverless for Spark:
- Upgraded Spark RAPIDS to version 23.12.1.
- Upgraded the following components to the following versions in the latest 2.2 runtime:
  - Spark BigQuery connector version 0.35.0
  - Cloud Storage connector version 3.0.0
  - Conda version 23.11
Dataproc Spark Enhancements are now available in the Google Cloud console Dataproc cluster and job creation pages.
January 05, 2024
New Dataproc Serverless for Spark runtime versions:
- 1.1.44
- 2.0.52
- 2.1.31
- 2.2.0-RC4
January 04, 2024
The following previously released sub-minor versions of Dataproc images have been rolled back and can only be used when updating existing clusters that already use them:
- 2.0.88-debian10, 2.0.88-rocky8, 2.0.88-ubuntu18
- 2.1.36-debian11, 2.1.36-rocky8, 2.1.36-ubuntu20, 2.1.36-ubuntu20-arm
- 2.2.2-debian12, 2.2.2-rocky9, 2.2.2-ubuntu22
January 02, 2024
New Dataproc on Compute Engine subminor image versions:
- 2.0.88-debian10, 2.0.88-rocky8, 2.0.88-ubuntu18
- 2.1.36-debian11, 2.1.36-rocky8, 2.1.36-ubuntu20, 2.1.36-ubuntu20-arm
- 2.2.2-debian12, 2.2.2-rocky9, 2.2.2-ubuntu22
Rollback Notice: See the January 4, 2024 release note rollback notice.
Dataproc on Compute Engine: Changed the HiveServer2 and Metastore maximum default JVM heap size to 32GiB. Previously, the limit was set to 1/4 of total node memory, which could be too large on large-memory machines.
Dataproc on Compute Engine: Backported the patch for YARN-10975 in the latest 2.0 images.
December 21, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.43
- 2.0.51
- 2.1.30
- 2.2.0-RC3
December 18, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.87-debian10, 2.0.87-rocky8, 2.0.87-ubuntu18
- 2.1.35-debian11, 2.1.35-rocky8, 2.1.35-ubuntu20, 2.1.35-ubuntu20-arm
- 2.2.1-debian12, 2.2.1-rocky9, 2.2.1-ubuntu22
December 14, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.42
- 2.0.50
- 2.1.29
- 2.2.0-RC2
Added the google-cloud-secret-manager Python package in the latest Dataproc Serverless for Spark runtimes.
December 11, 2023
Announcing the GA release of Dataproc on Compute Engine image version 2.2:
- 2.2.0-debian12, 2.2.0-rocky9, 2.2.0-ubuntu22
The 2.2.0 release includes the following components:
- Debian-12 / Ubuntu-2204 / RockyLinux 9
- Apache Hadoop 3.3.6
- Apache Spark 3.5.0
- Spark-BigQuery Connector 0.34.0
- Cloud Storage Connector 3.0.0
- Trino 432
- Apache Flink 1.17.0
- Apache Ranger 2.4.0
- Apache Solr 9.2.1
- R 4.2
- Hue 4.11.0
- JupyterLab Notebook 3.6
Monitoring-agent-defaults metrics are not available in Dataproc on Compute Engine image version 2.2 clusters unless the Ops Agent is installed. Other metrics for Dataproc provided components will continue to work.
Blocklisted the following Dataproc on Compute Engine images due to an issue that increased startup time:
- 2.0.86-debian10, 2.0.86-rocky8, 2.0.86-ubuntu18
- 2.1.34-debian11, 2.1.34-rocky8, 2.1.34-ubuntu20, 2.1.34-ubuntu20-arm
December 06, 2023
Announcing the Preview release of Dataproc Serverless for Spark 2.2 runtime:
- Spark 3.5.0
- BigQuery Spark Connector 0.34.0
- Cloud Storage Connector 3.0.0-RC1
- Conda 23.10
- Java 17
- Python 3.12
- R 4.3
- Scala 2.13
New Dataproc Serverless for Spark runtime versions:
- 1.1.41
- 2.0.49
- 2.1.28
- 2.2.0-RC1
December 04, 2023
Added the Confidential Computing option on the "Manage Security" panel on the "Create a Dataproc cluster on Compute Engine" page in the Google Cloud console.
New Dataproc on Compute Engine subminor image versions:
- 2.0.85-debian10, 2.0.85-rocky8, 2.0.85-ubuntu18
- 2.1.33-debian11, 2.1.33-rocky8, 2.1.33-ubuntu20, 2.1.33-ubuntu20-arm
Updated the Zookeeper component version from 3.8.0 to 3.8.3 in the latest Dataproc on Compute Engine 2.1 image version.
Fixed a Dataproc Hub issue in the latest Dataproc on Compute Engine 2.1 image.
Backported HIVE-21698 in the Hive 3.1.3 component in the latest Dataproc on Compute Engine image versions.
December 01, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.40
- 2.0.48
- 2.1.27
The Cloud Storage connector has been upgraded to version 2.2.18 in all Dataproc Serverless for Spark runtimes.
November 17, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.84-debian10, 2.0.84-rocky8, 2.0.84-ubuntu18
- 2.1.32-debian11, 2.1.32-rocky8, 2.1.32-ubuntu20, 2.1.32-ubuntu20-arm
- 2.2.0-RC3-debian11, 2.2.0-RC3-ubuntu22, 2.2.0-RC3-rocky9
Upgraded the Cloud Storage connector version to 2.2.18 in the latest 2.0 and 2.1 Dataproc on Compute Engine image versions.
In the Flink component in the latest Dataproc on Compute Engine 2.1 image version, added the following java-storage client properties:
- gs.retry.max.attempts: sets the maximum number of retry attempts
- gs.retry.total.timeout: sets the total retry timeout
Fixed a regression in the Zeppelin websocket rules that caused a websocket error in Zeppelin notebooks.
The Python kernel does not work in Zeppelin on the Dataproc on Compute Engine 2.1 image version. Other kernels are not impacted.
The Zeppelin REST API does not work (drops query parameters) on Dataproc on Compute Engine 2.0 and 2.1 image versions via the Component Gateway. Other Zeppelin interactions can also break as a result of dropped query parameters.
November 15, 2023
You can use CMEK (Customer Managed Encryption Keys) to encrypt Dataproc cluster data, including persistent disk data, job arguments and queries submitted with Dataproc jobs, and cluster data saved in the cluster Dataproc staging bucket. See Use CMEK with cluster data for more information.
November 10, 2023
Announcing the General Availability (GA) release of Dataproc Jupyter Plugin and its availability in Vertex AI Workbench instance notebooks.
New Dataproc on Compute Engine subminor image versions:
- 2.0.83-debian10, 2.0.83-rocky8, 2.0.83-ubuntu18
- 2.1.31-debian11, 2.1.31-rocky8, 2.1.31-ubuntu20, 2.1.31-ubuntu20-arm
November 08, 2023
Announcing the release of Workflow Template CMEK (Customer Managed Encryption Key) encryption. Use this feature to apply CMEK encryption to workflow template job arguments. For example, when this feature is enabled, the query string of a workflow template SparkSQL job is encrypted using CMEK.
You can now use Dataproc Serverless autoscaling V2 to help you manage Dataproc Serverless workloads, improve workload performance, and save costs.
November 07, 2023
Set spark.shuffle.mapOutput.minSizeForBroadcast=128m to fix SPARK-38101 when Dataproc Serverless Spark dynamic allocation is enabled.
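Spark size strings such as 128m use binary units, so 128m is 128 × 1024 × 1024 = 134,217,728 bytes. The following simplified Python parser illustrates the conversion; the helper name spark_bytes is hypothetical, and unlike Spark it requires an explicit unit suffix.

```python
import re

# Binary multipliers, as in Spark size strings (k = KiB, m = MiB, ...).
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4, "p": 1024**5}


def spark_bytes(value: str) -> int:
    """Convert a size string like "128m" or "512kb" to bytes.

    Hypothetical, simplified parser: a unit suffix is required and an
    optional trailing "b" (as in "mb") is accepted.
    """
    match = re.fullmatch(r"(\d+)\s*([bkmgtp])b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size string: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```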
November 01, 2023
Announcing the Preview release of Dataproc Flexible VMs. This feature lets you specify prioritized lists of secondary worker VM types that Dataproc will select from when creating your cluster. Dataproc will select the VM type with sufficient available capacity while taking quotas and reservations into account.
October 30, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.82-debian10, 2.0.82-rocky8, 2.0.82-ubuntu18
- 2.1.30-debian11, 2.1.30-rocky8, 2.1.30-ubuntu20, 2.1.30-ubuntu20-arm
Added the spark.dataproc.scaling.version property, which lets customers control the Dataproc Serverless for Spark autoscaling version (for example, spark.dataproc.scaling.version=2).
Increased the TTL for Dataproc on Compute Engine custom images from 60 days to 365 days.
Fixed Knox rewrite rules for Zeppelin URLs in some cases in the latest 2.0 and 2.1 Dataproc on Compute Engine image versions.
October 27, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.37
- 2.0.45
- 2.1.24
October 25, 2023
Announcing the General Availability (GA) release of Dataproc Serverless GPU accelerators.
October 23, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.36
- 2.0.44
- 2.1.23
Dataproc on Compute Engine: Dataproc now collects the dataproc.googleapis.com/job/yarn/vcore_seconds and dataproc.googleapis.com/job/yarn/memory_seconds job-level resource attribution metrics to track YARN application vcore and memory usage during job execution. These metrics are collected by default and are not chargeable to customers.
Dataproc on Compute Engine: Dataproc now collects a dataproc.googleapis.com/node/yarn/nodemanager/health health metric to track the health of individual YARN node managers running on VMs. This metric is written against the gce_instance monitored resource to help you find suspect nodes. It is collected by default and is not chargeable to customers.
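To query these metrics, you can build a Cloud Monitoring filter from the metric type and, for the node health metric, the gce_instance monitored resource type. This is a minimal sketch; the helper name metric_filter is hypothetical, and the string could be used wherever Cloud Monitoring accepts a time series filter.

```python
from typing import Optional


def metric_filter(metric_type: str, resource_type: Optional[str] = None) -> str:
    """Build a Cloud Monitoring filter string for one metric type.

    Hypothetical helper; optionally restricts the monitored resource type.
    """
    parts = [f'metric.type = "{metric_type}"']
    if resource_type:
        parts.append(f'resource.type = "{resource_type}"')
    return " AND ".join(parts)


# Node manager health, written against the gce_instance monitored resource.
health_filter = metric_filter(
    "dataproc.googleapis.com/node/yarn/nodemanager/health", "gce_instance"
)
# Job-level vcore usage.
vcore_filter = metric_filter("dataproc.googleapis.com/job/yarn/vcore_seconds")
```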
Dataproc on Compute Engine: The dataproc:agent.ha.enabled and dataproc:componentgateway.ha.enabled properties now default to true to provide high availability for the Dataproc Agent and Component Gateway.
October 13, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.35
- 2.0.43
- 2.1.22
October 12, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.80-debian10, 2.0.80-rocky8, 2.0.80-ubuntu18
- 2.1.28-debian11, 2.1.28-rocky8, 2.1.28-ubuntu20, 2.1.28-ubuntu20-arm
October 09, 2023
Announcing the General Availability (GA) release of Dataproc Serverless for Spark Interactive sessions.
October 06, 2023
New Dataproc on Compute Engine image version 2.2 is available for preview with upgraded components.
New Dataproc on Compute Engine subminor image versions:
- 2.0.79-debian10, 2.0.79-rocky8, 2.0.79-ubuntu18
- 2.1.27-debian11, 2.1.27-rocky8, 2.1.27-ubuntu20, 2.1.27-ubuntu20-arm
- 2.2.0-RC2-debian11, 2.2.0-RC2-rocky9, 2.2.0-RC2-ubuntu22
Upgraded Hadoop from 3.3.3 to 3.3.6 in the latest Dataproc on Compute Engine 2.1 image version.
New Dataproc Serverless for Spark runtime versions:
- 1.1.34
- 2.0.42
- 2.1.21
Upgraded the Cloud Storage connector version to 2.2.17 in the latest Dataproc Serverless for Spark runtimes.
Added the gs.http.connect-timeout and gs.http.read-timeout properties in Flink to set the connection timeout and read timeout for the java-storage client in the latest Dataproc on Compute Engine 2.1 image version.
Added the gs.filesink.entropy.enabled property in Flink to enable entropy injection in the filesink Cloud Storage path in the latest Dataproc on Compute Engine 2.1 image version.
September 28, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.78-debian10, 2.0.78-rocky8, 2.0.78-ubuntu18
- 2.1.26-debian11, 2.1.26-rocky8, 2.1.26-ubuntu20, 2.1.26-ubuntu20-arm
Upgraded the Cloud Storage connector version to 2.2.17 in the latest 2.0 and 2.1 Dataproc on Compute Engine image versions.
Upgraded Hive from 3.1.2 to 3.1.3 in the latest Dataproc on Compute Engine 2.0 image version.
September 22, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.77-debian10, 2.0.77-rocky8, 2.0.77-ubuntu18
- 2.1.25-debian11, 2.1.25-rocky8, 2.1.25-ubuntu20, 2.1.25-ubuntu20-arm
New Dataproc Serverless for Spark runtime versions:
- 1.1.32
- 2.0.40
- 2.1.19
In the latest Dataproc on Compute Engine 2.0 and 2.1 image versions, the CLOUDSDK_PYTHON variable is now unset so that the gcloud command-line tool uses its bundled Python interpreter.
Fixed a Jupyter notebook bug that hid Scala compilation errors when using the Toree kernel in Dataproc on Compute Engine 2.1 images.
September 19, 2023
Dataproc is now available in the me-central2 region (Dammam, Saudi Arabia).
September 15, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.76-debian10, 2.0.76-rocky8, 2.0.76-ubuntu18
- 2.1.24-debian11, 2.1.24-rocky8, 2.1.24-ubuntu20, 2.1.24-ubuntu20-arm
New Dataproc Serverless for Spark runtime versions:
- 1.1.31
- 2.0.39
- 2.1.18
Scala has been upgraded to version 2.12.18 and Apache Tez has been upgraded to version 0.10.2 in Dataproc on Compute Engine 2.1 images.
September 13, 2023
Announcing the Private Preview release of the Dataproc on Compute Engine Flink Jobs resource. During Private Preview, you can contact your Google Cloud Sales representative to have your project(s) added to an allowlist so that you can submit Flink jobs to the Dataproc on Compute Engine service.
September 12, 2023
The dataproc.diagnostics.enabled property is now available to enable running diagnostics on Dataproc Serverless for Spark. The existing spark.dataproc.diagnostics.enabled property will be deprecated for use with newer runtimes.
September 08, 2023
Dataproc Auto zone placement for clusters is now available in the Google Cloud console by selecting the "Any" option for the cluster zone.
New Dataproc Serverless for Spark runtime versions:
- 1.1.30
- 2.0.38
- 2.1.17
New Dataproc on Compute Engine subminor image versions:
- 2.0.75-debian10, 2.0.75-rocky8, 2.0.75-ubuntu18
- 2.1.23-debian11, 2.1.23-rocky8, 2.1.23-ubuntu20, 2.1.23-ubuntu20-arm
The Apache Spark version has been upgraded from 3.3.0 to 3.3.2 in Dataproc on Compute Engine 2.1 images.
September 04, 2023
Announcing the General Availability (GA) release of Data Lineage for Dataproc, which captures data transformations (lineage events) in Dataproc Spark jobs, and publishes them to Dataplex Lineage.
Dataproc Serverless Interactive sessions detail and list pages are now available in the Google Cloud console.
August 29, 2023
Announcing the Preview release of Dataproc Serverless for Spark Interactive sessions and the Dataproc Jupyter Plugin.
August 25, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.29
- 2.0.37
- 2.1.16
August 23, 2023
Fixed a Dataproc Serverless issue where Spark batches failed with unhelpful error messages.
August 22, 2023
Dataproc is now available in the europe-west10 region (Berlin).
August 17, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.74-debian10, 2.0.74-rocky8, 2.0.74-ubuntu18
- 2.1.22-debian11, 2.1.22-rocky8, 2.1.22-ubuntu20, 2.1.22-ubuntu20-arm
New Dataproc Serverless for Spark runtime versions:
- 1.1.28
- 2.0.36
- 2.1.15
Backported the patches for HIVE-20618 to the new Dataproc on Compute Engine 2.0 and 2.1 images.
August 11, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.73-debian10, 2.0.73-rocky8, 2.0.73-ubuntu18
- 2.1.21-debian11, 2.1.21-rocky8, 2.1.21-ubuntu20, 2.1.21-ubuntu20-arm
New Dataproc Serverless for Spark runtime versions:
- 1.1.27
- 2.0.35
- 2.1.14
Added new Dataproc Serverless Templates for batch workload creation:
- Cloud Spanner to Cloud Storage
- Cloud Storage to JDBC
- Cloud Storage to Cloud Storage
- Hive to BigQuery
- JDBC to Cloud Spanner
- JDBC to JDBC
- Pub/Sub to Cloud Storage
Improved the reliability of Dataproc Serverless compute node initialization with a Premium disk tier option.
August 07, 2023
Added the dataproc:dataproc.cluster.caching.enabled cluster property to enable or disable Dataproc on Compute Engine cluster caching. The property is false by default. Use this feature with the latest Dataproc on Compute Engine images.
August 06, 2023
The following previously released sub-minor versions of Dataproc on Compute Engine images unintentionally reverted several dependency library versions. This created a risk of backward incompatibility for some workloads.
These sub-minor versions have been rolled back and can only be used when updating existing clusters that already use them:
- 2.0.71-debian10, 2.0.71-rocky8, 2.0.71-ubuntu18
- 2.1.19-debian11, 2.1.19-rocky8, 2.1.19-ubuntu20, 2.1.19-ubuntu20-arm
August 05, 2023
New Dataproc on Compute Engine image versions:
- 2.0.72-debian10, 2.0.72-rocky8, 2.0.72-ubuntu18
- 2.1.20-debian11, 2.1.20-rocky8, 2.1.20-ubuntu20, 2.1.20-ubuntu20-arm
Upgraded Hudi to 0.12.3 and added the BigQuery Sync tool as part of the Hudi optional component.
Downgraded the Cloud Storage connector to version 2.2.15 in all Dataproc on Compute Engine image versions to prevent a potential performance regression.
Backported ZEPPELIN-5434 to image 2.1 to fix CVE-2022-2048.
Backported the patches for HIVE-22170 and HIVE-22331.
August 03, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.26
- 2.0.34
- 2.1.13
Downgraded the Cloud Storage connector to version 2.2.15 in all Dataproc Serverless for Spark runtimes to prevent a potential performance regression.
July 30, 2023
New Dataproc on Compute Engine image versions:
- 2.0.71-debian10, 2.0.71-rocky8, 2.0.71-ubuntu18
- 2.1.19-debian11, 2.1.19-rocky8, 2.1.19-ubuntu20, 2.1.19-ubuntu20-arm
Note: The above image versions were rolled back. See the August 6, 2023 release note.
The maximum total memory per core for Dataproc Serverless Premium compute tiers has increased to 24576m (the 7424m limit for Standard compute tiers is unchanged). See Dataproc Serverless Resource allocation properties.
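The per-core limit scales with the number of cores. The sketch below multiplies it out and checks a driver or executor configuration against it; it assumes that "total memory" means the memory setting plus its memory overhead (for example `spark.executor.memory` plus `spark.executor.memoryOverhead`), which you should confirm on the resource allocation properties page. The helper names are hypothetical.

```python
from typing import Dict

# Per-core limits from this release note, in Spark "m" units (MiB).
MAX_TOTAL_MB_PER_CORE: Dict[str, int] = {"standard": 7424, "premium": 24576}


def max_total_memory_mb(cores: int, tier: str = "standard") -> int:
    """Maximum total memory for a driver or executor with the given core count."""
    return cores * MAX_TOTAL_MB_PER_CORE[tier]


def fits_memory_limit(cores: int, memory_mb: int, overhead_mb: int,
                      tier: str = "standard") -> bool:
    """Assumption: total memory = memory setting + memory overhead."""
    return memory_mb + overhead_mb <= max_total_memory_mb(cores, tier)


print(max_total_memory_mb(4))             # 29696
print(max_total_memory_mb(4, "premium"))  # 98304
print(fits_memory_limit(4, 28000, 2800))  # False: 30800 > 29696
```

With 4 cores, the Standard limit works out to 4 x 7424m = 29696m and the Premium limit to 4 x 24576m = 98304m.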
July 28, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.25
- 2.0.33
- 2.1.12
July 26, 2023
Clusters cannot be created with a driver node group if the cluster image version is older than 2.0.57 or 2.1.5, or if the permissions for the staging bucket are missing.
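The version condition above can be checked before creating a cluster. A minimal sketch, assuming image versions use the `major.minor.subminor-os` form shown in these notes; it does not check staging bucket permissions, and treating image tracks newer than 2.1 as supported is an assumption, since this note names only 2.0 and 2.1. The function name is hypothetical.

```python
import re

# Minimum subminor versions named in this release note, keyed by (major, minor).
MIN_SUBMINOR_FOR_DRIVER_NODE_GROUP = {(2, 0): 57, (2, 1): 5}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def supports_driver_node_group(image_version: str) -> bool:
    """Return True if the image version meets the minimum for driver node groups.

    Only the version condition is checked; staging bucket permissions are not.
    """
    match = _VERSION_RE.match(image_version)
    if match is None:
        raise ValueError(f"unrecognized image version: {image_version!r}")
    major, minor, subminor = (int(part) for part in match.groups())
    track = (major, minor)
    if track < (2, 0):
        # Anything before 2.0 is older than 2.0.57.
        return False
    if track in MIN_SUBMINOR_FOR_DRIVER_NODE_GROUP:
        return subminor >= MIN_SUBMINOR_FOR_DRIVER_NODE_GROUP[track]
    # Assumption: tracks newer than 2.1 are not covered by the note; treat as supported.
    return True


print(supports_driver_node_group("2.1.5-debian11"))
```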
Added recommendation details to Autoscaler logs in Cloud Logging for the CANCEL and DO_NOT_CANCEL recommendations.
July 21, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.24
- 2.0.32
- 2.1.11
New Dataproc on Compute Engine image versions, including a 2.1.18-ubuntu20-arm image that supports ARM machine types:
- 2.0.70-debian10, 2.0.70-rocky8, 2.0.70-ubuntu18
- 2.1.18-debian11, 2.1.18-rocky8, 2.1.18-ubuntu20, 2.1.18-ubuntu20-arm
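Image version strings in these notes follow a `major.minor.subminor-<os>[-arm]` pattern, where the `-arm` suffix marks images for ARM machine types. A minimal parser under that assumption can pick out the ARM variants; the `ImageVersion` type and `parse_image_version` function are hypothetical, and pre-release strings such as `2.2.0-RC2-debian11` are intentionally rejected.

```python
import re
from typing import NamedTuple


class ImageVersion(NamedTuple):
    major: int
    minor: int
    subminor: int
    os: str
    is_arm: bool


# major.minor.subminor-<os><osversion>[-arm]; pre-release tags such as RC2 are not handled.
_IMAGE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([a-z]+\d+)(-arm)?$")


def parse_image_version(text: str) -> ImageVersion:
    match = _IMAGE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized image version: {text!r}")
    major, minor, subminor, os_name, arm_suffix = match.groups()
    return ImageVersion(int(major), int(minor), int(subminor), os_name, arm_suffix is not None)


candidates = ["2.1.18-debian11", "2.1.18-ubuntu20", "2.1.18-ubuntu20-arm"]
arm_images = [v for v in candidates if parse_image_version(v).is_arm]
print(arm_images)  # ['2.1.18-ubuntu20-arm']
```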
Fixed a race condition in Spark startup that could cause nodes to fail to initialize when using the premium disk tier.
July 14, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.23
- 2.0.31
- 2.1.10
Clusters that use a driver node group now configure YARN queues with user-limit-factor set to 2, which allows a single user to burst to 2x the queue capacity, which is set to 50%. This improves resource utilization for workloads submitted by a single user.
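In simplified terms, the cap on one user's share of a queue is the queue capacity multiplied by `user-limit-factor`, bounded by the queue's maximum capacity. The sketch below assumes a maximum capacity of 100% and ignores other CapacityScheduler settings such as `minimum-user-limit-percent` and competing users; the function name is hypothetical.

```python
def max_single_user_share(capacity_pct: float, user_limit_factor: float,
                          maximum_capacity_pct: float = 100.0) -> float:
    """Upper bound on the share one user can reach in a queue (simplified model)."""
    # A lone user may grow to capacity * user-limit-factor, never past maximum capacity.
    return min(capacity_pct * user_limit_factor, maximum_capacity_pct)


print(max_single_user_share(50, 2))  # driver node group queue settings from this note
print(max_single_user_share(50, 1))  # a factor of 1 keeps the user at the queue capacity
```

With capacity 50 and a factor of 2, a single user can reach 100 percent, whereas a factor of 1 would keep that user at 50 percent.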
Upgraded the Cloud Storage connector version to 2.2.16 in Dataproc Serverless for Spark runtimes.
July 10, 2023
New Dataproc on Compute Engine image versions:
- 2.0.69-debian10, 2.0.69-rocky8, 2.0.69-ubuntu18
- 2.1.17-debian11, 2.1.17-rocky8, 2.1.17-ubuntu20
Upgraded the Cloud Storage connector version to 2.2.16 for Dataproc on Compute Engine 2.0 and 2.1 images.
July 07, 2023
Dataproc Serverless Spark 1.1 and 2.0 runtime subminor versions can now be used 365 days after their release (instead of 90 days).
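To see what the change means for a specific version, add the window to its release date. This sketch assumes the window is counted in calendar days from the release date; the release date used and the `usable_until` helper are hypothetical examples.

```python
from datetime import date, timedelta


def usable_until(release: date, support_days: int = 365) -> date:
    """Last usable date, assuming the window counts calendar days from release."""
    return release + timedelta(days=support_days)


release = date(2023, 7, 7)  # hypothetical release date of a runtime subminor version
print(usable_until(release, 90))  # 2023-10-05 (previous 90-day window)
print(usable_until(release))      # 2024-07-06 (365-day window; 2024 is a leap year)
```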
The goog-dataproc-batch-id, goog-dataproc-batch-uuid, and goog-dataproc-location labels are now automatically applied to Dataproc Serverless batch resources.
Dataproc Serverless for Spark now supports updating the BigQuery connector using the dataproc.sparkBqConnector.version and dataproc.sparkBqConnector.uri properties. See Use the BigQuery connector with Dataproc Serverless for Spark.
July 06, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.22
- 2.0.30
- 2.1.9
June 29, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.21
- 2.0.29
- 2.1.8
Added support for Premium compute and storage pricing tiers for Dataproc Serverless Spark workloads. Premium compute offers higher performance per core, and Premium storage offers higher throughput and IOPS. To use Premium compute and storage, set the following Spark runtime environment properties:
- spark.dataproc.(driver|executor).compute.tier=premium
- spark.dataproc.(driver|executor).storage.tier=premium
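The `(driver|executor)` notation is shorthand for separate driver and executor properties, so enabling both Premium tiers for both roles means setting four keys. A minimal sketch that expands them and joins them into the comma-separated `key=value` form accepted by the `--properties` flag of `gcloud dataproc batches submit`; the helper name is hypothetical, and because the keys are per role you can also choose tiers for the driver and executors independently.

```python
from itertools import product
from typing import Dict


def premium_tier_properties() -> Dict[str, str]:
    """Expand spark.dataproc.(driver|executor).(compute|storage).tier=premium."""
    return {
        f"spark.dataproc.{role}.{resource}.tier": "premium"
        for role, resource in product(("driver", "executor"), ("compute", "storage"))
    }


props = premium_tier_properties()
print(",".join(f"{k}={v}" for k, v in props.items()))
```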
June 28, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.68-debian10, 2.0.68-rocky8, 2.0.68-ubuntu18
- 2.1.16-debian11, 2.1.16-rocky8, 2.1.16-ubuntu20
Backported ZEPPELIN-5755 to Zeppelin 0.10 in 2.1 images for Spark 3.3 support.
June 26, 2023
Added Dataproc Serverless Templates for batch creation:
- Cloud Storage to BigQuery
- Cloud Storage to Cloud Spanner
- Hive to Cloud Storage
- JDBC to BigQuery
- JDBC to Cloud Storage
June 22, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.20
- 2.0.28
- 2.1.7
June 16, 2023
New Dataproc on Compute Engine subminor image versions:
- 2.0.67-debian10, 2.0.67-rocky8, 2.0.67-ubuntu18
- 2.1.15-debian11, 2.1.15-rocky8, 2.1.15-ubuntu20
Fixed a bug that caused cluster creation to fail when ATSv2 is enabled for tables that have a garbage collection policy other than maxversions.
June 14, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.19
- 2.0.27
- 2.1.6
June 08, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.18
- 2.0.26
- 2.1.5
June 02, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.17
- 2.0.25
- 2.1.4
Upgraded the Cloud Storage connector to version 2.2.14 in Dataproc Serverless for Spark runtimes.
June 01, 2023
New sub-minor versions of Dataproc images:
- 2.0.66-debian10, 2.0.66-rocky8, 2.0.66-ubuntu18
- 2.1.14-debian11, 2.1.14-rocky8, 2.1.14-ubuntu20
Upgraded the Cloud Storage connector to version 2.2.14 in 2.0 and 2.1 images.
Backported HIVE-22891, HIVE-21660, HIVE-21915 to 2.0 images.
Backported HIVE-22891, HIVE-21660, HIVE-25520, HIVE-25521 to 2.1 images.
May 26, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.16
- 2.0.24
- 2.1.3
New sub-minor versions of Dataproc images:
- 2.0.65-debian10, 2.0.65-rocky8, 2.0.65-ubuntu18
- 2.1.13-debian11, 2.1.13-rocky8, 2.1.13-ubuntu20
May 24, 2023
Upgraded the Cloud Storage connector to version 2.2.13 in Dataproc on Compute Engine 2.0 and 2.1 image versions.
Unauthorized callers attempting to get, delete, or terminate non-existent Sessions will now receive a 403 response code instead of a 404 response code. This does not impact authorized callers.
Fixed the Serverless history server endpoint URL when the Persistent History Server (PHS) was set up without a wildcard.
May 19, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.15
- 2.0.23
- 2.1.2
Upgraded the Cloud Storage connector to version 2.2.13 in Dataproc Serverless for Spark runtimes.
Fixed the NoClassDefFoundError for the log4j class in the Zeppelin BigQuery interpreter in 2.0 images.
Backported HIVE-22891 to 2.0 images.
May 18, 2023
New sub-minor versions of Dataproc images:
- 2.0.64-debian10, 2.0.64-rocky8, 2.0.64-ubuntu18
- 2.1.12-debian11, 2.1.12-rocky8, 2.1.12-ubuntu20
You can now use --properties=dataproc:componentgateway.ha.enabled=true to enable the Dataproc Component Gateway and Knox along with the Spark History Server (SHS) UI in HA mode.
May 11, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.14
- 2.0.22
- 2.1.1
May 05, 2023
Announcing the General Availability (GA) release of Dataproc Serverless for Spark runtime version 2.1, which includes the following components:
- Spark 3.4.0
- BigQuery Spark Connector 0.28.1
- Cloud Storage Connector 2.2.11
- Conda 23.3
- Java 17
- Python 3.11
- R 4.2
- Scala 2.13
New Dataproc Serverless for Spark runtime versions:
- 1.1.13
- 2.0.21
- 2.1.0
Upgraded Conda to 23.3 in Dataproc Serverless for Spark runtime 2.1.
April 28, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.12
- 2.0.20
- 2.1.0-RC8
Upgraded Spark to 3.4.0 and its dependencies in Dataproc Serverless for Spark 2.1 runtime:
- Jetty to 9.4.51.v20230217
- ORC to 1.8.3
- Parquet to 1.13.0
- Protobuf to 3.22.3
New sub-minor versions of Dataproc images:
- 1.5.89-debian10, 1.5.89-rocky8, 1.5.89-ubuntu18
- 2.0.63-debian10, 2.0.63-rocky8, 2.0.63-ubuntu18
- 2.1.11-debian11, 2.1.11-rocky8, 2.1.11-ubuntu20
The hive principal is now used for Hive catalog queries through Presto in Kerberos clusters.
April 24, 2023
Dataproc now supports using a cross-project service account.
Autoscaler recommendation reasoning details are now available in Cloud Logging.
The default batch TTL is now 4 hours for Dataproc Serverless for Spark runtime version 2.1.
April 20, 2023
New sub-minor versions of Dataproc images:
- 1.5.88-debian10, 1.5.88-rocky8, 1.5.88-ubuntu18
- 2.0.62-debian10, 2.0.62-rocky8, 2.0.62-ubuntu18
- 2.1.10-debian11, 2.1.10-rocky8, 2.1.10-ubuntu20
Running Spark jobs with the DataprocFileOutputCommitter is now supported. Enable the committer for Spark applications that write to a Cloud Storage destination concurrently.
April 18, 2023
Added Autoscaler recommendation reasoning details in Cloud Logging.
Dataproc on GKE: the SLM force delete timeout exception is now converted to DataprocIoException.
April 17, 2023
Announcing Dataproc General Availability (GA) support for CMEK organization policy.
April 14, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.11
- 2.0.19
- 2.1.0-RC7
The spark user is now the owner of all items in the driver working directory for Dataproc Serverless for Spark workloads, which fixes permissions issues after the Hadoop upgrade to 3.3.5.
April 06, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.10
- 2.0.18
- 2.1.0-RC6
Upgraded Hadoop to 3.3.5 in Dataproc Serverless for Spark runtimes.
April 04, 2023
Announcing the General Availability (GA) release of Key Access Justifications for Dataproc.
March 30, 2023
Dataproc is now available in the me-central1 region (Doha).
March 28, 2023
New sub-minor versions of Dataproc images:
- 1.5.87-debian10, 1.5.87-rocky8, 1.5.87-ubuntu18
- 2.0.61-debian10, 2.0.61-rocky8, 2.0.61-ubuntu18
- 2.1.9-debian11, 2.1.9-rocky8, 2.1.9-ubuntu20
Dataproc cluster creation now supports the pd-extreme disk type.
Dataproc on GKE now disallows update operations.
Dataproc on GKE diagnose operation now verifies that the master agent is running.
March 27, 2023
New sub-minor versions of Dataproc images:
- 1.5.86-debian10, 1.5.86-rocky8, 1.5.86-ubuntu18
- 2.0.60-debian10, 2.0.60-rocky8, 2.0.60-ubuntu18
- 2.1.8-debian11, 2.1.8-rocky8, 2.1.8-ubuntu20
New Dataproc Serverless for Spark runtime versions:
- 1.1.9
- 2.0.17
- 2.1.0-RC5
March 24, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.8
- 2.0.16
- 2.1.0-RC4
Upgraded Python to 3.11 and Conda to 23.1 in Dataproc Serverless for Spark runtime 2.1.
March 23, 2023
Dataproc is now available in the europe-west12 region (Turin).
March 17, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.7
- 2.0.15
- 2.1.0-RC3
March 16, 2023
New sub-minor versions of Dataproc images:
- 1.5.85-debian10, 1.5.85-rocky8, 1.5.85-ubuntu18
- 2.0.59-debian10, 2.0.59-rocky8, 2.0.59-ubuntu18
- 2.1.7-debian11, 2.1.7-rocky8, 2.1.7-ubuntu20
Upgraded Flink from 1.15.0 to 1.15.3 in 2.1 images.
March 10, 2023
New Dataproc Serverless for Spark runtime versions:
- 1.1.6
- 2.0.14
- 2.1.0-RC2