You are viewing documentation for an unreleased version of Debezium.
If you want to view the latest stable version of this page, please go here.

Debezium Connector for Oracle

Table of Contents

Overview

Debezium’s Oracle connector captures and records row-level changes that occur in databases on an Oracle server, including tables that are added while the connector is running. You can configure the connector to emit change events for specific subsets of schemas and tables, or to ignore, mask, or truncate values in specific columns.

For information about the Oracle Database versions that are compatible with this connector, see the Debezium release overview.

Debezium can ingest change events from Oracle by using the native LogMiner database package, the XStream API, or OpenLogReplicator.

How the Oracle connector works

To optimally configure and run a Debezium Oracle connector, it is helpful to understand how the connector works.

Adapter Modes

The Debezium Oracle connector supports multiple adapters for capturing changes from the Oracle database transaction logs. You can configure the connector to use a specific adapter by setting the database.connection.adapter configuration property.

The connector supports the following adapters:

LogMiner

By default, the Debezium Oracle connector uses the native Oracle LogMiner API to read and stream changes from database transaction logs. You can configure Debezium to use LogMiner in either of the following distinct modes:

Uncommitted changes mode

In uncommitted changes mode, the Debezium Oracle connector receives a continuous stream of change events for both committed and in-flight uncommitted transactions. Because Oracle interleaves operations from multiple transactions, the connector must buffer events until a corresponding commit or rollback is detected.

Because this the default mode for the connector, it’s important to allocate enough memory to the connector to handle the largest transactions and highest concurrency levels across all captured tables.

Uncommitted changes mode minimizes the load on the source database by delegating most processing to the connector. Oracle LogMiner is responsible only for reading redo and archive logs from disk and streaming the raw changes.

Enable this mode by setting the database.connection.adapter property to logminer (default).

Committed changes mode

In committed changes mode, the Debezium Oracle connector instructs Oracle LogMiner to stream only committed changes. Because the connector receives only committed events, it can immediately forward them to destination topics without maintaining an internal transaction buffer.

This mode requires significantly less connector memory, but shifts a much greater processing and memory burden to the source database. For large transactions, this can easily exceed the database’s available PGA memory, potentially leading to PGA_AGGREGATE_LIMIT errors.

This mode is recommended only if you can guarantee that all transactions fit within the database’s PGA memory space. Otherwise, use the uncommitted changes mode.

Enable this mode by setting the database.connection.adapter property to logminer_unbuffered.

This feature is currently incubating and might change in future releases.

OpenLogReplicator

You can configure the Debezium Oracle connector to use OpenLogReplicator, an open-source application that reads Oracle changes directly from the redo and archive logs with minimal impact on the database.

Enable OpenLogReplicator by setting the database.connection.adapter property to olr. For more information, see the OpenLogReplicator support section.

Oracle XStream

The Debezium Oracle connector can also use Oracle XStream, a commercial component of Oracle GoldenGate.

Enable Oracle XStream by setting the database.connection.adapter property to xstream. For more details, see the XStream section.

Snapshots

Typically, the redo logs on an Oracle server are configured to not retain the complete history of the database. As a result, the Debezium Oracle connector cannot retrieve the entire history of the database from the logs. To enable the connector to establish a baseline for the current state of the database, the first time that the connector starts, it performs an initial consistent snapshot of the database.

If the time needed to complete the initial snapshot exceeds the UNDO_RETENTION time that is set for the database (fifteen minutes, by default), an ORA-01555 exception can occur. For more information about the error, and about the steps that you can take to recover from it, see the Frequently asked questions.

During a table’s snapshot, it’s possible for Oracle to raise an ORA-01466 exception. This happens when a user modifies the schema of the table or adds, changes, or drops an index or related object associated with the table being snapshot. In the event this happens, the connector will stop and the initial snapshot will need to be taken from the beginning.

To remediate the problem, you can configure the snapshot.database.errors.max.retries property with a value greater than 0 so that the specific table’s snapshot will restart. While the entire snapshot will not start from the beginning when retrying, the specific table in question will be re-read from the beginning and the table’s topic will contain duplicate snapshot events.

Default workflow that the Oracle connector uses to perform an initial snapshot

The following workflow lists the steps that Debezium takes to create a snapshot. These steps describe the process for a snapshot when the snapshot.mode configuration property is set to its default value, which is initial. You can customize the way that the connector creates snapshots by changing the value of the snapshot.mode property. If you configure a different snapshot mode, the connector completes the snapshot by using a modified version of this workflow.

When the snapshot mode is set to the default, the connector completes the following tasks to create a snapshot:

  1. Establish a connection to the database.

  2. Determine the tables to be captured. By default, the connector captures all tables except those with schemas that exclude them from capture. After the snapshot completes, the connector continues to stream data for the specified tables. If you want the connector to capture data only from specific tables you can direct the connector to capture the data for only a subset of tables or table elements by setting properties such as table.include.list or table.exclude.list.

  3. Obtain a ROW SHARE MODE lock on each of the captured tables to prevent structural changes from occurring during creation of the snapshot. Debezium holds the locks for only a short time.

  4. Read the current system change number (SCN) position from the server’s redo log.

  5. Capture the structure of all database tables, or all tables that are designated for capture. The connector persists schema information in its internal database schema history topic. The schema history provides information about the structure that is in effect when a change event occurs.

    By default, the connector captures the schema of every table in the database that is in capture mode, including tables that are not configured for capture. If tables are not configured for capture, the initial snapshot captures only their structure; it does not capture any table data. For more information about why snapshots persist schema information for tables that you did not include in the initial snapshot, see Understanding why initial snapshots capture the schema for all tables.

  6. Release the locks obtained in Step 3. Other database clients can now write to any previously locked tables.

  7. At the SCN position that was read in Step 4, the connector scans the tables that are designated for capture (SELECT * FROM …​ AS OF SCN 123). During the scan, the connector completes the following tasks:

    1. Confirms that the table was created before the snapshot began. If the table was created after the snapshot began, the connector skips the table. After the snapshot is complete, and the connector transitions to streaming, it emits change events for any tables that were created after the snapshot began.

    2. Produces a read event for each row that is captured from a table. All read events contain the same SCN position, which is the SCN position that was obtained in step 4.

    3. Emits each read event to the Kafka topic for the source table.

    4. Releases data table locks, if applicable.

  8. Record the successful completion of the snapshot in the connector offsets.

The resulting initial snapshot captures the current state of each row in the captured tables. From this baseline state, the connector captures subsequent changes as they occur.

After the snapshot process begins, if the process is interrupted due to connector failure, rebalancing, or other reasons, the process restarts after the connector restarts. After the connector completes the initial snapshot, it continues streaming from the position that it read in Step 3 so that it does not miss any updates. If the connector stops again for any reason, after it restarts, it resumes streaming changes from where it previously left off.

Table 1. Settings for snapshot.mode connector configuration property
Setting Description

always

Perform snapshot on each connector start. After the snapshot completes, the connector begins to stream event records for subsequent database changes.

initial

The connector performs a database snapshot as described in the default workflow for creating an initial snapshot. After the snapshot completes, the connector begins to stream event records for subsequent database changes.

initial_only

The connector performs a database snapshot and stops before streaming any change event records, not allowing any subsequent change events to be captured.

schema_only

Deprecated, see no_data.

no_data

The connector captures the structure of all relevant tables, performing all of the steps described in the default snapshot workflow, except that it does not create READ events to represent the data set at the point of the connector’s start-up (Step 6).

schema_only_recovery

Deprecated, see recovery.

recovery

Set this option to restore a database schema history topic that is lost or corrupted. After a restart, the connector runs a snapshot that rebuilds the topic from the source tables. You can also set the property to periodically prune a database schema history topic that experiences unexpected growth.

WARNING: Do not use this mode to perform a snapshot if schema changes were committed to the database after the last connector shutdown.

when_needed

After the connector starts, it performs a snapshot only if it detects one of the following circumstances:

  • It cannot detect any topic offsets.

  • A previously recorded offset specifies a log position that is not available on the server.

configuration_based

Set the snapshot mode to configuration_based to control snapshot behavior through the set of connector properties that have the prefix 'snapshot.mode.configuration.based'.

custom

The custom snapshot mode lets you inject your own implementation of the io.debezium.spi.snapshot.Snapshotter interface. Set the snapshot.mode.custom.name configuration property to the name provided by the name() method of your implementation. The name is specified on the classpath of your Kafka Connect cluster. If you use the Debezium DebeziumEngine, the name is included in the connector JAR file. For more information, see custom snapshotter SPI.

For more information, see snapshot.mode in the table of connector configuration properties.

Understanding why initial snapshots capture the schema history for all tables

The initial snapshot that a connector runs captures two types of information:

Table data

Information about INSERT, UPDATE, and DELETE operations in tables that are named in the connector’s table.include.list property.

Schema data

DDL statements that describe the structural changes that are applied to tables. Schema data is persisted to both the internal schema history topic, and to the connector’s schema change topic, if one is configured.

After you run an initial snapshot, you might notice that the snapshot captures schema information for tables that are not designated for capture. By default, initial snapshots are designed to capture schema information for every table that is present in the database, not only from tables that are designated for capture. Connectors require that the table’s schema is present in the schema history topic before they can capture a table. By enabling the initial snapshot to capture schema data for tables that are not part of the original capture set, Debezium prepares the connector to readily capture event data from these tables should that later become necessary. If the initial snapshot does not capture a table’s schema, you must add the schema to the history topic before the connector can capture data from the table.

In some cases, you might want to limit schema capture in the initial snapshot. This can be useful when you want to reduce the time required to complete a snapshot. Or when Debezium connects to the database instance through a user account that has access to multiple logical databases, but you want the connector to capture changes only from tables in a specific logic database.

Additional information

Capturing data from tables not captured by the initial snapshot (no schema change)

In some cases, you might want the connector to capture data from a table whose schema was not captured by the initial snapshot. Depending on the connector configuration, the initial snapshot might capture the table schema only for specific tables in the database. If the table schema is not present in the history topic, the connector fails to capture the table, and reports a missing schema error.

You might still be able to capture data from the table, but you must perform additional steps to add the table schema.

Prerequisites
Procedure
  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. In the connector configuration:

    1. Set the snapshot.mode to recovery.

    2. (Optional) Set the value of schema.history.internal.store.only.captured.tables.ddl to false to ensure that in the future the connector can readily capture data for tables that are not currently designated for capture. Connectors can capture data from a table only if the table’s schema history is present in the history topic.

    3. Add the tables that you want the connector to capture to table.include.list.

  4. Restart the connector. The snapshot recovery process rebuilds the schema history based on the current structure of the tables.

  5. (Optional) After the snapshot completes, initiate an incremental snapshot on the newly added tables. The incremental snapshot first streams the historical data of the newly added tables, and then resumes reading changes from the redo and archive logs for previously configured tables, including changes that occur while that connector was off-line.

  6. (Optional) Reset the snapshot.mode back to no_data to prevent the connector from initiating recovery after a future restart.

Capturing data from tables not captured by the initial snapshot (schema change)

If a schema change is applied to a table, records that are committed before the schema change have different structures than those that were committed after the change. When Debezium captures data from a table, it reads the schema history to ensure that it applies the correct schema to each event. If the schema is not present in the schema history topic, the connector is unable to capture the table, and an error results.

If you want to capture data from a table that was not captured by the initial snapshot, and the schema of the table was modified, you must add the schema to the history topic, if it is not already available. You can add the schema by running a new schema snapshot, or by running an initial snapshot for the table.

Prerequisites
  • You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.

  • A schema change was applied to the table so that the records to be captured do not have a uniform structure.

Procedure
Initial snapshot captured the schema for all tables (store.only.captured.tables.ddl was set to false)
  1. Edit the table.include.list property to specify the tables that you want to capture.

  2. Restart the connector.

  3. Initiate an incremental snapshot if you want to capture existing data from the newly added tables.

Initial snapshot did not capture the schema for all tables (store.only.captured.tables.ddl was set to true)

If the initial snapshot did not save the schema of the table that you want to capture, complete one of the following procedures:

Procedure 1: Schema snapshot, followed by incremental snapshot

In this procedure, the connector first performs a schema snapshot. You can then initiate an incremental snapshot to enable the connector to synchronize data.

  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

    Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.

  4. Set values for properties in the connector configuration as described in the following steps:

    1. Set the value of the snapshot.mode property to no_data.

    2. Edit the table.include.list to add the tables that you want to capture.

  5. Restart the connector.

  6. Wait for Debezium to capture the schema of the new and existing tables. Data changes that occurred any tables after the connector stopped are not captured.

  7. To ensure that no data is lost, initiate an incremental snapshot.

Procedure 2: Initial snapshot, followed by optional incremental snapshot

In this procedure the connector performs a full initial snapshot of the database. As with any initial snapshot, in a database with many large tables, running an initial snapshot can be a time-consuming operation. After the snapshot completes, you can optionally trigger an incremental snapshot to capture any changes that occur while the connector is off-line.

  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

    Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.

  4. Edit the table.include.list to add the tables that you want to capture.

  5. Set values for properties in the connector configuration as described in the following steps:

    1. Set the value of the snapshot.mode property to initial.

    2. (Optional) Set schema.history.internal.store.only.captured.tables.ddl to false.

  6. Restart the connector. The connector takes a full database snapshot. After the snapshot completes, the connector transitions to streaming.

  7. (Optional) To capture any data that changed while the connector was off-line, initiate an incremental snapshot.

Ad hoc snapshots

By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.

However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:

  • The connector configuration is modified to capture a different set of tables.

  • Kafka topics are deleted and must be rebuilt.

  • Data corruption occurs due to a configuration error or some other problem.

You can re-run a snapshot for a table for which you previously captured a snapshot by initiating a so-called ad-hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.

When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.

Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.

You specify the tables to capture by sending an execute-snapshot message to the signaling table. Set the type of the execute-snapshot signal to incremental or blocking, and provide the names of the tables to include in the snapshot, as described in the following table:

Table 2. Example of an ad hoc execute-snapshot signal record
Field Default Value

type

incremental

Specifies the type of snapshot that you want to run.
Currently, you can request incremental or blocking snapshots.

data-collections

N/A

An array that contains regular expressions matching the fully-qualified names of the tables to include in the snapshot.
For the Oracle connector, use the following format to specify the fully qualified name of a table: database.schema.table.

additional-conditions

N/A

An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition:

data-collection

The fully-qualified name of the table that the filter applies to. You can apply different filters to each table.

filter

Specifies column values that must be present in a database record for the snapshot to include it, for example, "color='blue'".

The values that you assign to the filter parameter are the same types of values that you might specify in the WHERE clause of SELECT statements when you set the snapshot.select.statement.overrides property for a blocking snapshot.

surrogate-key

N/A

An optional string that specifies the column name that the connector uses as the primary key of a table during the snapshot process.

Triggering an ad hoc incremental snapshot

You initiate an ad hoc incremental snapshot by adding an entry with the execute-snapshot signal type to the signaling table, or by sending a signal message to a Kafka signaling topic. After the connector processes the message, it begins the snapshot operation. The snapshot process reads the first and last primary key values and uses those values as the start and end point for each table. Based on the number of entries in the table, and the configured chunk size, Debezium divides the table into chunks, and proceeds to snapshot each chunk, in succession, one at a time.

For more information, see Incremental snapshots.

Triggering an ad hoc blocking snapshot

You initiate an ad hoc blocking snapshot by adding an entry with the execute-snapshot signal type to the signaling table or signaling topic. After the connector processes the message, it begins the snapshot operation. The connector temporarily stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the connector resumes streaming.

For more information, see Blocking snapshots.

Incremental snapshots

To provide flexibility in managing snapshots, Debezium includes a supplementary snapshot mechanism, known as incremental snapshotting. Incremental snapshots rely on the Debezium mechanism for sending signals to a Debezium connector. Incremental snapshots are based on the DDD-3 design document.

In an incremental snapshot, instead of capturing the full state of a database all at once, as in an initial snapshot, Debezium captures each table in phases, in a series of configurable chunks. You can specify the tables that you want the snapshot to capture and the size of each chunk. The chunk size determines the number of rows that the snapshot collects during each fetch operation on the database. The default chunk size for incremental snapshots is 1024 rows.

As an incremental snapshot proceeds, Debezium uses watermarks to track its progress, maintaining a record of each table row that it captures. This phased approach to capturing data provides the following advantages over the standard initial snapshot process:

  • You can run incremental snapshots in parallel with streamed data capture, instead of postponing streaming until the snapshot completes. The connector continues to capture near real-time events from the change log throughout the snapshot process, and neither operation blocks the other.

  • If the progress of an incremental snapshot is interrupted, you can resume it without losing any data. After the process resumes, the snapshot begins at the point where it stopped, rather than recapturing the table from the beginning.

  • You can run an incremental snapshot on demand at any time, and repeat the process as needed to adapt to database updates. For example, you might re-run a snapshot after you modify the connector configuration to add a table to its table.include.list property.

Incremental snapshot process

When you run an incremental snapshot, Debezium sorts each table by primary key and then splits the table into chunks based on the configured chunk size. Working chunk by chunk, it then captures each table row in a chunk. For each row that it captures, the snapshot emits a READ event. That event represents the value of the row when the snapshot for the chunk began.

As a snapshot proceeds, it’s likely that other processes continue to access the database, potentially modifying table records. To reflect such changes, INSERT, UPDATE, or DELETE operations are committed to the transaction log as per usual. Similarly, the ongoing Debezium streaming process continues to detect these change events and emits corresponding change event records to Kafka.

How Debezium resolves collisions among records with the same primary key

In some cases, the UPDATE or DELETE events that the streaming process emits are received out of sequence. That is, the streaming process might emit an event that modifies a table row before the snapshot captures the chunk that contains the READ event for that row. When the snapshot eventually emits the corresponding READ event for the row, its value is already superseded. To ensure that incremental snapshot events that arrive out of sequence are processed in the correct logical order, Debezium employs a buffering scheme for resolving collisions. Only after collisions between the snapshot events and the streamed events are resolved does Debezium emit an event record to Kafka.

Snapshot window

To assist in resolving collisions between late-arriving READ events and streamed events that modify the same table row, Debezium employs a so-called snapshot window. The snapshot window demarcates the interval during which an incremental snapshot captures data for a specified table chunk. Before the snapshot window for a chunk opens, Debezium follows its usual behavior and emits events from the transaction log directly downstream to the target Kafka topic. But from the moment that the snapshot for a particular chunk opens, until it closes, Debezium performs a de-duplication step to resolve collisions between events that have the same primary key..

For each data collection, the Debezium emits two types of events, and stores the records for them both in a single destination Kafka topic. The snapshot records that it captures directly from a table are emitted as READ operations. Meanwhile, as users continue to update records in the data collection, and the transaction log is updated to reflect each commit, Debezium emits UPDATE or DELETE operations for each change.

As the snapshot window opens, and Debezium begins processing a snapshot chunk, it delivers snapshot records to a memory buffer. During the snapshot windows, the primary keys of the READ events in the buffer are compared to the primary keys of the incoming streamed events. If no match is found, the streamed event record is sent directly to Kafka. If Debezium detects a match, it discards the buffered READ event, and writes the streamed record to the destination topic, because the streamed event logically supersede the static snapshot event. After the snapshot window for the chunk closes, the buffer contains only READ events for which no related transaction log events exist. Debezium emits these remaining READ events to the table’s Kafka topic.

The connector repeats the process for each snapshot chunk.

To enable Debezium to perform incremental snapshots, you must grant the connector permission to write to the signaling table.

Write permission is unnecessary only for connectors that can be configured to perform read-only incrementals snapshots (MariaDB, MySQL, or PostgreSQL).

Currently, you can use either of the following methods to initiate an incremental snapshot:

The Debezium connector for Oracle does not support schema changes while an incremental snapshot is running.

Triggering an incremental snapshot

To initiate an incremental snapshot, you can send an ad hoc snapshot signal to the signaling table on the source database. You submit snapshot signals as SQL INSERT queries.

After Debezium detects the change in the signaling table, it reads the signal, and runs the requested snapshot operation.

The query that you submit specifies the tables to include in the snapshot, and, optionally, specifies the type of snapshot operation. Debezium currently supports the incremental and blocking snapshot types.

To specify the tables to include in the snapshot, provide a data-collections array that lists the tables, or an array of regular expressions used to match tables, for example,

{"data-collections": ["public.MyFirstTable", "public.MySecondTable"]}

The data-collections array for an incremental snapshot signal has no default value. If the data-collections array is empty, Debezium interprets the empty array to mean that no action is required, and it does not perform a snapshot.

If the name of a table that you want to include in a snapshot contains a dot (.), a space, or some other non-alphanumeric character, you must escape the table name in double quotes.
For example, to include a table that exists in the public schema in the db1 database, and that has the name My.Table, use the following format: "db1.public.\"My.Table\"".

Prerequisites
Using a source signaling channel to trigger an incremental snapshot
  1. Send a SQL query to add the ad hoc incremental snapshot request to the signaling table:

    INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

    For example,

    INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
    values ('ad-hoc-1',   (2)
        'execute-snapshot',  (3)
        '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
        "type":"incremental", (5)
        "additional-conditions":[{"data-collection": "db1.schema1.table1" ,"filter":"color=\'blue\'"}]}'); (6)

    The values of the id,type, and data parameters in the command correspond to the fields of the signaling table.
    The following table describes the parameters in the example:

    Table 3. Descriptions of fields in a SQL command for sending an incremental snapshot signal to the signaling table
    Item Value Description

    1

    database.schema.debezium_signal

    Specifies the fully-qualified name of the signaling table on the source database.

    2

    ad-hoc-1

    The id parameter specifies an arbitrary string that is assigned as the id identifier for the signal request.
    Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string. Rather, during the snapshot, Debezium generates its own id string as a watermarking signal.

    3

    execute-snapshot

    The type parameter specifies the operation that the signal is intended to trigger.

    4

    data-collections

    A required component of the data field of a signal that specifies an array of table names or regular expressions to match table names to include in the snapshot.
    The array lists regular expressions that use the format database.schema.table to match the fully-qualified names of the tables. This format is the same as the one that you use to specify the name of the connector’s signaling table.

    5

    incremental

    An optional type component of the data field of a signal that specifies the type of snapshot operation to run.
    Valid values are incremental and blocking.
    If you do not specify a value, the connector defaults to performing an incremental snapshot.

    6

    additional-conditions

    An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
    Each additional condition is an object with data-collection and filter properties. You can specify different filters for each data collection.
    * The data-collection property is the fully-qualified name of the data collection that the filter applies to. For more information about the additional-conditions parameter, see Running an ad hoc incremental snapshots with additional-conditions.

Running an ad hoc incremental snapshots with additional-conditions

If you want a snapshot to include only a subset of the content in a table, you can modify the signal request by appending an additional-conditions parameter to the snapshot signal.

The SQL query for a typical snapshot takes the following form:

SELECT * FROM <tableName> ....

By adding an additional-conditions parameter, you append a WHERE condition to the SQL query, as in the following example:

SELECT * FROM <data-collection> WHERE <filter> ....

The following example shows a SQL query to send an ad hoc incremental snapshot request with an additional condition to the signaling table:

INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

For example, suppose you have a products table that contains the following columns:

  • id (primary key)

  • color

  • quantity

If you want an incremental snapshot of the products table to include only the data items where color=blue, you can use the following SQL statement to trigger the snapshot:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue"}]}');

The additional-conditions parameter also enables you to pass conditions that are based on more than one column. For example, using the products table from the previous example, you can submit a query that triggers an incremental snapshot that includes the data of only those items for which color=blue and quantity>10:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue AND quantity>10"}]}');

The following example, shows the JSON for an incremental snapshot event that is captured by a connector.

Example 1. Incremental snapshot event message
{
    "before":null,
    "after": {
        "pk":"1",
        "value":"New data"
    },
    "source": {
        ...
        "snapshot":"incremental" (1)
    },
    "op":"r", (2)
    "ts_ms":"1620393591654",
    "ts_us":"1620393591654547",
    "ts_ns":"1620393591654547920",
    "transaction":null
}
Table 4. Description of fields in an incremental snapshot event message
Item Field name Description

1

snapshot

Specifies the type of snapshot operation to run.
Currently, the only valid options are blocking and incremental.
Specifying a type value in the SQL query that you submit to the signaling table is optional.
If you do not specify a value, the connector runs an incremental snapshot.

2

op

Specifies the event type.
The value for snapshot events is r, signifying a READ operation.

Using the Kafka signaling channel to trigger an incremental snapshot

You can send a message to the configured Kafka topic to request the connector to run an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is execute-snapshot, and the data field must have the following fields:

Table 5. Execute snapshot data fields
Field Default Value

type

incremental

The type of the snapshot to be executed. Currently Debezium supports the incremental and blocking types.
See the next section for more details.

data-collections

N/A

An array of comma-separated regular expressions that match the fully-qualified names of tables to include in the snapshot.
Specify the names by using the same format as is required for the signal.data.collection configuration option.

additional-conditions

N/A

An optional array of additional conditions that specifies criteria that the connector evaluates to designate a subset of records to include in a snapshot.
Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition: data-collection:: The fully-qualified name of the table that the filter applies to. You can apply different filters to each table. filter:: Specifies column values that must be present in a database record for the snapshot to include it, for example, "color='blue'".

The values that you assign to the filter parameter are the same types of values that you might specify in the WHERE clause of SELECT statements when you set the snapshot.select.statement.overrides property for a blocking snapshot.

Example 2. An execute-snapshot Kafka message
Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["{collection-container}.table1", "{collection-container}.table2"], "type": "INCREMENTAL"}}`
Ad hoc incremental snapshots with additional-conditions

Debezium uses the additional-conditions field to select a subset of a table’s content.

Typically, when Debezium runs a snapshot, it runs a SQL query such as:

SELECT * FROM <tableName> …​.

When the snapshot request includes an additional-conditions property, the data-collection and filter parameters of the property are appended to the SQL query, for example:

SELECT * FROM <data-collection> WHERE <filter> …​.

For example, given a products table with the columns id (primary key), color, and brand, if you want a snapshot to include only content for which color='blue', when you request the snapshot, you could add the additional-conditions property to filter the content: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue'"}]}}`

You can also use the additional-conditions property to pass conditions based on multiple columns. For example, using the same products table as in the previous example, if you want a snapshot to include only the content from the products table for which color='blue', and brand='MyBrand', you could send the following request: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue' AND brand='MyBrand'"}]}}`

Stopping an incremental snapshot

In some situations, it might be necessary to stop an incremental snapshot. For example, you might realize that snapshot was not configured correctly, or maybe you want to ensure that resources are available for other database operations. You can stop a snapshot that is already running by sending a signal to the signaling table on the source database.

You submit a stop snapshot signal to the signaling table by sending it in a SQL INSERT query. The stop-snapshot signal specifies the type of the snapshot operation as incremental, and optionally specifies the tables that you want to omit from the currently running snapshot. After Debezium detects the change in the signaling table, it reads the signal, and stops the incremental snapshot operation if it’s in progress.

Additional resources

You can also stop an incremental snapshot by sending a JSON message to the Kafka signaling topic.

Prerequisites
Using a source signaling channel to stop an incremental snapshot
  1. Send a SQL query to stop the ad hoc incremental snapshot to the signaling table:

    INSERT INTO <signalTable> (id, type, data) values ('<id>', 'stop-snapshot', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"incremental"}');

    For example,

    INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
    values ('ad-hoc-1',   (2)
        'stop-snapshot',  (3)
        '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
        "type":"incremental"}'); (5)

    The values of the id, type, and data parameters in the signal command correspond to the fields of the signaling table.
    The following table describes the parameters in the example:

    Table 6. Descriptions of fields in a SQL command for sending a stop incremental snapshot signal to the signaling table
    Item Value Description

    1

    database.schema.debezium_signal

    Specifies the fully-qualified name of the signaling table on the source database.

    2

    ad-hoc-1

    The id parameter specifies an arbitrary string that is assigned as the id identifier for the signal request.
    Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string.

    3

    stop-snapshot

    Specifies type parameter specifies the operation that the signal is intended to trigger.

    4

    data-collections

    An optional component of the data field of a signal that specifies an array of table names or regular expressions to match table names to remove from the snapshot.
    The array lists regular expressions which match tables by their fully-qualified names in the format database.schema.table

    If you omit this component from the data field, the signal stops the entire incremental snapshot that is in progress.

    5

    incremental

    A required component of the data field of a signal that specifies the type of snapshot operation that is to be stopped.
    Currently, the only valid option is incremental.
    If you do not specify a type value, the signal fails to stop the incremental snapshot.

Using the Kafka signaling channel to stop an incremental snapshot

You can send a signal message to the configured Kafka signaling topic to stop an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is stop-snapshot, and the data field must have the following fields:

Table 7. Execute snapshot data fields
Field Default Value

type

incremental

The type of the snapshot to be executed. Currently Debezium supports only the incremental type.
See the next section for more details.

data-collections

N/A

An optional array of comma-separated regular expressions that match the fully-qualified names of the tables an array of table names or regular expressions to match table names to remove from the snapshot.
Specify table names by using the format database.schema.table.

The following example shows a typical stop-snapshot Kafka message:

Key = `test_connector`

Value = `{"type":"stop-snapshot","data": {"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], "type": "INCREMENTAL"}}`

Custom snapshotter SPI

For more advanced uses, you can fine-tune control of the snapshot by implementing one of the following interfaces:

io.debezium.snapshot.spi.Snapshotter

Controls whether the connector takes a snapshot.

io.debezium.snapshot.spi.SnapshotQuery

Controls how data is queried during a snapshot.

io.debezium.snapshot.spi.SnapshotLock

Controls whether the connector locks tables when taking a snapshot.

io.debezium.snapshot.spi.Snapshotter interface. All built-in snapshot modes implement this interface.
/**
 * {@link Snapshotter} is used to determine the following details about the snapshot process:
 * <p>
 * - Whether a snapshot occurs. <br>
 * - Whether streaming continues during the snapshot. <br>
 * - Whether the snapshot includes schema (if supported). <br>
 * - Whether to snapshot data or schema following an error.
 * <p>
 * Although Debezium provides many default snapshot modes,
 * to provide more advanced functionality, such as partial snapshots,
 * you can customize implementation of the interface.
 * For more information, see the documentation.
 *
 *
 *
 */
@Incubating
public interface Snapshotter extends Configurable {

    /**
     * @return the name of the snapshotter.
     *
     *
     */
    String name();

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a data snapshot
     */
    boolean shouldSnapshotData(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a schema snapshot
     */
    boolean shouldSnapshotSchema(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @return {@code true} if the snapshotter should stream after taking a snapshot
     */
    boolean shouldStream();

    /**
     * @return {@code true} whether the schema can be recovered if database schema history is corrupted.
     */
    boolean shouldSnapshotOnSchemaError();

    /**
     * @return {@code true} whether the snapshot should be re-executed when there is a gap in data stream.
     */
    boolean shouldSnapshotOnDataError();

    /**
     *
     * @return {@code true} if streaming should resume from the start of the snapshot
     * transaction, or {@code false} for when a connector resumes and takes a snapshot,
     * streaming should resume from where streaming previously left off.
     */
    default boolean shouldStreamEventsStartingFromSnapshot() {
        return true;
    }

    /**
     * Lifecycle hook called after the snapshot phase is successful.
     */
    default void snapshotCompleted() {
        // no operation
    }

    /**
     * Lifecycle hook called after the snapshot phase is aborted.
     */
    default void snapshotAborted() {
        // no operation
    }
}
io.debezium.snapshot.spi.SnapshotQuery interface. All built-in snapshot query modes implement this interface.
/**
 * {@link SnapshotQuery} is used to determine the query used during a data snapshot
 *
 *
 */
public interface SnapshotQuery extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Generate a valid query string for the specified table, or an empty {@link Optional}
     * to skip snapshotting this table (but that table will still be streamed from)
     *
     * @param tableId the table to generate a query for
     * @param snapshotSelectColumns the columns to be used in the snapshot select based on the column
     *                              include/exclude filters
     * @return a valid query string, or none to skip snapshotting this table
     */
    Optional<String> snapshotQuery(String tableId, List<String> snapshotSelectColumns);

}
io.debezium.snapshot.spi.SnapshotLock interface. All built-in snapshot lock modes implement this interface.
/**
 * {@link SnapshotLock} is used to determine the table lock mode used during schema snapshot
 *
 *
 */
public interface SnapshotLock extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Returns a SQL statement for locking the given table during snapshotting, if required by the specific snapshotter
     * implementation.
     */
    Optional<String> tableLockingStatement(Duration lockTimeout, String tableId);

}

Blocking snapshots

To provide more flexibility in managing snapshots, Debezium includes a supplementary ad hoc snapshot mechanism, known as a blocking snapshot. Blocking snapshots rely on the Debezium mechanism for sending signals to a Debezium connector.

A blocking snapshot behaves just like an initial snapshot, except that you can trigger it at run time.

You might want to run a blocking snapshot rather than use the standard initial snapshot process in the following situations:

  • You add a new table and you want to complete the snapshot while the connector is running.

  • You add a large table, and you want the snapshot to complete in less time than is possible with an incremental snapshot.

Blocking snapshot process

When you run a blocking snapshot, Debezium stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the streaming is resumed.

Configure snapshot

You can set the following properties in the data component of a signal:

  • data-collections: to specify which tables must be snapshot.

  • data-collections: Specifies the tables that you want the snapshot to include.
    This property accepts a comma-separated list of regular expressions that match fully-qualified table names. The behavior of the property is similar to the behavior of the table.include.list property, which specifies the tables to capture in a blocking snapshot.

  • additional-conditions: You can specify different filters for different table.

    • The data-collection property is the fully-qualified name of the table for which the filter will be applied, and can be case-sensitive or case-insensitive depending on the database.

    • The filter property will have the same value used in the snapshot.select.statement.overrides, the fully-qualified name of the table that should match by case.

For example:

  {"type": "blocking", "data-collections": ["schema1.table1", "schema1.table2"], "additional-conditions": [{"data-collection": "schema1.table1", "filter": "SELECT * FROM [schema1].[table1] WHERE column1 = 0 ORDER BY column2 DESC"}, {"data-collection": "schema1.table2", "filter": "SELECT * FROM [schema1].[table2] WHERE column2 > 0"}]}
Possible duplicates

A delay might exist between the time that you send the signal to trigger the snapshot, and the time when streaming stops and the snapshot starts. As a result of this delay, after the snapshot completes, the connector might emit some event records that duplicate records captured by the snapshot.

Topic names

By default, the Oracle connector writes change events for all INSERT, UPDATE, and DELETE operations that occur in a table to a single Apache Kafka topic that is specific to that table. The connector uses the following convention to name change event topics:

topicPrefix.schemaName.tableName

The following list provides definitions for the components of the default name:

topicPrefix

The topic prefix as specified by the topic.prefix connector configuration property.

schemaName

The name of the schema in which the operation occurred.

tableName

The name of the table in which the operation occurred.

For example, if fulfillment is the server name, inventory is the schema name, and the database contains tables with the names orders, customers, and products, the Debezium Oracle connector emits events to the following Kafka topics, one for each table in the database:

fulfillment.inventory.orders
fulfillment.inventory.customers
fulfillment.inventory.products

The connector applies similar naming conventions to label its internal database schema history topics, schema change topics, and transaction metadata topics.

If the default topic name do not meet your requirements, you can configure custom topic names. To configure custom topic names, you specify regular expressions in the logical topic routing SMT. For more information about using the logical topic routing SMT to customize topic naming, see Topic routing.

Schema history topic

When a database client queries a database, the client uses the database’s current schema. However, the database schema can be changed at any time, which means that the connector must be able to identify what the schema was at the time each insert, update, or delete operation was recorded. Also, a connector cannot necessarily apply the current schema to every event. If an event is relatively old, it’s possible that it was recorded before the current schema was applied.

To ensure correct processing of events that occur after a schema change, Oracle includes in the redo log not only the row-level changes that affect the data, but also the DDL statements that are applied to the database. As the connector encounters these DDL statements in the redo log, it parses them and updates an in-memory representation of each table’s schema. The connector uses this schema representation to identify the structure of the tables at the time of each insert, update, or delete operation and to produce the appropriate change event. In a separate database schema history Kafka topic, the connector records all DDL statements along with the position in the redo log where each DDL statement appeared.

When the connector restarts after either a crash or a graceful stop, it starts reading the redo log from a specific position, that is, from a specific point in time. The connector rebuilds the table structures that existed at this point in time by reading the database schema history Kafka topic and parsing all DDL statements up to the point in the redo log where the connector is starting.

This database schema history topic is internal for internal connector use only. Optionally, the connector can also emit schema change events to a different topic that is intended for consumer applications.

Additional resources

Schema change topic

You can configure a Debezium Oracle connector to produce schema change events that describe structural changes that are applied to tables in the database. The connector writes schema change events to a Kafka topic named <serverName>, where serverName is the namespace that is specified in the topic.prefix configuration property.

Debezium emits a new message to the schema change topic whenever it streams data from a new table, or when the structure of the table is altered.

Following a change in table structure, you must follow (the schema evolution procedure).

Messages that the connector sends to the schema change topic contain a payload, and, optionally, also contain the schema of the change event message.

The schema for the schema change event has the following elements:

name

The name of the schema change event message.

type

The type of the change event message.

version

The version of the schema. The version is an integer that is incremented each time the schema is changed.

fields

The fields that are included in the change event message.

Example: Schema of the Oracle connector schema change topic

The following example shows a typical schema in JSON format.

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "inventory"
  }
}

The payload of a schema change event message includes the following elements:

ddl

Provides the SQL CREATE, ALTER, or DROP statement that results in the schema change.

databaseName

The name of the database to which the statements are applied. The value of databaseName serves as the message key.

tableChanges

A structured representation of the entire table schema after the schema change. The tableChanges field contains an array that includes entries for each column of the table. Because the structured representation presents data in JSON or Avro format, consumers can easily read messages without first processing them through a DDL parser.

By default, the connector uses the ALL_TABLES database view to identify the table names to store in the schema history topic. Within that view, the connector can access data only from tables that are available to the user account through which it connects to the database.

You can modify settings so that the schema history topic stores a different subset of tables. Use one of the following methods to alter the set of tables that the topic stores:

When the connector is configured to capture a table, it stores the history of the table’s schema changes not only in the schema change topic, but also in an internal database schema history topic. The internal database schema history topic is for connector use only and it is not intended for direct use by consuming applications. Ensure that applications that require notifications about schema changes consume that information only from the schema change topic.

Never partition the database schema history topic. For the database schema history topic to function correctly, it must maintain a consistent, global order of the event records that the connector emits to it.

To ensure that the topic is not split among partitions, set the partition count for the topic by using one of the following methods:

  • If you create the database schema history topic manually, specify a partition count of 1.

  • If you use the Apache Kafka broker to create the database schema history topic automatically, the topic is created, set the value of the Kafka num.partitions configuration option to 1.

The schema change topic message format is in an incubating state and might change without notice.

Example: Message emitted to the Oracle connector schema change topic

The following example shows a typical schema change message in JSON format. The message contains a logical representation of the table schema.

{
  "schema": {
  ...
  },
  "payload": {
    "source": {
      "version": "3.3.0.Final",
      "connector": "oracle",
      "name": "server1",
      "ts_ms": 1588252618953,
      "ts_us": 1588252618953000,
      "ts_ns": 1588252618953000000,
      "snapshot": "true",
      "db": "ORCLPDB1",
      "schema": "DEBEZIUM",
      "table": "CUSTOMERS",
      "txId" : null,
      "scn" : "1513734",
      "commit_scn": "1513754",
      "lcr_position" : null,
      "rs_id": "001234.00012345.0124",
      "ssn": 1,
      "redo_thread": 1,
      "user_name": "user",
      "row_id": "AAASgjAAMAAAACnAAA",
      "commit_ts_ms": 1588252619012,
      "start_scn": "1513734",
      "start_ts_ms": 1588252618953
    },
    "ts_ms": 1588252618953, (1)
    "ts_us": 1588252618953987, (1)
    "ts_ns": 1588252618953987512, (1)
    "databaseName": "ORCLPDB1", (2)
    "schemaName": "DEBEZIUM", //
    "ddl": "CREATE TABLE \"DEBEZIUM\".\"CUSTOMERS\" \n   (    \"ID\" NUMBER(9,0) NOT NULL ENABLE, \n    \"FIRST_NAME\" VARCHAR2(255), \n    \"LAST_NAME" VARCHAR2(255), \n    \"EMAIL\" VARCHAR2(255), \n     PRIMARY KEY (\"ID\") ENABLE, \n     SUPPLEMENTAL LOG DATA (ALL) COLUMNS\n   ) SEGMENT CREATION IMMEDIATE \n  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 \n NOCOMPRESS LOGGING\n  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645\n  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1\n  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)\n  TABLESPACE \"USERS\" ", (3)
    "tableChanges": [ (4)
      {
        "type": "CREATE", (5)
        "id": "\"ORCLPDB1\".\"DEBEZIUM\".\"CUSTOMERS\"", (6)
        "table": { (7)
          "defaultCharsetName": null,
          "primaryKeyColumnNames": [ (8)
            "ID"
          ],
          "columns": [ (9)
            {
              "name": "ID",
              "jdbcType": 2,
              "nativeType": null,
              "typeName": "NUMBER",
              "typeExpression": "NUMBER",
              "charsetName": null,
              "length": 9,
              "scale": 0,
              "position": 1,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "FIRST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 2,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "LAST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 3,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "EMAIL",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 4,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            }
          ],
          "attributes": [ (10)
            {
              "customAttribute": "attributeValue"
            }
          ]
        }
      }
    ]
  }
}
Table 8. Descriptions of fields in messages emitted to the schema change topic
Item Field name Description

1

ts_ms, ts_us, ts_ns

Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task.

In the source object, ts_ms indicates the time that the change was made in the database. By comparing the value for payload.source.ts_ms with the value for payload.ts_ms, you can determine the lag between the source database update and Debezium.

2

databaseName
schemaName

Identifies the database and the schema that contains the change.

3

ddl

This field contains the DDL that is responsible for the schema change.

4

tableChanges

An array of one or more items that contain the schema changes generated by a DDL command.

5

type

Describes the kind of change. The type can be set to one of the following values:

CREATE

Table created.

ALTER

Table modified.

DROP

Table deleted.

6

id

Full identifier of the table that was created, altered, or dropped. In the case of a table rename, this identifier is a concatenation of <old>,<new> table names.

7

table

Represents table metadata after the applied change.

8

primaryKeyColumnNames

List of columns that compose the table’s primary key.

9

columns

Metadata for each column in the changed table.

10

attributes

Custom attribute metadata for each table change.

In messages that the connector sends to the schema change topic, the message key is the name of the database that contains the schema change. In the following example, the payload field contains the databaseName key:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "ORCLPDB1"
  }
}

Transaction Metadata

Debezium can generate events that represent transaction metadata boundaries and that enrich data change event messages.

Limits on when Debezium receives transaction metadata

Debezium registers and receives metadata only for transactions that occur after you deploy the connector. Metadata for transactions that occur before you deploy the connector is not available.

Database transactions are represented by a statement block that is enclosed between the BEGIN and END keywords. Debezium generates transaction boundary events for the BEGIN and END delimiters in every transaction. Transaction boundary events contain the following fields:

status

BEGIN or END.

id

String representation of the unique transaction identifier.

ts_ms

The time of a transaction boundary event (BEGIN or END event) at the data source. If the data source does not provide Debezium with the event time, then the field instead represents the time at which Debezium processes the event.

event_count (for END events)

Total number of events emmitted by the transaction.

data_collections (for END events)

An array of pairs of data_collection and event_count elements that indicates the number of events that the connector emits for changes that originate from a data collection.

The following example shows a typical transaction boundary message:

Example: Oracle connector transaction boundary event
{
  "status": "BEGIN",
  "id": "5.6.641",
  "ts_ms": 1486500577125,
  "event_count": null,
  "data_collections": null
}

{
  "status": "END",
  "id": "5.6.641",
  "ts_ms": 1486500577691,
  "event_count": 2,
  "data_collections": [
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.CUSTOMER",
      "event_count": 1
    },
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.ORDER",
      "event_count": 1
    }
  ]
}

Unless overridden via the