You are viewing documentation for an outdated version of Debezium.
If you want to view the latest stable version of this page, please go here.

Debezium Connector for Oracle

Table of Contents

Overview

Debezium’s Oracle connector captures and records row-level changes that occur in databases on an Oracle server, including tables that are added while the connector is running. You can configure the connector to emit change events for specific subsets of schemas and tables, or to ignore, mask, or truncate values in specific columns.

For information about the Oracle Database versions that are compatible with this connector, see the Debezium release overview.

Debezium ingests change events from Oracle by using the native LogMiner database package , the XStream API, or OpenLogReplicator.

How the Oracle connector works

To optimally configure and run a Debezium Oracle connector, it is helpful to understand how the connector performs snapshots, streams change events, determines Kafka topic names, uses metadata, and implements event buffering.

Snapshots

Typically, the redo logs on an Oracle server are configured to not retain the complete history of the database. As a result, the Debezium Oracle connector cannot retrieve the entire history of the database from the logs. To enable the connector to establish a baseline for the current state of the database, the first time that the connector starts, it performs an initial consistent snapshot of the database.

If the time needed to complete the initial snapshot exceeds the UNDO_RETENTION time that is set for the database (fifteen minutes, by default), an ORA-01555 exception can occur. For more information about the error, and about the steps that you can take to recover from it, see the Frequently asked questions.

During a table’s snapshot, it’s possible for Oracle to raise an ORA-01466 exception. This happens when a user modifies the schema of the table or adds, changes, or drops an index or related object associated with the table being snapshot. In the event this happens, the connector will stop and the initial snapshot will need to be taken from the beginning.

To remediate the problem, you can configure the snapshot.database.errors.max.retries property with a value greater than 0 so that the specific table’s snapshot will restart. While the entire snapshot will not start from the beginning when retrying, the specific table in question will be re-read from the beginning and the table’s topic will contain duplicate snapshot events.

Default workflow that the Oracle connector uses to perform an initial snapshot

The following workflow lists the steps that Debezium takes to create a snapshot. These steps describe the process for a snapshot when the snapshot.mode configuration property is set to its default value, which is initial. You can customize the way that the connector creates snapshots by changing the value of the snapshot.mode property. If you configure a different snapshot mode, the connector completes the snapshot by using a modified version of this workflow.

When the snapshot mode is set to the default, the connector completes the following tasks to create a snapshot:

  1. Establish a connection to the database.

  2. Determine the tables to be captured. By default, the connector captures all tables except those with schemas that exclude them from capture. After the snapshot completes, the connector continues to stream data for the specified tables. If you want the connector to capture data only from specific tables you can direct the connector to capture the data for only a subset of tables or table elements by setting properties such as table.include.list or table.exclude.list.

  3. Obtain a ROW SHARE MODE lock on each of the captured tables to prevent structural changes from occurring during creation of the snapshot. Debezium holds the locks for only a short time.

  4. Read the current system change number (SCN) position from the server’s redo log.

  5. Capture the structure of all database tables, or all tables that are designated for capture. The connector persists schema information in its internal database schema history topic. The schema history provides information about the structure that is in effect when a change event occurs.

    By default, the connector captures the schema of every table in the database that is in capture mode, including tables that are not configured for capture. If tables are not configured for capture, the initial snapshot captures only their structure; it does not capture any table data. For more information about why snapshots persist schema information for tables that you did not include in the initial snapshot, see Understanding why initial snapshots capture the schema for all tables.

  6. Release the locks obtained in Step 3. Other database clients can now write to any previously locked tables.

  7. At the SCN position that was read in Step 4, the connector scans the tables that are designated for capture (SELECT * FROM …​ AS OF SCN 123). During the scan, the connector completes the following tasks:

    1. Confirms that the table was created before the snapshot began. If the table was created after the snapshot began, the connector skips the table. After the snapshot is complete, and the connector transitions to streaming, it emits change events for any tables that were created after the snapshot began.

    2. Produces a read event for each row that is captured from a table. All read events contain the same SCN position, which is the SCN position that was obtained in step 4.

    3. Emits each read event to the Kafka topic for the source table.

    4. Releases data table locks, if applicable.

  8. Record the successful completion of the snapshot in the connector offsets.

The resulting initial snapshot captures the current state of each row in the captured tables. From this baseline state, the connector captures subsequent changes as they occur.

After the snapshot process begins, if the process is interrupted due to connector failure, rebalancing, or other reasons, the process restarts after the connector restarts. After the connector completes the initial snapshot, it continues streaming from the position that it read in Step 3 so that it does not miss any updates. If the connector stops again for any reason, after it restarts, it resumes streaming changes from where it previously left off.

Table 1. Settings for snapshot.mode connector configuration property
Setting Description

always

Perform snapshot on each connector start. After the snapshot completes, the connector begins to stream event records for subsequent database changes.

initial

The connector performs a database snapshot as described in the default workflow for creating an initial snapshot. After the snapshot completes, the connector begins to stream event records for subsequent database changes.

initial_only

The connector performs a database snapshot and stops before streaming any change event records, not allowing any subsequent change events to be captured.

schema_only

Deprecated, see no_data.

no_data

The connector captures the structure of all relevant tables, performing all of the steps described in the default snapshot workflow, except that it does not create READ events to represent the data set at the point of the connector’s start-up (Step 6).

schema_only_recovery

Deprecated, see recovery.

recovery

Set this option to restore a database schema history topic that is lost or corrupted. After a restart, the connector runs a snapshot that rebuilds the topic from the source tables. You can also set the property to periodically prune a database schema history topic that experiences unexpected growth.

WARNING: Do not use this mode to perform a snapshot if schema changes were committed to the database after the last connector shutdown.

when_needed

After the connector starts, it performs a snapshot only if it detects one of the following circumstances:

  • It cannot detect any topic offsets.

  • A previously recorded offset specifies a log position that is not available on the server.

configuration_based

Set the snapshot mode to configuration_based to control snapshot behavior through the set of connector properties that have the prefix 'snapshot.mode.configuration.based'.

custom

The custom snapshot mode lets you inject your own implementation of the io.debezium.spi.snapshot.Snapshotter interface. Set the snapshot.mode.custom.name configuration property to the name provided by the name() method of your implementation. The name is specified on the classpath of your Kafka Connect cluster. If you use the Debezium EmbeddedEngine, the name is included in the connector JAR file. For more information, see custom snapshotter SPI.

For more information, see snapshot.mode in the table of connector configuration properties.

Understanding why initial snapshots capture the schema history for all tables

The initial snapshot that a connector runs captures two types of information:

Table data

Information about INSERT, UPDATE, and DELETE operations in tables that are named in the connector’s table.include.list property.

Schema data

DDL statements that describe the structural changes that are applied to tables. Schema data is persisted to both the internal schema history topic, and to the connector’s schema change topic, if one is configured.

After you run an initial snapshot, you might notice that the snapshot captures schema information for tables that are not designated for capture. By default, initial snapshots are designed to capture schema information for every table that is present in the database, not only from tables that are designated for capture. Connectors require that the table’s schema is present in the schema history topic before they can capture a table. By enabling the initial snapshot to capture schema data for tables that are not part of the original capture set, Debezium prepares the connector to readily capture event data from these tables should that later become necessary. If the initial snapshot does not capture a table’s schema, you must add the schema to the history topic before the connector can capture data from the table.

In some cases, you might want to limit schema capture in the initial snapshot. This can be useful when you want to reduce the time required to complete a snapshot. Or when Debezium connects to the database instance through a user account that has access to multiple logical databases, but you want the connector to capture changes only from tables in a specific logic database.

Additional information

Capturing data from tables not captured by the initial snapshot (no schema change)

In some cases, you might want the connector to capture data from a table whose schema was not captured by the initial snapshot. Depending on the connector configuration, the initial snapshot might capture the table schema only for specific tables in the database. If the table schema is not present in the history topic, the connector fails to capture the table, and reports a missing schema error.

You might still be able to capture data from the table, but you must perform additional steps to add the table schema.

Prerequisites
Procedure
  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. In the connector configuration:

    1. Set the snapshot.mode to recovery.

    2. (Optional) Set the value of schema.history.internal.store.only.captured.tables.ddl to false to ensure that in the future the connector can readily capture data for tables that are not currently designated for capture. Connectors can capture data from a table only if the table’s schema history is present in the history topic.

    3. Add the tables that you want the connector to capture to table.include.list.

  4. Restart the connector. The snapshot recovery process rebuilds the schema history based on the current structure of the tables.

  5. (Optional) After the snapshot completes, initiate an incremental snapshot on the newly added tables. The incremental snapshot first streams the historical data of the newly added tables, and then resumes reading changes from the redo and archive logs for previously configured tables, including changes that occur while that connector was off-line.

  6. (Optional) Reset the snapshot.mode back to no_data to prevent the connector from initiating recovery after a future restart.

Capturing data from tables not captured by the initial snapshot (schema change)

If a schema change is applied to a table, records that are committed before the schema change have different structures than those that were committed after the change. When Debezium captures data from a table, it reads the schema history to ensure that it applies the correct schema to each event. If the schema is not present in the schema history topic, the connector is unable to capture the table, and an error results.

If you want to capture data from a table that was not captured by the initial snapshot, and the schema of the table was modified, you must add the schema to the history topic, if it is not already available. You can add the schema by running a new schema snapshot, or by running an initial snapshot for the table.

Prerequisites
  • You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.

  • A schema change was applied to the table so that the records to be captured do not have a uniform structure.

Procedure
Initial snapshot captured the schema for all tables (store.only.captured.tables.ddl was set to false)
  1. Edit the table.include.list property to specify the tables that you want to capture.

  2. Restart the connector.

  3. Initiate an incremental snapshot if you want to capture existing data from the newly added tables.

Initial snapshot did not capture the schema for all tables (store.only.captured.tables.ddl was set to true)

If the initial snapshot did not save the schema of the table that you want to capture, complete one of the following procedures:

Procedure 1: Schema snapshot, followed by incremental snapshot

In this procedure, the connector first performs a schema snapshot. You can then initiate an incremental snapshot to enable the connector to synchronize data.

  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

    Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.

  4. Set values for properties in the connector configuration as described in the following steps:

    1. Set the value of the snapshot.mode property to no_data.

    2. Edit the table.include.list to add the tables that you want to capture.

  5. Restart the connector.

  6. Wait for Debezium to capture the schema of the new and existing tables. Data changes that occurred any tables after the connector stopped are not captured.

  7. To ensure that no data is lost, initiate an incremental snapshot.

Procedure 2: Initial snapshot, followed by optional incremental snapshot

In this procedure the connector performs a full initial snapshot of the database. As with any initial snapshot, in a database with many large tables, running an initial snapshot can be a time-consuming operation. After the snapshot completes, you can optionally trigger an incremental snapshot to capture any changes that occur while the connector is off-line.

  1. Stop the connector.

  2. Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

  3. Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

    Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.

  4. Edit the table.include.list to add the tables that you want to capture.

  5. Set values for properties in the connector configuration as described in the following steps:

    1. Set the value of the snapshot.mode property to initial.

    2. (Optional) Set schema.history.internal.store.only.captured.tables.ddl to false.

  6. Restart the connector. The connector takes a full database snapshot. After the snapshot completes, the connector transitions to streaming.

  7. (Optional) To capture any data that changed while the connector was off-line, initiate an incremental snapshot.

Ad hoc snapshots

By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.

However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:

  • The connector configuration is modified to capture a different set of tables.

  • Kafka topics are deleted and must be rebuilt.

  • Data corruption occurs due to a configuration error or some other problem.

You can re-run a snapshot for a table for which you previously captured a snapshot by initiating a so-called ad-hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.

When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.

Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.

You specify the tables to capture by sending an execute-snapshot message to the signaling table. Set the type of the execute-snapshot signal to incremental or blocking, and provide the names of the tables to include in the snapshot, as described in the following table:

Table 2. Example of an ad hoc execute-snapshot signal record
Field Default Value

type

incremental

Specifies the type of snapshot that you want to run.
Currently, you can request incremental or blocking snapshots.

data-collections

N/A

An array that contains regular expressions matching the fully-qualified names of the tables to include in the snapshot.
For the Oracle connector, use the following format to specify the fully qualified name of a table: database.schema.table.

additional-conditions

N/A

An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition:

data-collection

The fully-qualified name of the table that the filter applies to. You can apply different filters to each table.

filter

Specifies column values that must be present in a database record for the snapshot to include it, for example, "color='blue'".

The values that you assign to the filter parameter are the same types of values that you might specify in the WHERE clause of SELECT statements when you set the snapshot.select.statement.overrides property for a blocking snapshot.

surrogate-key

N/A

An optional string that specifies the column name that the connector uses as the primary key of a table during the snapshot process.

Triggering an ad hoc incremental snapshot

You initiate an ad hoc incremental snapshot by adding an entry with the execute-snapshot signal type to the signaling table, or by sending a signal message to a Kafka signaling topic. After the connector processes the message, it begins the snapshot operation. The snapshot process reads the first and last primary key values and uses those values as the start and end point for each table. Based on the number of entries in the table, and the configured chunk size, Debezium divides the table into chunks, and proceeds to snapshot each chunk, in succession, one at a time.

For more information, see Incremental snapshots.

Triggering an ad hoc blocking snapshot

You initiate an ad hoc blocking snapshot by adding an entry with the execute-snapshot signal type to the signaling table or signaling topic. After the connector processes the message, it begins the snapshot operation. The connector temporarily stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the connector resumes streaming.

For more information, see Blocking snapshots.

Incremental snapshots

To provide flexibility in managing snapshots, Debezium includes a supplementary snapshot mechanism, known as incremental snapshotting. Incremental snapshots rely on the Debezium mechanism for sending signals to a Debezium connector. Incremental snapshots are based on the DDD-3 design document.

In an incremental snapshot, instead of capturing the full state of a database all at once, as in an initial snapshot, Debezium captures each table in phases, in a series of configurable chunks. You can specify the tables that you want the snapshot to capture and the size of each chunk. The chunk size determines the number of rows that the snapshot collects during each fetch operation on the database. The default chunk size for incremental snapshots is 1024 rows.

As an incremental snapshot proceeds, Debezium uses watermarks to track its progress, maintaining a record of each table row that it captures. This phased approach to capturing data provides the following advantages over the standard initial snapshot process:

  • You can run incremental snapshots in parallel with streamed data capture, instead of postponing streaming until the snapshot completes. The connector continues to capture near real-time events from the change log throughout the snapshot process, and neither operation blocks the other.

  • If the progress of an incremental snapshot is interrupted, you can resume it without losing any data. After the process resumes, the snapshot begins at the point where it stopped, rather than recapturing the table from the beginning.

  • You can run an incremental snapshot on demand at any time, and repeat the process as needed to adapt to database updates. For example, you might re-run a snapshot after you modify the connector configuration to add a table to its table.include.list property.

Incremental snapshot process

When you run an incremental snapshot, Debezium sorts each table by primary key and then splits the table into chunks based on the configured chunk size. Working chunk by chunk, it then captures each table row in a chunk. For each row that it captures, the snapshot emits a READ event. That event represents the value of the row when the snapshot for the chunk began.

As a snapshot proceeds, it’s likely that other processes continue to access the database, potentially modifying table records. To reflect such changes, INSERT, UPDATE, or DELETE operations are committed to the transaction log as per usual. Similarly, the ongoing Debezium streaming process continues to detect these change events and emits corresponding change event records to Kafka.

How Debezium resolves collisions among records with the same primary key

In some cases, the UPDATE or DELETE events that the streaming process emits are received out of sequence. That is, the streaming process might emit an event that modifies a table row before the snapshot captures the chunk that contains the READ event for that row. When the snapshot eventually emits the corresponding READ event for the row, its value is already superseded. To ensure that incremental snapshot events that arrive out of sequence are processed in the correct logical order, Debezium employs a buffering scheme for resolving collisions. Only after collisions between the snapshot events and the streamed events are resolved does Debezium emit an event record to Kafka.

Snapshot window

To assist in resolving collisions between late-arriving READ events and streamed events that modify the same table row, Debezium employs a so-called snapshot window. The snapshot window demarcates the interval during which an incremental snapshot captures data for a specified table chunk. Before the snapshot window for a chunk opens, Debezium follows its usual behavior and emits events from the transaction log directly downstream to the target Kafka topic. But from the moment that the snapshot for a particular chunk opens, until it closes, Debezium performs a de-duplication step to resolve collisions between events that have the same primary key..

For each data collection, the Debezium emits two types of events, and stores the records for them both in a single destination Kafka topic. The snapshot records that it captures directly from a table are emitted as READ operations. Meanwhile, as users continue to update records in the data collection, and the transaction log is updated to reflect each commit, Debezium emits UPDATE or DELETE operations for each change.

As the snapshot window opens, and Debezium begins processing a snapshot chunk, it delivers snapshot records to a memory buffer. During the snapshot windows, the primary keys of the READ events in the buffer are compared to the primary keys of the incoming streamed events. If no match is found, the streamed event record is sent directly to Kafka. If Debezium detects a match, it discards the buffered READ event, and writes the streamed record to the destination topic, because the streamed event logically supersede the static snapshot event. After the snapshot window for the chunk closes, the buffer contains only READ events for which no related transaction log events exist. Debezium emits these remaining READ events to the table’s Kafka topic.

The connector repeats the process for each snapshot chunk.

To enable Debezium to perform incremental snapshots, you must grant the connector permission to write to the signaling table.

Write permission is unnecessary only for connectors that can be configured to perform read-only incrementals snapshots (MariaDB, MySQL, or PostgreSQL).

Currently, you can use either of the following methods to initiate an incremental snapshot:

The Debezium connector for Oracle does not support schema changes while an incremental snapshot is running.

Triggering an incremental snapshot

To initiate an incremental snapshot, you can send an ad hoc snapshot signal to the signaling table on the source database. You submit snapshot signals as SQL INSERT queries.

After Debezium detects the change in the signaling table, it reads the signal, and runs the requested snapshot operation.

The query that you submit specifies the tables to include in the snapshot, and, optionally, specifies the type of snapshot operation. Debezium currently supports the incremental and blocking snapshot types.

To specify the tables to include in the snapshot, provide a data-collections array that lists the tables, or an array of regular expressions used to match tables, for example,

{"data-collections": ["public.MyFirstTable", "public.MySecondTable"]}

The data-collections array for an incremental snapshot signal has no default value. If the data-collections array is empty, Debezium interprets the empty array to mean that no action is required, and it does not perform a snapshot.

If the name of a table that you want to include in a snapshot contains a dot (.), a space, or some other non-alphanumeric character, you must escape the table name in double quotes.
For example, to include a table that exists in the public schema in the db1 database, and that has the name My.Table, use the following format: "db1.public.\"My.Table\"".

Prerequisites
Using a source signaling channel to trigger an incremental snapshot
  1. Send a SQL query to add the ad hoc incremental snapshot request to the signaling table:

    INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

    For example,

    INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
    values ('ad-hoc-1',   (2)
        'execute-snapshot',  (3)
        '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
        "type":"incremental", (5)
        "additional-conditions":[{"data-collection": "db1.schema1.table1" ,"filter":"color=\'blue\'"}]}'); (6)

    The values of the id,type, and data parameters in the command correspond to the fields of the signaling table.
    The following table describes the parameters in the example:

    Table 3. Descriptions of fields in a SQL command for sending an incremental snapshot signal to the signaling table
    Item Value Description

    1

    database.schema.debezium_signal

    Specifies the fully-qualified name of the signaling table on the source database.

    2

    ad-hoc-1

    The id parameter specifies an arbitrary string that is assigned as the id identifier for the signal request.
    Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string. Rather, during the snapshot, Debezium generates its own id string as a watermarking signal.

    3

    execute-snapshot

    The type parameter specifies the operation that the signal is intended to trigger.

    4

    data-collections

    A required component of the data field of a signal that specifies an array of table names or regular expressions to match table names to include in the snapshot.
    The array lists regular expressions that use the format database.schema.table to match the fully-qualified names of the tables. This format is the same as the one that you use to specify the name of the connector’s signaling table.

    5

    incremental

    An optional type component of the data field of a signal that specifies the type of snapshot operation to run.
    Valid values are incremental and blocking.
    If you do not specify a value, the connector defaults to performing an incremental snapshot.

    6

    additional-conditions

    An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
    Each additional condition is an object with data-collection and filter properties. You can specify different filters for each data collection.
    * The data-collection property is the fully-qualified name of the data collection that the filter applies to. For more information about the additional-conditions parameter, see Running an ad hoc incremental snapshots with additional-conditions.

Running an ad hoc incremental snapshots with additional-conditions

If you want a snapshot to include only a subset of the content in a table, you can modify the signal request by appending an additional-conditions parameter to the snapshot signal.

The SQL query for a typical snapshot takes the following form:

SELECT * FROM <tableName> ....

By adding an additional-conditions parameter, you append a WHERE condition to the SQL query, as in the following example:

SELECT * FROM <data-collection> WHERE <filter> ....

The following example shows a SQL query to send an ad hoc incremental snapshot request with an additional condition to the signaling table:

INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

For example, suppose you have a products table that contains the following columns:

  • id (primary key)

  • color

  • quantity

If you want an incremental snapshot of the products table to include only the data items where color=blue, you can use the following SQL statement to trigger the snapshot:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue"}]}');

The additional-conditions parameter also enables you to pass conditions that are based on more than one column. For example, using the products table from the previous example, you can submit a query that triggers an incremental snapshot that includes the data of only those items for which color=blue and quantity>10:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue AND quantity>10"}]}');

The following example, shows the JSON for an incremental snapshot event that is captured by a connector.

Example 1. Incremental snapshot event message
{
    "before":null,
    "after": {
        "pk":"1",
        "value":"New data"
    },
    "source": {
        ...
        "snapshot":"incremental" (1)
    },
    "op":"r", (2)
    "ts_ms":"1620393591654",
    "ts_us":"1620393591654547",
    "ts_ns":"1620393591654547920",
    "transaction":null
}
Table 4. Description of fields in an incremental snapshot event message
Item Field name Description

1

snapshot

Specifies the type of snapshot operation to run.
Currently, the only valid options are blocking and incremental.
Specifying a type value in the SQL query that you submit to the signaling table is optional.
If you do not specify a value, the connector runs an incremental snapshot.

2

op

Specifies the event type.
The value for snapshot events is r, signifying a READ operation.

Using the Kafka signaling channel to trigger an incremental snapshot

You can send a message to the configured Kafka topic to request the connector to run an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is execute-snapshot, and the data field must have the following fields:

Table 5. Execute snapshot data fields
Field Default Value

type

incremental

The type of the snapshot to be executed. Currently Debezium supports the incremental and blocking types.
See the next section for more details.

data-collections

N/A

An array of comma-separated regular expressions that match the fully-qualified names of tables to include in the snapshot.
Specify the names by using the same format as is required for the signal.data.collection configuration option.

additional-conditions

N/A

An optional array of additional conditions that specifies criteria that the connector evaluates to designate a subset of records to include in a snapshot.
Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition: data-collection:: The fully-qualified name of the table that the filter applies to. You can apply different filters to each table. filter:: Specifies column values that must be present in a database record for the snapshot to include it, for example, "color='blue'".

The values that you assign to the filter parameter are the same types of values that you might specify in the WHERE clause of SELECT statements when you set the snapshot.select.statement.overrides property for a blocking snapshot.

Example 2. An execute-snapshot Kafka message
Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["{collection-container}.table1", "{collection-container}.table2"], "type": "INCREMENTAL"}}`
Ad hoc incremental snapshots with additional-conditions

Debezium uses the additional-conditions field to select a subset of a table’s content.

Typically, when Debezium runs a snapshot, it runs a SQL query such as:

SELECT * FROM <tableName> …​.

When the snapshot request includes an additional-conditions property, the data-collection and filter parameters of the property are appended to the SQL query, for example:

SELECT * FROM <data-collection> WHERE <filter> …​.

For example, given a products table with the columns id (primary key), color, and brand, if you want a snapshot to include only content for which color='blue', when you request the snapshot, you could add the additional-conditions property to filter the content: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue'"}]}}`

You can also use the additional-conditions property to pass conditions based on multiple columns. For example, using the same products table as in the previous example, if you want a snapshot to include only the content from the products table for which color='blue', and brand='MyBrand', you could send the following request: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue' AND brand='MyBrand'"}]}}`

Stopping an incremental snapshot

In some situations, it might be necessary to stop an incremental snapshot. For example, you might realize that snapshot was not configured correctly, or maybe you want to ensure that resources are available for other database operations. You can stop a snapshot that is already running by sending a signal to the signaling table on the source database.

You submit a stop snapshot signal to the signaling table by sending it in a SQL INSERT query. The stop-snapshot signal specifies the type of the snapshot operation as incremental, and optionally specifies the tables that you want to omit from the currently running snapshot. After Debezium detects the change in the signaling table, it reads the signal, and stops the incremental snapshot operation if it’s in progress.

Additional resources

You can also stop an incremental snapshot by sending a JSON message to the Kafka signaling topic.

Prerequisites
Using a source signaling channel to stop an incremental snapshot
  1. Send a SQL query to stop the ad hoc incremental snapshot to the signaling table:

    INSERT INTO <signalTable> (id, type, data) values ('<id>', 'stop-snapshot', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"incremental"}');

    For example,

    INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
    values ('ad-hoc-1',   (2)
        'stop-snapshot',  (3)
        '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
        "type":"incremental"}'); (5)

    The values of the id, type, and data parameters in the signal command correspond to the fields of the signaling table.
    The following table describes the parameters in the example:

    Table 6. Descriptions of fields in a SQL command for sending a stop incremental snapshot signal to the signaling table
    Item Value Description

    1

    database.schema.debezium_signal

    Specifies the fully-qualified name of the signaling table on the source database.

    2

    ad-hoc-1

    The id parameter specifies an arbitrary string that is assigned as the id identifier for the signal request.
    Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string.

    3

    stop-snapshot

    Specifies type parameter specifies the operation that the signal is intended to trigger.

    4

    data-collections

    An optional component of the data field of a signal that specifies an array of table names or regular expressions to match table names to remove from the snapshot.
    The array lists regular expressions which match tables by their fully-qualified names in the format database.schema.table

    If you omit this component from the data field, the signal stops the entire incremental snapshot that is in progress.

    5

    incremental

    A required component of the data field of a signal that specifies the type of snapshot operation that is to be stopped.
    Currently, the only valid option is incremental.
    If you do not specify a type value, the signal fails to stop the incremental snapshot.

Using the Kafka signaling channel to stop an incremental snapshot

You can send a signal message to the configured Kafka signaling topic to stop an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is stop-snapshot, and the data field must have the following fields:

Table 7. Execute snapshot data fields
Field Default Value

type

incremental

The type of the snapshot to be executed. Currently Debezium supports only the incremental type.
See the next section for more details.

data-collections

N/A

An optional array of comma-separated regular expressions that match the fully-qualified names of the tables an array of table names or regular expressions to match table names to remove from the snapshot.
Specify table names by using the format database.schema.table.

The following example shows a typical stop-snapshot Kafka message:

Key = `test_connector`

Value = `{"type":"stop-snapshot","data": {"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], "type": "INCREMENTAL"}}`

Custom snapshotter SPI

For more advanced uses, you can fine-tune control of the snapshot by implementing one of the following interfaces:

io.debezium.snapshot.spi.Snapshotter

Controls whether the connector takes a snapshot.

io.debezium.snapshot.spi.SnapshotQuery

Controls how data is queried during a snapshot.

io.debezium.snapshot.spi.SnapshotLock

Controls whether the connector locks tables when taking a snapshot.

io.debezium.snapshot.spi.Snapshotter interface. All built-in snapshot modes implement this interface.
/**
 * {@link Snapshotter} is used to determine the following details about the snapshot process:
 * <p>
 * - Whether a snapshot occurs. <br>
 * - Whether streaming continues during the snapshot. <br>
 * - Whether the snapshot includes schema (if supported). <br>
 * - Whether to snapshot data or schema following an error.
 * <p>
 * Although Debezium provides many default snapshot modes,
 * to provide more advanced functionality, such as partial snapshots,
 * you can customize implementation of the interface.
 * For more information, see the documentation.
 *
 *
 *
 */
@Incubating
public interface Snapshotter extends Configurable {

    /**
     * @return the name of the snapshotter.
     *
     *
     */
    String name();

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a data snapshot
     */
    boolean shouldSnapshotData(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a schema snapshot
     */
    boolean shouldSnapshotSchema(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @return {@code true} if the snapshotter should stream after taking a snapshot
     */
    boolean shouldStream();

    /**
     * @return {@code true} whether the schema can be recovered if database schema history is corrupted.
     */
    boolean shouldSnapshotOnSchemaError();

    /**
     * @return {@code true} whether the snapshot should be re-executed when there is a gap in data stream.
     */
    boolean shouldSnapshotOnDataError();

    /**
     *
     * @return {@code true} if streaming should resume from the start of the snapshot
     * transaction, or {@code false} for when a connector resumes and takes a snapshot,
     * streaming should resume from where streaming previously left off.
     */
    default boolean shouldStreamEventsStartingFromSnapshot() {
        return true;
    }

    /**
     * Lifecycle hook called after the snapshot phase is successful.
     */
    default void snapshotCompleted() {
        // no operation
    }

    /**
     * Lifecycle hook called after the snapshot phase is aborted.
     */
    default void snapshotAborted() {
        // no operation
    }
}
io.debezium.snapshot.spi.SnapshotQuery interface. All built-in snapshot query modes implement this interface.
/**
 * {@link SnapshotQuery} is used to determine the query used during a data snapshot
 *
 *
 */
public interface SnapshotQuery extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Generate a valid query string for the specified table, or an empty {@link Optional}
     * to skip snapshotting this table (but that table will still be streamed from)
     *
     * @param tableId the table to generate a query for
     * @param snapshotSelectColumns the columns to be used in the snapshot select based on the column
     *                              include/exclude filters
     * @return a valid query string, or none to skip snapshotting this table
     */
    Optional<String> snapshotQuery(String tableId, List<String> snapshotSelectColumns);

}
io.debezium.snapshot.spi.SnapshotLock interface. All built-in snapshot lock modes implement this interface.
/**
 * {@link SnapshotLock} is used to determine the table lock mode used during schema snapshot
 *
 *
 */
public interface SnapshotLock extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Returns a SQL statement for locking the given table during snapshotting, if required by the specific snapshotter
     * implementation.
     */
    Optional<String> tableLockingStatement(Duration lockTimeout, String tableId);

}

Blocking snapshots

To provide more flexibility in managing snapshots, Debezium includes a supplementary ad hoc snapshot mechanism, known as a blocking snapshot. Blocking snapshots rely on the Debezium mechanism for sending signals to a Debezium connector.

A blocking snapshot behaves just like an initial snapshot, except that you can trigger it at run time.

You might want to run a blocking snapshot rather than use the standard initial snapshot process in the following situations:

  • You add a new table and you want to complete the snapshot while the connector is running.

  • You add a large table, and you want the snapshot to complete in less time than is possible with an incremental snapshot.

Blocking snapshot process

When you run a blocking snapshot, Debezium stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the streaming is resumed.

Configure snapshot

You can set the following properties in the data component of a signal:

  • data-collections: to specify which tables must be snapshot.

  • data-collections: Specifies the tables that you want the snapshot to include.
    This property accepts a comma-separated list of regular expressions that match fully-qualified table names. The behavior of the property is similar to the behavior of the table.include.list property, which specifies the tables to capture in a blocking snapshot.

  • additional-conditions: You can specify different filters for different table.

    • The data-collection property is the fully-qualified name of the table for which the filter will be applied, and can be case-sensitive or case-insensitive depending on the database.

    • The filter property will have the same value used in the snapshot.select.statement.overrides, the fully-qualified name of the table that should match by case.

For example:

  {"type": "blocking", "data-collections": ["schema1.table1", "schema1.table2"], "additional-conditions": [{"data-collection": "schema1.table1", "filter": "SELECT * FROM [schema1].[table1] WHERE column1 = 0 ORDER BY column2 DESC"}, {"data-collection": "schema1.table2", "filter": "SELECT * FROM [schema1].[table2] WHERE column2 > 0"}]}
Possible duplicates

A delay might exist between the time that you send the signal to trigger the snapshot, and the time when streaming stops and the snapshot starts. As a result of this delay, after the snapshot completes, the connector might emit some event records that duplicate records captured by the snapshot.

Topic names

By default, the Oracle connector writes change events for all INSERT, UPDATE, and DELETE operations that occur in a table to a single Apache Kafka topic that is specific to that table. The connector uses the following convention to name change event topics:

topicPrefix.schemaName.tableName

The following list provides definitions for the components of the default name:

topicPrefix

The topic prefix as specified by the topic.prefix connector configuration property.

schemaName

The name of the schema in which the operation occurred.

tableName

The name of the table in which the operation occurred.

For example, if fulfillment is the server name, inventory is the schema name, and the database contains tables with the names orders, customers, and products, the Debezium Oracle connector emits events to the following Kafka topics, one for each table in the database:

fulfillment.inventory.orders
fulfillment.inventory.customers
fulfillment.inventory.products

The connector applies similar naming conventions to label its internal database schema history topics, schema change topics, and transaction metadata topics.

If the default topic name do not meet your requirements, you can configure custom topic names. To configure custom topic names, you specify regular expressions in the logical topic routing SMT. For more information about using the logical topic routing SMT to customize topic naming, see Topic routing.

Schema history topic

When a database client queries a database, the client uses the database’s current schema. However, the database schema can be changed at any time, which means that the connector must be able to identify what the schema was at the time each insert, update, or delete operation was recorded. Also, a connector cannot necessarily apply the current schema to every event. If an event is relatively old, it’s possible that it was recorded before the current schema was applied.

To ensure correct processing of events that occur after a schema change, Oracle includes in the redo log not only the row-level changes that affect the data, but also the DDL statements that are applied to the database. As the connector encounters these DDL statements in the redo log, it parses them and updates an in-memory representation of each table’s schema. The connector uses this schema representation to identify the structure of the tables at the time of each insert, update, or delete operation and to produce the appropriate change event. In a separate database schema history Kafka topic, the connector records all DDL statements along with the position in the redo log where each DDL statement appeared.

When the connector restarts after either a crash or a graceful stop, it starts reading the redo log from a specific position, that is, from a specific point in time. The connector rebuilds the table structures that existed at this point in time by reading the database schema history Kafka topic and parsing all DDL statements up to the point in the redo log where the connector is starting.

This database schema history topic is internal for internal connector use only. Optionally, the connector can also emit schema change events to a different topic that is intended for consumer applications.

Additional resources

Schema change topic

You can configure a Debezium Oracle connector to produce schema change events that describe structural changes that are applied to tables in the database. The connector writes schema change events to a Kafka topic named <serverName>, where serverName is the namespace that is specified in the topic.prefix configuration property.

Debezium emits a new message to the schema change topic whenever it streams data from a new table, or when the structure of the table is altered.

Following a change in table structure, you must follow (the schema evolution procedure).

Messages that the connector sends to the schema change topic contain a payload, and, optionally, also contain the schema of the change event message.

The schema for the schema change event has the following elements:

name

The name of the schema change event message.

type

The type of the change event message.

version

The version of the schema. The version is an integer that is incremented each time the schema is changed.

fields

The fields that are included in the change event message.

Example: Schema of the Oracle connector schema change topic

The following example shows a typical schema in JSON format.

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "inventory"
  }
}

The payload of a schema change event message includes the following elements:

ddl

Provides the SQL CREATE, ALTER, or DROP statement that results in the schema change.

databaseName

The name of the database to which the statements are applied. The value of databaseName serves as the message key.

tableChanges

A structured representation of the entire table schema after the schema change. The tableChanges field contains an array that includes entries for each column of the table. Because the structured representation presents data in JSON or Avro format, consumers can easily read messages without first processing them through a DDL parser.

By default, the connector uses the ALL_TABLES database view to identify the table names to store in the schema history topic. Within that view, the connector can access data only from tables that are available to the user account through which it connects to the database.

You can modify settings so that the schema history topic stores a different subset of tables. Use one of the following methods to alter the set of tables that the topic stores:

When the connector is configured to capture a table, it stores the history of the table’s schema changes not only in the schema change topic, but also in an internal database schema history topic. The internal database schema history topic is for connector use only and it is not intended for direct use by consuming applications. Ensure that applications that require notifications about schema changes consume that information only from the schema change topic.

Never partition the database schema history topic. For the database schema history topic to function correctly, it must maintain a consistent, global order of the event records that the connector emits to it.

To ensure that the topic is not split among partitions, set the partition count for the topic by using one of the following methods:

  • If you create the database schema history topic manually, specify a partition count of 1.

  • If you use the Apache Kafka broker to create the database schema history topic automatically, the topic is created, set the value of the Kafka num.partitions configuration option to 1.

The schema change topic message format is in an incubating state and might change without notice.

Example: Message emitted to the Oracle connector schema change topic

The following example shows a typical schema change message in JSON format. The message contains a logical representation of the table schema.

{
  "schema": {
  ...
  },
  "payload": {
    "source": {
      "version": "3.1.3.Final",
      "connector": "oracle",
      "name": "server1",
      "ts_ms": 1588252618953,
      "ts_us": 1588252618953000,
      "ts_ns": 1588252618953000000,
      "snapshot": "true",
      "db": "ORCLPDB1",
      "schema": "DEBEZIUM",
      "table": "CUSTOMERS",
      "txId" : null,
      "scn" : "1513734",
      "commit_scn": "1513754",
      "lcr_position" : null,
      "rs_id": "001234.00012345.0124",
      "ssn": 1,
      "redo_thread": 1,
      "user_name": "user",
      "row_id": "AAASgjAAMAAAACnAAA",
      "commit_ts_ms": 1588252619012,
      "start_scn": "1513734",
      "start_ts_ms": 1588252618953
    },
    "ts_ms": 1588252618953, (1)
    "ts_us": 1588252618953987, (1)
    "ts_ns": 1588252618953987512, (1)
    "databaseName": "ORCLPDB1", (2)
    "schemaName": "DEBEZIUM", //
    "ddl": "CREATE TABLE \"DEBEZIUM\".\"CUSTOMERS\" \n   (    \"ID\" NUMBER(9,0) NOT NULL ENABLE, \n    \"FIRST_NAME\" VARCHAR2(255), \n    \"LAST_NAME" VARCHAR2(255), \n    \"EMAIL\" VARCHAR2(255), \n     PRIMARY KEY (\"ID\") ENABLE, \n     SUPPLEMENTAL LOG DATA (ALL) COLUMNS\n   ) SEGMENT CREATION IMMEDIATE \n  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 \n NOCOMPRESS LOGGING\n  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645\n  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1\n  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)\n  TABLESPACE \"USERS\" ", (3)
    "tableChanges": [ (4)
      {
        "type": "CREATE", (5)
        "id": "\"ORCLPDB1\".\"DEBEZIUM\".\"CUSTOMERS\"", (6)
        "table": { (7)
          "defaultCharsetName": null,
          "primaryKeyColumnNames": [ (8)
            "ID"
          ],
          "columns": [ (9)
            {
              "name": "ID",
              "jdbcType": 2,
              "nativeType": null,
              "typeName": "NUMBER",
              "typeExpression": "NUMBER",
              "charsetName": null,
              "length": 9,
              "scale": 0,
              "position": 1,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "FIRST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 2,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "LAST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 3,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "EMAIL",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 4,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            }
          ],
          "attributes": [ (10)
            {
              "customAttribute": "attributeValue"
            }
          ]
        }
      }
    ]
  }
}
Table 8. Descriptions of fields in messages emitted to the schema change topic
Item Field name Description

1

ts_ms, ts_us, ts_ns

Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task.

In the source object, ts_ms indicates the time that the change was made in the database. By comparing the value for payload.source.ts_ms with the value for payload.ts_ms, you can determine the lag between the source database update and Debezium.

2

databaseName
schemaName

Identifies the database and the schema that contains the change.

3

ddl

This field contains the DDL that is responsible for the schema change.

4

tableChanges

An array of one or more items that contain the schema changes generated by a DDL command.

5

type

Describes the kind of change. The type can be set to one of the following values:

CREATE

Table created.

ALTER

Table modified.

DROP

Table deleted.

6

id

Full identifier of the table that was created, altered, or dropped. In the case of a table rename, this identifier is a concatenation of <old>,<new> table names.

7

table

Represents table metadata after the applied change.

8

primaryKeyColumnNames

List of columns that compose the table’s primary key.

9

columns

Metadata for each column in the changed table.

10

attributes

Custom attribute metadata for each table change.

In messages that the connector sends to the schema change topic, the message key is the name of the database that contains the schema change. In the following example, the payload field contains the databaseName key:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "ORCLPDB1"
  }
}

Transaction Metadata

Debezium can generate events that represent transaction metadata boundaries and that enrich data change event messages.

Limits on when Debezium receives transaction metadata

Debezium registers and receives metadata only for transactions that occur after you deploy the connector. Metadata for transactions that occur before you deploy the connector is not available.

Database transactions are represented by a statement block that is enclosed between the BEGIN and END keywords. Debezium generates transaction boundary events for the BEGIN and END delimiters in every transaction. Transaction boundary events contain the following fields:

status

BEGIN or END.

id

String representation of the unique transaction identifier.

ts_ms

The time of a transaction boundary event (BEGIN or END event) at the data source. If the data source does not provide Debezium with the event time, then the field instead represents the time at which Debezium processes the event.

event_count (for END events)

Total number of events emmitted by the transaction.

data_collections (for END events)

An array of pairs of data_collection and event_count elements that indicates the number of events that the connector emits for changes that originate from a data collection.

The following example shows a typical transaction boundary message:

Example: Oracle connector transaction boundary event
{
  "status": "BEGIN",
  "id": "5.6.641",
  "ts_ms": 1486500577125,
  "event_count": null,
  "data_collections": null
}

{
  "status": "END",
  "id": "5.6.641",
  "ts_ms": 1486500577691,
  "event_count": 2,
  "data_collections": [
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.CUSTOMER",
      "event_count": 1
    },
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.ORDER",
      "event_count": 1
    }
  ]
}

Unless overridden via the topic.transaction option, the connector emits transaction events to the <topic.prefix>.transaction topic.

Change data event enrichment

When transaction metadata is enabled, the data message Envelope is enriched with a new transaction field. This field provides information about every event in the form of a composite of fields:

id

String representation of unique transaction identifier.

total_order

The absolute position of the event among all events generated by the transaction.

data_collection_order

The per-data collection position of the event among all events that were emitted by the transaction.

The following example shows a typical transaction event message:

{
  "before": null,
  "after": {
    "pk": "2",
    "aa": "1"
  },
  "source": {
...
  },
  "op": "c",
  "ts_ms": "1580390884335",
  "ts_us": "1580390884335741",
  "ts_ns": "1580390884335741963",
  "transaction": {
    "id": "5.6.641",
    "total_order": "1",
    "data_collection_order": "1"
  }
}

LogMiner Mining Strategies

Entries in the Oracle redo logs do not store the original SQL statements that users submit to make DML changes. Instead, a redo entry holds a set of change vectors and a set of object identifiers that represent the tablespace, table, and columns related to these vectors. In other words, redo log entries don’t include the names of the schemas, tables, or columns affected by DML changes.

The Debezium Oracle connector uses the log.mining.strategy configuration property to control how Oracle LogMiner handles the lookup of the object identifiers in the change vectors. In certain situations, one log mining strategy might prove more reliable than another with regard to schema changes. However, before you choose a log mining strategy, it’s important to consider the implications it might have on performance and overhead.

Writing the data dictionary to redo logs

The redo_log_catalog mining strategy instructs the database to flush a copy of the data dictionary to the redo logs immediately after each redo log switch. This is the most reliable strategy for tracking schema changes that are interwoven with data changes, because Oracle LogMiner has a way to interpolate between the starting and ending data dictionary states across a series of change vectors.

However, the redo_log_catalog mode is also the most expensive, because it requires several key steps to function. First, this mode requires the data dictionary to be flushed to the redo logs after every log switch. Flushing the logs after each switch can quickly consume valuable space in the archive log, and the high volume of archive logs might exceed the number that database administrators prepared for. If you intend to use this mode, coordinate with your database administrators to ensure that the database is configured appropriately.

If you configure the connector to use the redo_log_catalog mode, do not use multiple Debezium Oracle connectors to capture changes from the same logical database.

Using the online catalog directly

The default strategy mode, online_catalog, works differently from the redo_log_catalog mode. When the strategy is set to online_catalog, the database never flushes the data dictionary to the redo logs. Instead, Oracle LogMiner always uses the most current data dictionary state to perform comparisons. By always using the current dictionary, and eliminating flushing to the redo logs, this strategy requires less overhead, and operates more efficiently. However, these benefits are offset by the inability to parse interwoven schema changes and data changes. As a result, this strategy can sometimes result in event failures.

If LogMiner was unable to reconstruct the SQL reliability after a schema change, check the redo logs for evidence. Look for references to tables with names like OBJ# 123456 (where the number is the table’s object identifier), or for columns with names like COL1 or COL2. When you configure the connector to use the online_catalog strategy, take steps to ensure that the table schema and its indices remain static and free from change. If the Debezium connector is configured to use the online_catalog mode, and you must apply a schema change, perform the following steps:

  1. Wait for the connector to capture all existing data changes (DML).

  2. Perform the schema (DDL) change, and then wait for the connector to capture the change.

  3. Resume data changes (DML) on the table.

Following this procedure helps to ensure that Oracle LogMiner can safely reconstruct the SQL for all data changes.

Hybrid approach

You can enable this strategy by setting the value of the log.mining.strategy configuration property to hybrid. The goal of this strategy is to provide the reliability of the redo_log_catalog strategy with the performance and low overhead of the online_catalog strategy, without incurring the disadvantages of either strategy.

The hybrid strategy works by primarily operating in the online_catalog mode, meaning that the Debezium Oracle connector first delegates event reconstruction to Oracle LogMiner. If Oracle LogMiner successfully reconstructs the SQL, Debezium processes the event normally, as if it were configured to use the online_catalog strategy. If the connector detects that Oracle LogMiner could not reconstruct the SQL, the connector attempts to reconstruct the SQL directly by using the schema history for that table object. The connector reports a failure only if both Oracle LogMiner and the connector are unable to reconstruct the SQL.

You cannot use the hybrid mining strategy if the lob.enabled property is set to true. If you require streaming CLOB, BLOB, or XML data, only the online_catalog or redo_log_catalog strategies can be used.

Query Modes

The Debezium Oracle connector integrates with Oracle LogMiner by default. This integration requires a specialized set of steps which includes generating a complex JDBC SQL query to ingest the changes recorded in the transaction logs as change events. The V$LOGMNR_CONTENTS view used by the JDBC SQL query does not have any indices to improve the query’s performance, and so there are different query modes that can be used that control how the SQL query is generated as a way to improve the query’s execution.

The log.mining.query.filter.mode connector property can be configured with one of the following to influence how the JDBC SQL query is generated:

none

(Default) This mode creates a JDBC query that only filters based on the different operation types, such as inserts, updates, or deletes, at the database level. When filtering the data based on the schema, table, or username include/exclude lists, this is done during the processing loop within the connector.

This mode is often useful when capturing a small number of tables from a database that is not heavily saturated with changes. The generated query is quite simple, and focuses primarily on reading as quickly as possible with low database overhead.

in

This mode creates a JDBC query that filters not only operation types at the database level, but also schema, table, and username include/exclude lists. The query’s predicates are generated using a SQL in-clause based on the values specified in the include/exclude list configuration properties.

This mode is often useful when capturing a large number of tables from a database that is heavily saturated with changes. The generated query is much more complex than the none mode, and focuses on reducing network overhead and performing as much filtering at the database level as possible.

Finally, do not specify regular expressions as part of schema and table include/exclude configuration properties. Using regular expressions will cause the connector to not match changes based on these configuration properties, causing changes to be missed.

regex

This mode creates a JDBC query that filters not only operation types at the database level, but also schema, table, and username include/exclude lists. However, unlike the in mode, this mode generates a SQL query using the Oracle REGEXP_LIKE operator using a conjunction or disjunction depending on whether include or excluded values are specified.

This mode is often useful when capturing a variable number of tables that can be identified using a small number of regular expressions. The generated query is much more complex than any other mode, and focuses on reducing network overhead and performing as much filtering at the database level as possible.

Event buffering

Oracle writes all changes to the redo logs in the order in which they occur, including changes that are later discarded by a rollback. As a result, concurrent changes from separate transactions are intertwined. When the connector first reads the stream of changes, because it cannot immediately determine which changes are committed or rolled back, it temporarily stores the change events in an internal buffer. After a change is committed, the connector writes the change event from the buffer to Kafka. The connector drops change events that are discarded by a rollback.

You can configure the buffering mechanism that the connector uses by setting the property log.mining.buffer.type.

Heap

The default buffer type is configured using memory. Under the default memory setting, the connector uses the heap memory of the JVM process to allocate and manage buffered event records. If you use the memory buffer setting, be sure that the amount of memory that you allocate to the Java process can accommodate long-running and large transactions in your environment.

Infinispan

The Debezium Oracle connector can also be configured to use Infinispan as its cache provider, supporting cache stores both locally with embedded mode or remotely on a server cluster. In order to use Infinispan, the log.mining.buffer.type must be configured using either infinispan_embedded or infinispan_remote.

In order to allow flexibility with Infinispan cache configurations, the connector expects a series of cache configuration properties to be supplied when using Infinispan to buffer event data. See the configuration properties in the log.mining.buffer.infinispan.cache namespace. The contents of these configuration properties depend on whether the connector is to integrate with a remote Infinispan cluster or to use the embedded engine.

For example, the following illustrates what an embedded configuration would look like for the transaction cache property when using Infinispan in embedded mode:

<local-cache name="transactions">
  <persistence passivation="false">
    <file-store read-only="false" preload="true" shared="false">
      <data path="./data"/>
      <index path="./index"/>
    </file-store>
  </persistence>
</local-cache>

Looking at the configuration in-depth, the cache is configured to be persistent. All caches should be configured this way to avoid loss of transaction events across connector restarts if a transaction is in-progress. Additionally, the location where the cache is kept is defined by the path attribute and this should be a shared location accessible all possible runtime environments.

The Infinispan buffer implementation utilizes multiple cache configurations with different names. There should be a cache defined for transactions, events, processed-transactions, and schema-changes. Each configuration can be tuned to your performance needs or be identical other than the cache name.

When supplying XML configuration as a JSON connector property value, line breaks must be omitted or replaced with a \n character.

Another example, the following illustrates the same cache configured with an Infinispan cluster:

<distributed-cache name="transactions" statistics="true">
  <encoding media-type="application/x-protostream" />
  <persistence passivation="false">
   <file-store read-only="false" preload="true" shared="false">
     <data path="./data"/>
     <index path="./index"/>
   </file-store>
  </persistence>
</distributed-cache>

Just like the embedded local-cache configuration from the previous example, this configuration is also defined to be persistent. All caches should be configured this way to avoid loss of transaction events across connector restarts if a transaction is in-progress.

However, there are a few differences with noting. First, the cache is defined as a distributed cache rather than a local-cache. Secondly, the cache is defined to use the application/x-protostream encoding, which is required for all Debezium caches. And lastly, no path attribute is necessary on the file store definition since the Infinispan cluster will handle this automatically.

The Infinispan buffer type is considered incubating; the cache formats may change between versions and may require a re-snapshot. The migration notes will indicate whether this is needed.

Additionally, when removing a Debezium Oracle connector that uses the Infinispan buffer, the persisted cache files are not removed from disk automatically. If the same buffer location will be used by a new connector deployment, the files should be removed manually before deploying the new connector.

Infinispan Hotrod client integration

The Debezium Oracle connector utilizes the Hotrod client to communicate with the Infinispan cluster. Any connector property that is prefixed with log.mining.buffer.infinispan.client. will be passed directly to the Hotrod client using the infinispan.client. namespace, allowing for complete customization of how the client is to interact with the cluster.

There is at least one required configuration property that must be supplied when using this Infinspan mode:

log.mining.buffer.infinispan.client.hotrod.server_list

Specifies the list of Infinispan server hostname and port combinations, using <hostname>:<port> format.

SCN gap detection

When the Debezium Oracle connector is configured to use LogMiner, it collects change events from Oracle by using a start and end range that is based on system change numbers (SCNs). The connector manages this range automatically, increasing or decreasing the range depending on whether the connector is able to stream changes in near real-time, or must process a backlog of changes due to the volume of large or bulk transactions in the database.

Under certain circumstances, the Oracle database advances the SCN by an unusually high amount, rather than increasing the SCN value at a constant rate. Such a jump in the SCN value can occur because of the way that a particular integration interacts with the database, or as a result of events such as hot backups.

The Debezium Oracle connector relies on the following configuration properties to detect the SCN gap and adjust the mining range.

log.mining.scn.gap.detection.gap.size.min

Specifies the minimum gap size.

log.mining.scn.gap.detection.time.interval.max.ms

Specifies the maximum time interval.

The connector first compares the difference in the number of changes between the current SCN and the highest SCN in the current mining range. If the difference between the current SCN value and the highest SCN value is greater than the minimum gap size, then the connector has potentially detected a SCN gap. To confirm whether a gap exists, the connector next compares the timestamps of the current SCN and the SCN at the end of the previous mining range. If the difference between the timestamps is less than the maximum time interval, then the existence of an SCN gap is confirmed.

When an SCN gap occurs, the Debezium connector automatically uses the current SCN as the end point for the range of the current mining session. This allows the connector to quickly catch up to the real-time events without mining smaller ranges in between that return no changes because the SCN value was increased by an unexpectedly large number. When the connector performs the preceding steps in response to an SCN gap, it ignores the value that is specified by the log.mining.batch.size.max property. After the connector finishes the mining session and catches back up to real-time events, it resumes enforcement of the maximum log mining batch size.

SCN gap detection is available only if the large SCN increment occurs while the connector is running and processing near real-time events.

Low change frequency offset management

The Debezium Oracle connector tracks system change numbers in the connector offsets so that when the connector is restarted, it can begin where it left off. These offsets are part of each emitted change event; however, when the frequency of database changes are low (every few hours or days), the offsets can become stale and prevent the connector from successfully restarting if the system change number is no longer available in the transaction logs.

For connectors that use non-CDB mode to connect to Oracle, you can enable heartbeat.interval.ms to force the connector to emit a heartbeat event at regular intervals so that offsets remain synchronized.

For connectors that use CDB mode to connect to Oracle, maintaining synchronization is more complicated. Not only must you set heartbeat.interval.ms, but it’s also necessary to set heartbeat.action.query. Specifying both properties is required, because in CDB mode, the connector specifically tracks changes inside the PDB only. A supplementary mechanism is needed to trigger change events from within the pluggable database. At regular intervals, the heartbeat action query causes the connector to insert a new table row, or update an existing row in the pluggable database. Debezium detects the table changes and emits change events for them, ensuring that offsets remain synchronized, even in pluggable databases that process changes infrequently.

For the connector to use the heartbeat.action.query with tables that are not owned by the connector user account, you must grant the connector user permission to run the necessary INSERT or UPDATE queries on those tables.

Data change events

Every data change event that the Oracle connector emits has a key and a value. The structures of the key and value depend on the table from which the change events originate. For information about how Debezium constructs topic names, see Topic names.

The Debezium Oracle connector ensures that all Kafka Connect schema names are valid Avro schema names. This means that the logical server name must start with alphabetic characters or an underscore ([a-z,A-Z,_]), and the remaining characters in the logical server name and all characters in the schema and table names must be alphanumeric characters or an underscore ([a-z,A-Z,0-9,\_]). The connector automatically replaces invalid characters with an underscore character.

Unexpected naming conflicts can result when the only distinguishing characters between multiple logical server names, schema names, or table names are not valid characters, and those characters are replaced with underscores.

Debezium and Kafka Connect are designed around continuous streams of event messages. However, the structure of these events might change over time, which can be difficult for topic consumers to handle. To facilitate the processing of mutable event structures, each event in Kafka Connect is self-contained. Every message key and value has two parts: a schema and payload. The schema describes the structure of the payload, while the payload contains the actual data.

Changes that are performed by the SYS or SYSTEM user accounts are not captured by the connector.

Change event keys

For each changed table, the change event key is structured such that a field exists for each column in the primary key (or unique key constraint) of the table at the time when the event is created.

For example, a customers table that is defined in the inventory database schema, might have the following change event key:

CREATE TABLE customers (
  id NUMBER(9) GENERATED BY DEFAULT ON NULL AS IDENTITY (START WITH 1001) NOT NULL PRIMARY KEY,
  first_name VARCHAR2(255) NOT NULL,
  last_name VARCHAR2(255) NOT NULL,
  email VARCHAR2(255) NOT NULL UNIQUE
);

If the value of the <topic.prefix>.transaction configuration property is set to server1, the JSON representation for every change event that occurs in the customers table in the database features the following key structure:

{
    "schema": {
        "type": "struct",
        "fields": [
            {
                "type": "int32",
                "optional": false,
                "field": "ID"
            }
        ],
        "optional": false,
        "name": "server1.INVENTORY.CUSTOMERS.Key"
    },
    "payload": {
        "ID": 1004
    }
}

The schema portion of the key contains a Kafka Connect schema that describes the content of the key portion. In the preceding example, the payload value is not optional, the structure is defined by a schema named server1.DEBEZIUM.CUSTOMERS.Key, and there is one required field named id of type int32. The value of the key’s payload field indicates that it is indeed a structure (which in JSON is just an object) with a single id field, whose value is 1004.

Therefore, you can interpret this key as describing the row in the inventory.customers table (output from the connector named server1) whose id primary key column had a value of 1004.

Change event values

The structure of a value in a change event message mirrors the structure of the message key in the change event in the message, and contains both a schema section and a payload section.

Payload of a change event value

An envelope structure in the payload sections of a change event value contains the following fields:

op

A mandatory field that contains a string value describing the type of operation. The op field in the payload of an Oracle connector change event value contains one of the following values: c (create or insert), u (update), d (delete), or r (read, which indicates a snapshot).

before

An optional field that, if present, describes the state of the row before the event occurred. The structure is described by the server1.INVENTORY.CUSTOMERS.Value Kafka Connect schema, which the server1 connector uses for all rows in the inventory.customers table.

after

An optional field that, if present, contains the state of a row after a change occurs. The structure is described by the same server1.INVENTORY.CUSTOMERS.Value Kafka Connect schema that is used for the before field.

source

A mandatory field that contains a structure that describes the source metadata for the event. In the case of the Oracle connector, the structure includes the following fields:

  • The Debezium version.

  • The connector name.

  • Whether the event is part of an ongoing snapshot or not.

  • The transaction id (not includes for snapshots).

  • The SCN of the change.

  • A timestamp that indicates when the record in the source database changed (for snapshots, the timestamp indicates when the snapshot occurred).

  • Username who made the change

  • The ROWID associated with the row

    The commit_scn field is optional and describes the SCN of the transaction commit that the change event participates within. This field is only present when using the LogMiner connection adapter.

    The user_name field is only populated when using the LogMiner connection adapter.

ts_ms

An optional field that, if present, contains the time (based on the system clock in the JVM that runs the Kafka Connect task) at which the connector processed the event.

Schema of a change event value

The schema portion of the event message’s value contains a schema that describes the envelope structure of the payload and the nested fields within it.

create events

The following example shows the value of a create event value from the customers table that is described in the change event keys example:

{
    "schema": {
        "type": "struct",
        "fields": [
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "int32",
                        "optional": false,
                        "field": "ID"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "FIRST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "LAST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "EMAIL"
                    }
                ],
                "optional": true,
                "name": "server1.DEBEZIUM.CUSTOMERS.Value",
                "field": "before"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "int32",
                        "optional": false,
                        "field": "ID"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "FIRST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "LAST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "EMAIL"
                    }
                ],
                "optional": true,
                "name": "server1.DEBEZIUM.CUSTOMERS.Value",
                "field": "after"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "string",
                        "optional": true,
                        "field": "version"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "name"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_ms"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_us"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_ns"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "txId"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "scn"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "commit_scn"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "rs_id"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ssn"
                    },
                    {
                        "type": "int32",
                        "optional": true,
                        "field": "redo_thread"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "user_name"
                    },
                    {
                        "type": "boolean",
                        "optional": true,
                        "field": "snapshot"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "row_id"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "commit_ts_ms"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "start_scn"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "start_ts_ms"
                    }
                ],
                "optional": false,
                "name": "io.debezium.connector.oracle.Source",
                "field": "source"
            },
            {
                "type": "string",
                "optional": false,
                "field": "op"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_ms"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_us"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_ns"
            }
        ],
        "optional": false,
        "name": "server1.DEBEZIUM.CUSTOMERS.Envelope"
    },
    "payload": {
        "before": null,
        "after": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "source": {
            "version": "3.1.3.Final",
            "name": "server1",
            "ts_ms": 1520085154000,
            "ts_us": 1520085154000000,
            "ts_ns": 1520085154000000000,
            "txId": "6.28.807",
            "scn": "2122185",
            "commit_scn": "2122185",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA",
            "commit_ts_ms": 1520085154000,
            "start_scn": "2122185",
            "start_ts_ms": 1520085154000
        },
        "op": "c",
        "ts_ms": 1532592105975,
        "ts_us": 1532592105975741,
        "ts_ns": 1532592105975741582
    }
}

In the preceding example, notice how the event defines the following schema:

  • The envelope (server1.DEBEZIUM.CUSTOMERS.Envelope).

  • The source structure (io.debezium.connector.oracle.Source, which is specific to the Oracle connector and reused across all events).

  • The table-specific schemas for the before and after fields.

The names of the schemas for the before and after fields are of the form <logicalName>.<schemaName>.<tableName>.Value, and thus are entirely independent from the schemas for all other tables. As a result, when you use the Avro converter, the Avro schemas for tables in each logical source have their own evolution and history.

The payload portion of this event’s value, provides information about the event. It describes that a row was created (op=c), and shows that the after field value contains the values that were inserted into the ID, FIRST_NAME, LAST_NAME, and EMAIL columns of the row.

By default, the JSON representations of events are much larger than the rows that they describe. The larger size is due to the JSON representation including both the schema and payload portions of a message. You can use the Avro Converter to decrease the size of messages that the connector writes to Kafka topics.

update events

The following example shows an update change event that the connector captures from the same table as the preceding create event.

{
    "schema": { ... },
    "payload": {
        "before": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "after": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "source": {
            "version": "3.1.3.Final",
            "name": "server1",
            "ts_ms": 1520085811000,
            "ts_us": 1520085811000000,
            "ts_ns": 1520085811000000000,
            "txId": "6.9.809",
            "scn": "2125544",
            "commit_scn": "2125544",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA",
            "commit_ts_ms": 152008581100,
            "start_scn": "2125544",
            "start_ts_ms": 152008581100
        },
        "op": "u",
        "ts_ms": 1532592713485,
        "ts_us": 1532592713485152,
        "ts_ns": 1532592713485152954,
    }
}

The payload has the same structure as the payload of a create (insert) event, but the following values are different:

  • The value of the op field is u, signifying that this row changed because of an update.

  • The before field shows the former state of the row with the values that were present before the update database commit.

  • The after field shows the updated state of the row, with the EMAIL value now set to [email protected].

  • The structure of the source field includes the same fields as before, but the values are different, because the connector captured the event from a different position in the redo log.

  • The ts_ms field shows the timestamp that indicates when Debezium processed the event.

The payload section reveals several other useful pieces of information. For example, by comparing the before and after structures, we can determine how a row changed as the result of a commit. The source structure provides information about Oracle’s record of this change, providing traceability. It also gives us insight into when this event occurred in relation to other events in this topic and in other topics. Did it occur before, after, or as part of the same commit as another event?

When the columns for a row’s primary/unique key are updated, the value of the row’s key changes. As a result, Debezium emits three events after such an update:

  • A DELETE event.

  • A tombstone event with the old key for the row.

  • An INSERT event that provides the new key for the row.

delete events

The following example shows a delete event for the table that is shown in the preceding create and update event examples. The schema portion of the delete event is identical to the schema portion for those events.

{
    "schema": { ... },
    "payload": {
        "before": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "after": null,
        "source": {
            "version": "3.1.3.Final",
            "name": "server1",
            "ts_ms": 1520085153000,
            "ts_us": 1520085153000000,
            "ts_ns": 1520085153000000000,
            "txId": "6.28.807",
            "scn": "2122184",
            "commit_scn": "2122184",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA",
            "commit_ts_ms": 1520085153000,
            "start_scn": "2122184",
            "start_ts_ms": 1520085153000
        },
        "op": "d",
        "ts_ms": 1532592105960,
        "ts_us": 1532592105960854,
        "ts_ns": 1532592105960854693
    }
}

The payload portion of the event reveals several differences when compared to the payload of a create or update event:

  • The value of the op field is d, signifying that the row was deleted.

  • The before field shows the former state of the row that was deleted with the database commit.

  • The value of the after field is null, signifying that the row no longer exists.

  • The structure of the source field includes many of the keys that exist in create or update events, but the values in the ts_ms, scn, and txId fields are different.

  • The ts_ms shows a timestamp that indicates when Debezium processed this event.

The delete event provides consumers with the information that they require to process the removal of this row.

The Oracle connector’s events are designed to work with Kafka log compaction, which allows for the removal of some older messages as long as at least the most recent message for every key is kept. This allows Kafka to reclaim storage space while ensuring the topic contains a complete dataset and can be used for reloading key-based state.

When a row is deleted, the delete event value shown in the preceding example still works with log compaction, because Kafka is able to remove all earlier messages that use the same key. The message value must be set to null to instruct Kafka to remove all messages that share the same key. To make this possible, by default, Debezium’s Oracle connector always follows a delete event with a special tombstone event that has the same key but null value. You can change the default behavior by setting the connector property