Debezium Connector for Oracle

Table of Contents

Overview

Debezium’s Oracle connector captures and records row-level changes that occur in databases on an Oracle server, including tables that are added while the connector is running. You can configure the connector to emit change events for specific subsets of schemas and tables, or to ignore, mask, or truncate values in specific columns.

For information about the Oracle Database versions that are compatible with this connector, see the Debezium release overview.

Debezium can ingest change events from Oracle by using the native LogMiner database package, the XStream API, or OpenLogReplicator.

How the Oracle connector works

To optimally configure and run a Debezium Oracle connector, it is helpful to understand how the connector performs snapshots, streams change events, determines Kafka topic names, uses metadata, and implements event buffering.

Snapshots

Typically, the redo logs on an Oracle server are configured to not retain the complete history of the database. As a result, the Debezium Oracle connector cannot retrieve the entire history of the database from the logs. To enable the connector to establish a baseline for the current state of the database, the first time that the connector starts, it performs an initial consistent snapshot of the database.

If the time needed to complete the initial snapshot exceeds the UNDO_RETENTION time that is set for the database (fifteen minutes, by default), an ORA-01555 exception can occur. For more information about the error, and about the steps that you can take to recover from it, see the Frequently asked questions.

During a table’s snapshot, it’s possible for Oracle to raise an ORA-01466 exception. This happens when a user modifies the schema of the table or adds, changes, or drops an index or related object associated with the table being snapshot. In the event this happens, the connector will stop and the initial snapshot will need to be taken from the beginning.

To remediate the problem, you can configure the snapshot.database.errors.max.retries property with a value greater than 0 so that the specific table’s snapshot will restart. While the entire snapshot will not start from the beginning when retrying, the specific table in question will be re-read from the beginning and the table’s topic will contain duplicate snapshot events.

Default workflow that the Oracle connector uses to perform an initial snapshot

The following workflow lists the steps that Debezium takes to create a snapshot. These steps describe the process for a snapshot when the snapshot.mode configuration property is set to its default value, which is initial. You can customize the way that the connector creates snapshots by changing the value of the snapshot.mode property. If you configure a different snapshot mode, the connector completes the snapshot by using a modified version of this workflow.

When the snapshot mode is set to the default, the connector completes the following tasks to create a snapshot:

Establish a connection to the database.
Determine the tables to be captured. By default, the connector captures all tables except those with schemas that exclude them from capture. After the snapshot completes, the connector continues to stream data for the specified tables. If you want the connector to capture data only from specific tables you can direct the connector to capture the data for only a subset of tables or table elements by setting properties such as table.include.list or table.exclude.list.
Obtain a ROW SHARE MODE lock on each of the captured tables to prevent structural changes from occurring during creation of the snapshot. Debezium holds the locks for only a short time.
Read the current system change number (SCN) position from the server’s redo log.

Capture the structure of all database tables, or all tables that are designated for capture. The connector persists schema information in its internal database schema history topic. The schema history provides information about the structure that is in effect when a change event occurs.

By default, the connector captures the schema of every table in the database that is in capture mode, including tables that are not configured for capture. If tables are not configured for capture, the initial snapshot captures only their structure; it does not capture any table data. For more information about why snapshots persist schema information for tables that you did not include in the initial snapshot, see Understanding why initial snapshots capture the schema for all tables.

Release the locks obtained in Step 3. Other database clients can now write to any previously locked tables.
At the SCN position that was read in Step 4, the connector scans the tables that are designated for capture (SELECT * FROM … AS OF SCN 123). During the scan, the connector completes the following tasks:
1. Confirms that the table was created before the snapshot began. If the table was created after the snapshot began, the connector skips the table. After the snapshot is complete, and the connector transitions to streaming, it emits change events for any tables that were created after the snapshot began.
2. Produces a read event for each row that is captured from a table. All read events contain the same SCN position, which is the SCN position that was obtained in step 4.
3. Emits each read event to the Kafka topic for the source table.
4. Releases data table locks, if applicable.
Record the successful completion of the snapshot in the connector offsets.

The resulting initial snapshot captures the current state of each row in the captured tables. From this baseline state, the connector captures subsequent changes as they occur.

After the snapshot process begins, if the process is interrupted due to connector failure, rebalancing, or other reasons, the process restarts after the connector restarts. After the connector completes the initial snapshot, it continues streaming from the position that it read in Step 3 so that it does not miss any updates. If the connector stops again for any reason, after it restarts, it resumes streaming changes from where it previously left off.

Table 1. Settings for `snapshot.mode` connector configuration property
Setting	Description
`always`	Perform snapshot on each connector start. After the snapshot completes, the connector begins to stream event records for subsequent database changes.
`initial`	The connector performs a database snapshot as described in the default workflow for creating an initial snapshot. After the snapshot completes, the connector begins to stream event records for subsequent database changes.
`initial_only`	The connector performs a database snapshot and stops before streaming any change event records, not allowing any subsequent change events to be captured.
`schema_only`	Deprecated, see `no_data`.
`no_data`	The connector captures the structure of all relevant tables, performing all of the steps described in the default snapshot workflow, except that it does not create `READ` events to represent the data set at the point of the connector’s start-up (Step 6).
`schema_only_recovery`	Deprecated, see `recovery`.
`recovery`	Set this option to restore a database schema history topic that is lost or corrupted. After a restart, the connector runs a snapshot that rebuilds the topic from the source tables. You can also set the property to periodically prune a database schema history topic that experiences unexpected growth. WARNING: Do not use this mode to perform a snapshot if schema changes were committed to the database after the last connector shutdown.
`when_needed`	After the connector starts, it performs a snapshot only if it detects one of the following circumstances: It cannot detect any topic offsets. A previously recorded offset specifies a log position that is not available on the server.
`configuration_based`	Set the snapshot mode to `configuration_based` to control snapshot behavior through the set of connector properties that have the prefix 'snapshot.mode.configuration.based'.
`custom`	The `custom` snapshot mode lets you inject your own implementation of the `io.debezium.spi.snapshot.Snapshotter` interface. Set the `snapshot.mode.custom.name` configuration property to the name provided by the `name()` method of your implementation. The name is specified on the classpath of your Kafka Connect cluster. If you use the Debezium `EmbeddedEngine`, the name is included in the connector JAR file. For more information, see custom snapshotter SPI.

For more information, see snapshot.mode in the table of connector configuration properties.

Understanding why initial snapshots capture the schema history for all tables

The initial snapshot that a connector runs captures two types of information:

Table data: Information about INSERT, UPDATE, and DELETE operations in tables that are named in the connector’s table.include.list property.
Schema data: DDL statements that describe the structural changes that are applied to tables. Schema data is persisted to both the internal schema history topic, and to the connector’s schema change topic, if one is configured.

After you run an initial snapshot, you might notice that the snapshot captures schema information for tables that are not designated for capture. By default, initial snapshots are designed to capture schema information for every table that is present in the database, not only from tables that are designated for capture. Connectors require that the table’s schema is present in the schema history topic before they can capture a table. By enabling the initial snapshot to capture schema data for tables that are not part of the original capture set, Debezium prepares the connector to readily capture event data from these tables should that later become necessary. If the initial snapshot does not capture a table’s schema, you must add the schema to the history topic before the connector can capture data from the table.

In some cases, you might want to limit schema capture in the initial snapshot. This can be useful when you want to reduce the time required to complete a snapshot. Or when Debezium connects to the database instance through a user account that has access to multiple logical databases, but you want the connector to capture changes only from tables in a specific logic database.

Additional information

Capturing data from tables not captured by the initial snapshot (no schema change)
Capturing data from tables not captured by the initial snapshot (schema change)
Setting the schema.history.internal.store.only.captured.tables.ddl property to specify the tables from which to capture schema information.
Setting the schema.history.internal.store.only.captured.databases.ddl property to specify the logical databases from which to capture schema changes.

Capturing data from tables not captured by the initial snapshot (no schema change)

In some cases, you might want the connector to capture data from a table whose schema was not captured by the initial snapshot. Depending on the connector configuration, the initial snapshot might capture the table schema only for specific tables in the database. If the table schema is not present in the history topic, the connector fails to capture the table, and reports a missing schema error.

You might still be able to capture data from the table, but you must perform additional steps to add the table schema.

Prerequisites

You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
All entries for the table in the transaction log use the same schema. For information about capturing data from a new table that has undergone structural changes, see Capturing data from tables not captured by the initial snapshot (schema change).

Procedure

Stop the connector.
Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.
In the connector configuration:
1. Set the snapshot.mode to recovery.
2. (Optional) Set the value of schema.history.internal.store.only.captured.tables.ddl to false to ensure that in the future the connector can readily capture data for tables that are not currently designated for capture. Connectors can capture data from a table only if the table’s schema history is present in the history topic.
3. Add the tables that you want the connector to capture to table.include.list.
Restart the connector. The snapshot recovery process rebuilds the schema history based on the current structure of the tables.
(Optional) After the snapshot completes, initiate an incremental snapshot on the newly added tables. The incremental snapshot first streams the historical data of the newly added tables, and then resumes reading changes from the redo and archive logs for previously configured tables, including changes that occur while that connector was off-line.
(Optional) Reset the snapshot.mode back to no_data to prevent the connector from initiating recovery after a future restart.

Capturing data from tables not captured by the initial snapshot (schema change)

If a schema change is applied to a table, records that are committed before the schema change have different structures than those that were committed after the change. When Debezium captures data from a table, it reads the schema history to ensure that it applies the correct schema to each event. If the schema is not present in the schema history topic, the connector is unable to capture the table, and an error results.

If you want to capture data from a table that was not captured by the initial snapshot, and the schema of the table was modified, you must add the schema to the history topic, if it is not already available. You can add the schema by running a new schema snapshot, or by running an initial snapshot for the table.

Prerequisites

You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
A schema change was applied to the table so that the records to be captured do not have a uniform structure.

Procedure

Initial snapshot captured the schema for all tables (store.only.captured.tables.ddl was set to false)

Edit the table.include.list property to specify the tables that you want to capture.
Restart the connector.
Initiate an incremental snapshot if you want to capture existing data from the newly added tables.

Initial snapshot did not capture the schema for all tables (store.only.captured.tables.ddl was set to true)

If the initial snapshot did not save the schema of the table that you want to capture, complete one of the following procedures:

Procedure 1: Schema snapshot, followed by incremental snapshot

In this procedure, the connector first performs a schema snapshot. You can then initiate an incremental snapshot to enable the connector to synchronize data.

Stop the connector.
Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.

Set values for properties in the connector configuration as described in the following steps:
1. Set the value of the snapshot.mode property to no_data.
2. Edit the table.include.list to add the tables that you want to capture.
Restart the connector.
Wait for Debezium to capture the schema of the new and existing tables. Data changes that occurred any tables after the connector stopped are not captured.
To ensure that no data is lost, initiate an incremental snapshot.

Procedure 2: Initial snapshot, followed by optional incremental snapshot

In this procedure the connector performs a full initial snapshot of the database. As with any initial snapshot, in a database with many large tables, running an initial snapshot can be a time-consuming operation. After the snapshot completes, you can optionally trigger an incremental snapshot to capture any changes that occur while the connector is off-line.

Stop the connector.
Remove the internal database schema history topic that is specified by the schema.history.internal.kafka.topic property.

Clear the offsets in the configured Kafka Connect offset.storage.topic. For more information about how to remove offsets, see the Debezium community FAQ.

Edit the table.include.list to add the tables that you want to capture.
Set values for properties in the connector configuration as described in the following steps:
1. Set the value of the snapshot.mode property to initial.
2. (Optional) Set schema.history.internal.store.only.captured.tables.ddl to false.
Restart the connector. The connector takes a full database snapshot. After the snapshot completes, the connector transitions to streaming.
(Optional) To capture any data that changed while the connector was off-line, initiate an incremental snapshot.

Ad hoc snapshots

By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.

However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:

The connector configuration is modified to capture a different set of tables.
Kafka topics are deleted and must be rebuilt.
Data corruption occurs due to a configuration error or some other problem.

You can re-run a snapshot for a table for which you previously captured a snapshot by initiating a so-called ad-hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.

When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.

Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.

You specify the tables to capture by sending an execute-snapshot message to the signaling table. Set the type of the execute-snapshot signal to incremental or blocking, and provide the names of the tables to include in the snapshot, as described in the following table:

Table 2. Example of an ad hoc `execute-snapshot` signal record
Field	Default	Value
`type`	`incremental`	Specifies the type of snapshot that you want to run. Currently, you can request `incremental` or `blocking` snapshots.
`data-collections`	N/A	An array that contains regular expressions matching the fully-qualified names of the tables to include in the snapshot. For the Oracle connector, use the following format to specify the fully qualified name of a table: `database.schema.table`.
`additional-conditions`	N/A	An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot. Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition: `data-collection` The fully-qualified name of the table that the filter applies to. You can apply different filters to each table. `filter` Specifies column values that must be present in a database record for the snapshot to include it, for example, `"color='blue'"`. The values that you assign to the `filter` parameter are the same types of values that you might specify in the `WHERE` clause of `SELECT` statements when you set the `snapshot.select.statement.overrides` property for a blocking snapshot.
`surrogate-key`	N/A	An optional string that specifies the column name that the connector uses as the primary key of a table during the snapshot process.

Triggering an ad hoc incremental snapshot

You initiate an ad hoc incremental snapshot by adding an entry with the execute-snapshot signal type to the signaling table, or by sending a signal message to a Kafka signaling topic. After the connector processes the message, it begins the snapshot operation. The snapshot process reads the first and last primary key values and uses those values as the start and end point for each table. Based on the number of entries in the table, and the configured chunk size, Debezium divides the table into chunks, and proceeds to snapshot each chunk, in succession, one at a time.

For more information, see Incremental snapshots.

Triggering an ad hoc blocking snapshot

You initiate an ad hoc blocking snapshot by adding an entry with the execute-snapshot signal type to the signaling table or signaling topic. After the connector processes the message, it begins the snapshot operation. The connector temporarily stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the connector resumes streaming.

For more information, see Blocking snapshots.

Incremental snapshots

To provide flexibility in managing snapshots, Debezium includes a supplementary snapshot mechanism, known as incremental snapshotting. Incremental snapshots rely on the Debezium mechanism for sending signals to a Debezium connector. Incremental snapshots are based on the DDD-3 design document.

In an incremental snapshot, instead of capturing the full state of a database all at once, as in an initial snapshot, Debezium captures each table in phases, in a series of configurable chunks. You can specify the tables that you want the snapshot to capture and the size of each chunk. The chunk size determines the number of rows that the snapshot collects during each fetch operation on the database. The default chunk size for incremental snapshots is 1024 rows.

As an incremental snapshot proceeds, Debezium uses watermarks to track its progress, maintaining a record of each table row that it captures. This phased approach to capturing data provides the following advantages over the standard initial snapshot process:

You can run incremental snapshots in parallel with streamed data capture, instead of postponing streaming until the snapshot completes. The connector continues to capture near real-time events from the change log throughout the snapshot process, and neither operation blocks the other.
If the progress of an incremental snapshot is interrupted, you can resume it without losing any data. After the process resumes, the snapshot begins at the point where it stopped, rather than recapturing the table from the beginning.
You can run an incremental snapshot on demand at any time, and repeat the process as needed to adapt to database updates. For example, you might re-run a snapshot after you modify the connector configuration to add a table to its table.include.list property.

Incremental snapshot process

When you run an incremental snapshot, Debezium sorts each table by primary key and then splits the table into chunks based on the configured chunk size. Working chunk by chunk, it then captures each table row in a chunk. For each row that it captures, the snapshot emits a READ event. That event represents the value of the row when the snapshot for the chunk began.

As a snapshot proceeds, it’s likely that other processes continue to access the database, potentially modifying table records. To reflect such changes, INSERT, UPDATE, or DELETE operations are committed to the transaction log as per usual. Similarly, the ongoing Debezium streaming process continues to detect these change events and emits corresponding change event records to Kafka.

How Debezium resolves collisions among records with the same primary key

In some cases, the UPDATE or DELETE events that the streaming process emits are received out of sequence. That is, the streaming process might emit an event that modifies a table row before the snapshot captures the chunk that contains the READ event for that row. When the snapshot eventually emits the corresponding READ event for the row, its value is already superseded. To ensure that incremental snapshot events that arrive out of sequence are processed in the correct logical order, Debezium employs a buffering scheme for resolving collisions. Only after collisions between the snapshot events and the streamed events are resolved does Debezium emit an event record to Kafka.

Snapshot window

To assist in resolving collisions between late-arriving READ events and streamed events that modify the same table row, Debezium employs a so-called snapshot window. The snapshot window demarcates the interval during which an incremental snapshot captures data for a specified table chunk. Before the snapshot window for a chunk opens, Debezium follows its usual behavior and emits events from the transaction log directly downstream to the target Kafka topic. But from the moment that the snapshot for a particular chunk opens, until it closes, Debezium performs a de-duplication step to resolve collisions between events that have the same primary key..

For each data collection, the Debezium emits two types of events, and stores the records for them both in a single destination Kafka topic. The snapshot records that it captures directly from a table are emitted as READ operations. Meanwhile, as users continue to update records in the data collection, and the transaction log is updated to reflect each commit, Debezium emits UPDATE or DELETE operations for each change.

As the snapshot window opens, and Debezium begins processing a snapshot chunk, it delivers snapshot records to a memory buffer. During the snapshot windows, the primary keys of the READ events in the buffer are compared to the primary keys of the incoming streamed events. If no match is found, the streamed event record is sent directly to Kafka. If Debezium detects a match, it discards the buffered READ event, and writes the streamed record to the destination topic, because the streamed event logically supersede the static snapshot event. After the snapshot window for the chunk closes, the buffer contains only READ events for which no related transaction log events exist. Debezium emits these remaining READ events to the table’s Kafka topic.

The connector repeats the process for each snapshot chunk.

Currently, you can use either of the following methods to initiate an incremental snapshot:

The Debezium connector for Oracle does not support schema changes while an incremental snapshot is running.

Triggering an incremental snapshot

To initiate an incremental snapshot, you can send an ad hoc snapshot signal to the signaling table on the source database. You submit snapshot signals as SQL INSERT queries.

After Debezium detects the change in the signaling table, it reads the signal, and runs the requested snapshot operation.

The query that you submit specifies the tables to include in the snapshot, and, optionally, specifies the type of snapshot operation. Debezium currently supports the incremental and blocking snapshot types.

To specify the tables to include in the snapshot, provide a data-collections array that lists the tables, or an array of regular expressions used to match tables, for example,

{"data-collections": ["public.MyFirstTable", "public.MySecondTable"]}

The data-collections array for an incremental snapshot signal has no default value. If the data-collections array is empty, Debezium interprets the empty array to mean that no action is required, and it does not perform a snapshot.

If the name of a table that you want to include in a snapshot contains a dot (.), a space, or some other non-alphanumeric character, you must escape the table name in double quotes.
For example, to include a table that exists in the public schema in the db1 database, and that has the name My.Table, use the following format: "db1.public.\"My.Table\"".

Prerequisites

Signaling is enabled.
- A signaling data collection exists on the source database.
- The signaling data collection is specified in the signal.data.collection property.

Using a source signaling channel to trigger an incremental snapshot

Send a SQL query to add the ad hoc incremental snapshot request to the signaling table:

INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

For example,

INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
values ('ad-hoc-1',   (2)
    'execute-snapshot',  (3)
    '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
    "type":"incremental", (5)
    "additional-conditions":[{"data-collection": "db1.schema1.table1" ,"filter":"color=\'blue\'"}]}'); (6)

The values of the id,type, and data parameters in the command correspond to the fields of the signaling table.
The following table describes the parameters in the example:

Table 3. Descriptions of fields in a SQL command for sending an incremental snapshot signal to the signaling table
Item	Value	Description
1	`database.schema.debezium_signal`	Specifies the fully-qualified name of the signaling table on the source database.
2	`ad-hoc-1`	The `id` parameter specifies an arbitrary string that is assigned as the `id` identifier for the signal request. Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string. Rather, during the snapshot, Debezium generates its own `id` string as a watermarking signal.
3	`execute-snapshot`	The `type` parameter specifies the operation that the signal is intended to trigger.
4	`data-collections`	A required component of the `data` field of a signal that specifies an array of table names or regular expressions to match table names to include in the snapshot. The array lists regular expressions that use the format `database.schema.table` to match the fully-qualified names of the tables. This format is the same as the one that you use to specify the name of the connector’s signaling table.
5	`incremental`	An optional `type` component of the `data` field of a signal that specifies the type of snapshot operation to run. Valid values are `incremental` and `blocking`. If you do not specify a value, the connector defaults to performing an incremental snapshot.
6	`additional-conditions`	An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot. Each additional condition is an object with `data-collection` and `filter` properties. You can specify different filters for each data collection. * The `data-collection` property is the fully-qualified name of the data collection that the filter applies to. For more information about the `additional-conditions` parameter, see Running an ad hoc incremental snapshots with `additional-conditions`.

Running an ad hoc incremental snapshots with `additional-conditions`

If you want a snapshot to include only a subset of the content in a table, you can modify the signal request by appending an additional-conditions parameter to the snapshot signal.

The SQL query for a typical snapshot takes the following form:

SELECT * FROM <tableName> ....

By adding an additional-conditions parameter, you append a WHERE condition to the SQL query, as in the following example:

SELECT * FROM <data-collection> WHERE <filter> ....

The following example shows a SQL query to send an ad hoc incremental snapshot request with an additional condition to the signaling table:

INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');

For example, suppose you have a products table that contains the following columns:

id (primary key)
color
quantity

If you want an incremental snapshot of the products table to include only the data items where color=blue, you can use the following SQL statement to trigger the snapshot:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue"}]}');

The additional-conditions parameter also enables you to pass conditions that are based on more than one column. For example, using the products table from the previous example, you can submit a query that triggers an incremental snapshot that includes the data of only those items for which color=blue and quantity>10:

INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue AND quantity>10"}]}');

The following example, shows the JSON for an incremental snapshot event that is captured by a connector.

Example 1. Incremental snapshot event message

{
    "before":null,
    "after": {
        "pk":"1",
        "value":"New data"
    },
    "source": {
        ...
        "snapshot":"incremental" (1)
    },
    "op":"r", (2)
    "ts_ms":"1620393591654",
    "ts_us":"1620393591654547",
    "ts_ns":"1620393591654547920",
    "transaction":null
}

Table 4. Description of fields in an incremental snapshot event message
Item	Field name	Description
1	`snapshot`	Specifies the type of snapshot operation to run. Currently, the only valid options are `blocking` and `incremental`. Specifying a `type` value in the SQL query that you submit to the signaling table is optional. If you do not specify a value, the connector runs an incremental snapshot.
2	`op`	Specifies the event type. The value for snapshot events is `r`, signifying a `READ` operation.

Using the Kafka signaling channel to trigger an incremental snapshot

You can send a message to the configured Kafka topic to request the connector to run an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is execute-snapshot, and the data field must have the following fields:

Table 5. Execute snapshot data fields
Field	Default	Value
`type`	`incremental`	The type of the snapshot to be executed. Currently Debezium supports the `incremental` and `blocking` types. See the next section for more details.
`data-collections`	N/A	An array of comma-separated regular expressions that match the fully-qualified names of tables to include in the snapshot. Specify the names by using the same format as is required for the signal.data.collection configuration option.
`additional-conditions`	N/A	An optional array of additional conditions that specifies criteria that the connector evaluates to designate a subset of records to include in a snapshot. Each additional condition is an object that specifies the criteria for filtering the data that an ad hoc snapshot captures. You can set the following parameters for each additional condition: `data-collection`:: The fully-qualified name of the table that the filter applies to. You can apply different filters to each table. `filter`:: Specifies column values that must be present in a database record for the snapshot to include it, for example, `"color='blue'"`. The values that you assign to the `filter` parameter are the same types of values that you might specify in the `WHERE` clause of `SELECT` statements when you set the `snapshot.select.statement.overrides` property for a blocking snapshot.

Example 2. An execute-snapshot Kafka message

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["{collection-container}.table1", "{collection-container}.table2"], "type": "INCREMENTAL"}}`

Ad hoc incremental snapshots with additional-conditions

Debezium uses the additional-conditions field to select a subset of a table’s content.

Typically, when Debezium runs a snapshot, it runs a SQL query such as:

SELECT * FROM <tableName> ….

When the snapshot request includes an additional-conditions property, the data-collection and filter parameters of the property are appended to the SQL query, for example:

SELECT * FROM <data-collection> WHERE <filter> ….

For example, given a products table with the columns id (primary key), color, and brand, if you want a snapshot to include only content for which color='blue', when you request the snapshot, you could add the additional-conditions property to filter the content: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue'"}]}}`

You can also use the additional-conditions property to pass conditions based on multiple columns. For example, using the same products table as in the previous example, if you want a snapshot to include only the content from the products table for which color='blue', and brand='MyBrand', you could send the following request: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue' AND brand='MyBrand'"}]}}`

Stopping an incremental snapshot

In some situations, it might be necessary to stop an incremental snapshot. For example, you might realize that snapshot was not configured correctly, or maybe you want to ensure that resources are available for other database operations. You can stop a snapshot that is already running by sending a signal to the signaling table on the source database.

You submit a stop snapshot signal to the signaling table by sending it in a SQL INSERT query. The stop-snapshot signal specifies the type of the snapshot operation as incremental, and optionally specifies the tables that you want to omit from the currently running snapshot. After Debezium detects the change in the signaling table, it reads the signal, and stops the incremental snapshot operation if it’s in progress.

Additional resources

You can also stop an incremental snapshot by sending a JSON message to the Kafka signaling topic.

Prerequisites

Signaling is enabled.
- A signaling data collection exists on the source database.
- The signaling data collection is specified in the signal.data.collection property.

Using a source signaling channel to stop an incremental snapshot

Send a SQL query to stop the ad hoc incremental snapshot to the signaling table:

INSERT INTO <signalTable> (id, type, data) values ('<id>', 'stop-snapshot', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"incremental"}');

For example,

INSERT INTO db1.myschema.debezium_signal (id, type, data) (1)
values ('ad-hoc-1',   (2)
    'stop-snapshot',  (3)
    '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4)
    "type":"incremental"}'); (5)

The values of the id, type, and data parameters in the signal command correspond to the fields of the signaling table.
The following table describes the parameters in the example:

Table 6. Descriptions of fields in a SQL command for sending a stop incremental snapshot signal to the signaling table
Item	Value	Description
1	`database.schema.debezium_signal`	Specifies the fully-qualified name of the signaling table on the source database.
2	`ad-hoc-1`	The `id` parameter specifies an arbitrary string that is assigned as the `id` identifier for the signal request. Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string.
3	`stop-snapshot`	Specifies `type` parameter specifies the operation that the signal is intended to trigger.
4	`data-collections`	An optional component of the `data` field of a signal that specifies an array of table names or regular expressions to match table names to remove from the snapshot. The array lists regular expressions which match tables by their fully-qualified names in the format `database.schema.table` If you omit this component from the `data` field, the signal stops the entire incremental snapshot that is in progress.
5	`incremental`	A required component of the `data` field of a signal that specifies the type of snapshot operation that is to be stopped. Currently, the only valid option is `incremental`. If you do not specify a `type` value, the signal fails to stop the incremental snapshot.

Using the Kafka signaling channel to stop an incremental snapshot

You can send a signal message to the configured Kafka signaling topic to stop an ad hoc incremental snapshot.

The key of the Kafka message must match the value of the topic.prefix connector configuration option.

The value of the message is a JSON object with type and data fields.

The signal type is stop-snapshot, and the data field must have the following fields:

Table 7. Execute snapshot data fields
Field	Default	Value
`type`	`incremental`	The type of the snapshot to be executed. Currently Debezium supports only the `incremental` type. See the next section for more details.
`data-collections`	N/A	An optional array of comma-separated regular expressions that match the fully-qualified names of the tables an array of table names or regular expressions to match table names to remove from the snapshot. Specify table names by using the format `database.schema.table`.

The following example shows a typical stop-snapshot Kafka message: :leveloffset: +1

Key = `test_connector`

Value = `{"type":"stop-snapshot","data": {"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], "type": "INCREMENTAL"}}`

Custom snapshotter SPI

For more advanced uses, you can fine-tune control of the snapshot by implementing one of the following interfaces:

io.debezium.snapshot.spi.Snapshotter: Controls whether the connector takes a snapshot.
io.debezium.snapshot.spi.SnapshotQuery: Controls how data is queried during a snapshot.
io.debezium.snapshot.spi.SnapshotLock: Controls whether the connector locks tables when taking a snapshot.

io.debezium.snapshot.spi.Snapshotter interface. All built-in snapshot modes implement this interface.

/**
 * {@link Snapshotter} is used to determine the following details about the snapshot process:
 * <p>
 * - Whether a snapshot occurs. <br>
 * - Whether streaming continues during the snapshot. <br>
 * - Whether the snapshot includes schema (if supported). <br>
 * - Whether to snapshot data or schema following an error.
 * <p>
 * Although Debezium provides many default snapshot modes,
 * to provide more advanced functionality, such as partial snapshots,
 * you can customize implementation of the interface.
 * For more information, see the documentation.
 *
 *
 *
 */
@Incubating
public interface Snapshotter extends Configurable {

    /**
     * @return the name of the snapshotter.
     *
     *
     */
    String name();

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a data snapshot
     */
    boolean shouldSnapshotData(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
     * @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
     *
     * @return {@code true} if the snapshotter should take a schema snapshot
     */
    boolean shouldSnapshotSchema(boolean offsetExists, boolean snapshotInProgress);

    /**
     * @return {@code true} if the snapshotter should stream after taking a snapshot
     */
    boolean shouldStream();

    /**
     * @return {@code true} whether the schema can be recovered if database schema history is corrupted.
     */
    boolean shouldSnapshotOnSchemaError();

    /**
     * @return {@code true} whether the snapshot should be re-executed when there is a gap in data stream.
     */
    boolean shouldSnapshotOnDataError();

    /**
     *
     * @return {@code true} if streaming should resume from the start of the snapshot
     * transaction, or {@code false} for when a connector resumes and takes a snapshot,
     * streaming should resume from where streaming previously left off.
     */
    default boolean shouldStreamEventsStartingFromSnapshot() {
        return true;
    }

    /**
     * Lifecycle hook called after the snapshot phase is successful.
     */
    default void snapshotCompleted() {
        // no operation
    }

    /**
     * Lifecycle hook called after the snapshot phase is aborted.
     */
    default void snapshotAborted() {
        // no operation
    }
}

io.debezium.snapshot.spi.SnapshotQuery interface. All built-in snapshot query modes implement this interface.

/**
 * {@link SnapshotQuery} is used to determine the query used during a data snapshot
 *
 *
 */
public interface SnapshotQuery extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Generate a valid query string for the specified table, or an empty {@link Optional}
     * to skip snapshotting this table (but that table will still be streamed from)
     *
     * @param tableId the table to generate a query for
     * @param snapshotSelectColumns the columns to be used in the snapshot select based on the column
     *                              include/exclude filters
     * @return a valid query string, or none to skip snapshotting this table
     */
    Optional<String> snapshotQuery(String tableId, List<String> snapshotSelectColumns);

}

io.debezium.snapshot.spi.SnapshotLock interface. All built-in snapshot lock modes implement this interface.

/**
 * {@link SnapshotLock} is used to determine the table lock mode used during schema snapshot
 *
 *
 */
public interface SnapshotLock extends Configurable, Service {

    /**
     * @return the name of the snapshot lock.
     *
     *
     */
    String name();

    /**
     * Returns a SQL statement for locking the given table during snapshotting, if required by the specific snapshotter
     * implementation.
     */
    Optional<String> tableLockingStatement(Duration lockTimeout, String tableId);

}

Blocking snapshots

To provide more flexibility in managing snapshots, Debezium includes a supplementary ad hoc snapshot mechanism, known as a blocking snapshot. Blocking snapshots rely on the Debezium mechanism for sending signals to a Debezium connector.

A blocking snapshot behaves just like an initial snapshot, except that you can trigger it at run time.

You might want to run a blocking snapshot rather than use the standard initial snapshot process in the following situations:

You add a new table and you want to complete the snapshot while the connector is running.
You add a large table, and you want the snapshot to complete in less time than is possible with an incremental snapshot.

Blocking snapshot process

When you run a blocking snapshot, Debezium stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the streaming is resumed.

Configure snapshot

You can set the following properties in the data component of a signal:

data-collections: to specify which tables must be snapshot.
data-collections: Specifies the tables that you want the snapshot to include.
This property accepts a comma-separated list of regular expressions that match fully-qualified table names. The behavior of the property is similar to the behavior of the table.include.list property, which specifies the tables to capture in a blocking snapshot.
additional-conditions: You can specify different filters for different table.
- The data-collection property is the fully-qualified name of the table for which the filter will be applied, and can be case-sensitive or case-insensitive depending on the database.
- The filter property will have the same value used in the snapshot.select.statement.overrides, the fully-qualified name of the table that should match by case.

For example:

  {"type": "blocking", "data-collections": ["schema1.table1", "schema1.table2"], "additional-conditions": [{"data-collection": "schema1.table1", "filter": "SELECT * FROM [schema1].[table1] WHERE column1 = 0 ORDER BY column2 DESC"}, {"data-collection": "schema1.table2", "filter": "SELECT * FROM [schema1].[table2] WHERE column2 > 0"}]}

Possible duplicates

A delay might exist between the time that you send the signal to trigger the snapshot, and the time when streaming stops and the snapshot starts. As a result of this delay, after the snapshot completes, the connector might emit some event records that duplicate records captured by the snapshot.

Topic names

By default, the Oracle connector writes change events for all INSERT, UPDATE, and DELETE operations that occur in a table to a single Apache Kafka topic that is specific to that table. The connector uses the following convention to name change event topics:

topicPrefix.schemaName.tableName

The following list provides definitions for the components of the default name:

topicPrefix: The topic prefix as specified by the topic.prefix connector configuration property.
schemaName: The name of the schema in which the operation occurred.
tableName: The name of the table in which the operation occurred.

For example, if fulfillment is the server name, inventory is the schema name, and the database contains tables with the names orders, customers, and products, the Debezium Oracle connector emits events to the following Kafka topics, one for each table in the database:

fulfillment.inventory.orders
fulfillment.inventory.customers
fulfillment.inventory.products

The connector applies similar naming conventions to label its internal database schema history topics, schema change topics, and transaction metadata topics.

If the default topic name do not meet your requirements, you can configure custom topic names. To configure custom topic names, you specify regular expressions in the logical topic routing SMT. For more information about using the logical topic routing SMT to customize topic naming, see Topic routing.

Schema history topic

When a database client queries a database, the client uses the database’s current schema. However, the database schema can be changed at any time, which means that the connector must be able to identify what the schema was at the time each insert, update, or delete operation was recorded. Also, a connector cannot necessarily apply the current schema to every event. If an event is relatively old, it’s possible that it was recorded before the current schema was applied.

To ensure correct processing of events that occur after a schema change, Oracle includes in the redo log not only the row-level changes that affect the data, but also the DDL statements that are applied to the database. As the connector encounters these DDL statements in the redo log, it parses them and updates an in-memory representation of each table’s schema. The connector uses this schema representation to identify the structure of the tables at the time of each insert, update, or delete operation and to produce the appropriate change event. In a separate database schema history Kafka topic, the connector records all DDL statements along with the position in the redo log where each DDL statement appeared.

When the connector restarts after either a crash or a graceful stop, it starts reading the redo log from a specific position, that is, from a specific point in time. The connector rebuilds the table structures that existed at this point in time by reading the database schema history Kafka topic and parsing all DDL statements up to the point in the redo log where the connector is starting.

This database schema history topic is internal for internal connector use only. Optionally, the connector can also emit schema change events to a different topic that is intended for consumer applications.

Additional resources

Default names for topics that receive Debezium event records.

Schema change topic

You can configure a Debezium Oracle connector to produce schema change events that describe structural changes that are applied to tables in the database. The connector writes schema change events to a Kafka topic named <serverName>, where serverName is the namespace that is specified in the topic.prefix configuration property.

Debezium emits a new message to the schema change topic whenever it streams data from a new table, or when the structure of the table is altered.

Following a change in table structure, you must follow (the schema evolution procedure).

Messages that the connector sends to the schema change topic contain a payload, and, optionally, also contain the schema of the change event message.

The schema for the schema change event has the following elements:

name: The name of the schema change event message.
type: The type of the change event message.
version: The version of the schema. The version is an integer that is incremented each time the schema is changed.
fields: The fields that are included in the change event message.

Example: Schema of the Oracle connector schema change topic

The following example shows a typical schema in JSON format.

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "inventory"
  }
}

The payload of a schema change event message includes the following elements:

ddl: Provides the SQL CREATE, ALTER, or DROP statement that results in the schema change.
databaseName: The name of the database to which the statements are applied. The value of databaseName serves as the message key.
tableChanges: A structured representation of the entire table schema after the schema change. The tableChanges field contains an array that includes entries for each column of the table. Because the structured representation presents data in JSON or Avro format, consumers can easily read messages without first processing them through a DDL parser.

By default, the connector uses the ALL_TABLES database view to identify the table names to store in the schema history topic. Within that view, the connector can access data only from tables that are available to the user account through which it connects to the database.

You can modify settings so that the schema history topic stores a different subset of tables. Use one of the following methods to alter the set of tables that the topic stores:

Change the permissions of the account that Debezium uses to access the database so that a different set of tables are visible in the ALL_TABLES view.
Set the connector property schema.history.internal.store.only.captured.tables.ddl to true.

When the connector is configured to capture a table, it stores the history of the table’s schema changes not only in the schema change topic, but also in an internal database schema history topic. The internal database schema history topic is for connector use only and it is not intended for direct use by consuming applications. Ensure that applications that require notifications about schema changes consume that information only from the schema change topic.

Never partition the database schema history topic. For the database schema history topic to function correctly, it must maintain a consistent, global order of the event records that the connector emits to it.

To ensure that the topic is not split among partitions, set the partition count for the topic by using one of the following methods:

If you create the database schema history topic manually, specify a partition count of 1.
If you use the Apache Kafka broker to create the database schema history topic automatically, the topic is created, set the value of the Kafka num.partitions configuration option to 1.

The schema change topic message format is in an incubating state and might change without notice.

Example: Message emitted to the Oracle connector schema change topic

The following example shows a typical schema change message in JSON format. The message contains a logical representation of the table schema.

{
  "schema": {
  ...
  },
  "payload": {
    "source": {
      "version": "3.0.8.Final",
      "connector": "oracle",
      "name": "server1",
      "ts_ms": 1588252618953,
      "ts_us": 1588252618953000,
      "ts_ns": 1588252618953000000,
      "snapshot": "true",
      "db": "ORCLPDB1",
      "schema": "DEBEZIUM",
      "table": "CUSTOMERS",
      "txId" : null,
      "scn" : "1513734",
      "commit_scn": "1513754",
      "lcr_position" : null,
      "rs_id": "001234.00012345.0124",
      "ssn": 1,
      "redo_thread": 1,
      "user_name": "user",
      "row_id": "AAASgjAAMAAAACnAAA"
    },
    "ts_ms": 1588252618953, (1)
    "ts_us": 1588252618953987, (1)
    "ts_ns": 1588252618953987512, (1)
    "databaseName": "ORCLPDB1", (2)
    "schemaName": "DEBEZIUM", //
    "ddl": "CREATE TABLE \"DEBEZIUM\".\"CUSTOMERS\" \n   (    \"ID\" NUMBER(9,0) NOT NULL ENABLE, \n    \"FIRST_NAME\" VARCHAR2(255), \n    \"LAST_NAME" VARCHAR2(255), \n    \"EMAIL\" VARCHAR2(255), \n     PRIMARY KEY (\"ID\") ENABLE, \n     SUPPLEMENTAL LOG DATA (ALL) COLUMNS\n   ) SEGMENT CREATION IMMEDIATE \n  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 \n NOCOMPRESS LOGGING\n  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645\n  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1\n  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)\n  TABLESPACE \"USERS\" ", (3)
    "tableChanges": [ (4)
      {
        "type": "CREATE", (5)
        "id": "\"ORCLPDB1\".\"DEBEZIUM\".\"CUSTOMERS\"", (6)
        "table": { (7)
          "defaultCharsetName": null,
          "primaryKeyColumnNames": [ (8)
            "ID"
          ],
          "columns": [ (9)
            {
              "name": "ID",
              "jdbcType": 2,
              "nativeType": null,
              "typeName": "NUMBER",
              "typeExpression": "NUMBER",
              "charsetName": null,
              "length": 9,
              "scale": 0,
              "position": 1,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "FIRST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 2,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "LAST_NAME",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 3,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            },
            {
              "name": "EMAIL",
              "jdbcType": 12,
              "nativeType": null,
              "typeName": "VARCHAR2",
              "typeExpression": "VARCHAR2",
              "charsetName": null,
              "length": 255,
              "scale": null,
              "position": 4,
              "optional": false,
              "autoIncremented": false,
              "generated": false
            }
          ],
          "attributes": [ (10)
            {
              "customAttribute": "attributeValue"
            }
          ]
        }
      }
    ]
  }
}

Table 8. Descriptions of fields in messages emitted to the schema change topic
Item	Field name	Description
1	`ts_ms`, `ts_us`, `ts_ns`	Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the source object, ts_ms indicates the time that the change was made in the database. By comparing the value for payload.source.ts_ms with the value for payload.ts_ms, you can determine the lag between the source database update and Debezium.
2	`databaseName` `schemaName`	Identifies the database and the schema that contains the change.
3	`ddl`	This field contains the DDL that is responsible for the schema change.
4	`tableChanges`	An array of one or more items that contain the schema changes generated by a DDL command.
5	`type`	Describes the kind of change. The `type` can be set to one of the following values: `CREATE` Table created. `ALTER` Table modified. `DROP` Table deleted.
6	`id`	Full identifier of the table that was created, altered, or dropped. In the case of a table rename, this identifier is a concatenation of `<old>,<new>` table names.
7	`table`	Represents table metadata after the applied change.
8	`primaryKeyColumnNames`	List of columns that compose the table’s primary key.
9	`columns`	Metadata for each column in the changed table.
10	`attributes`	Custom attribute metadata for each table change.

In messages that the connector sends to the schema change topic, the message key is the name of the database that contains the schema change. In the following example, the payload field contains the databaseName key:

{
  "schema": {
    "type": "struct",
    "fields": [
      {
        "type": "string",
        "optional": false,
        "field": "databaseName"
      }
    ],
    "optional": false,
    "name": "io.debezium.connector.oracle.SchemaChangeKey",
    "version": 1
  },
  "payload": {
    "databaseName": "ORCLPDB1"
  }
}

Transaction Metadata

Debezium can generate events that represent transaction metadata boundaries and that enrich data change event messages.

Limits on when Debezium receives transaction metadata

Debezium registers and receives metadata only for transactions that occur after you deploy the connector. Metadata for transactions that occur before you deploy the connector is not available.

Database transactions are represented by a statement block that is enclosed between the BEGIN and END keywords. Debezium generates transaction boundary events for the BEGIN and END delimiters in every transaction. Transaction boundary events contain the following fields:

status: BEGIN or END.
id: String representation of the unique transaction identifier.
ts_ms: The time of a transaction boundary event (BEGIN or END event) at the data source. If the data source does not provide Debezium with the event time, then the field instead represents the time at which Debezium processes the event.
event_count (for END events): Total number of events emmitted by the transaction.
data_collections (for END events): An array of pairs of data_collection and event_count elements that indicates the number of events that the connector emits for changes that originate from a data collection.

The following example shows a typical transaction boundary message:

Example: Oracle connector transaction boundary event

{
  "status": "BEGIN",
  "id": "5.6.641",
  "ts_ms": 1486500577125,
  "event_count": null,
  "data_collections": null
}

{
  "status": "END",
  "id": "5.6.641",
  "ts_ms": 1486500577691,
  "event_count": 2,
  "data_collections": [
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.CUSTOMER",
      "event_count": 1
    },
    {
      "data_collection": "ORCLPDB1.DEBEZIUM.ORDER",
      "event_count": 1
    }
  ]
}

Unless overridden via the topic.transaction option, the connector emits transaction events to the <topic.prefix>.transaction topic.

Change data event enrichment

When transaction metadata is enabled, the data message Envelope is enriched with a new transaction field. This field provides information about every event in the form of a composite of fields:

id: String representation of unique transaction identifier.
total_order: The absolute position of the event among all events generated by the transaction.
data_collection_order: The per-data collection position of the event among all events that were emitted by the transaction.

The following example shows a typical transaction event message:

{
  "before": null,
  "after": {
    "pk": "2",
    "aa": "1"
  },
  "source": {
...
  },
  "op": "c",
  "ts_ms": "1580390884335",
  "ts_us": "1580390884335741",
  "ts_ns": "1580390884335741963",
  "transaction": {
    "id": "5.6.641",
    "total_order": "1",
    "data_collection_order": "1"
  }
}

LogMiner Mining Strategies

Entries in the Oracle redo logs do not store the original SQL statements that users submit to make DML changes. Instead, a redo entry holds a set of change vectors and a set of object identifiers that represent the tablespace, table, and columns related to these vectors. In other words, redo log entries don’t include the names of the schemas, tables, or columns affected by DML changes.

The Debezium Oracle connector uses the log.mining.strategy configuration property to control how Oracle LogMiner handles the lookup of the object identifiers in the change vectors. In certain situations, one log mining strategy might prove more reliable than another with regard to schema changes. However, before you choose a log mining strategy, it’s important to consider the implications it might have on performance and overhead.

Writing the data dictionary to redo logs

The redo_log_catalog mining strategy instructs the database to flush a copy of the data dictionary to the redo logs immediately after each redo log switch. This is the most reliable strategy for tracking schema changes that are interwoven with data changes, because Oracle LogMiner has a way to interpolate between the starting and ending data dictionary states across a series of change vectors.

However, the redo_log_catalog mode is also the most expensive, because it requires several key steps to function. First, this mode requires the data dictionary to be flushed to the redo logs after every log switch. Flushing the logs after each switch can quickly consume valuable space in the archive log, and the high volume of archive logs might exceed the number that database administrators prepared for. If you intend to use this mode, coordinate with your database administrators to ensure that the database is configured appropriately.

If you configure the connector to use the redo_log_catalog mode, do not use multiple Debezium Oracle connectors to capture changes from the same logical database.

Using the online catalog directly

The default strategy mode, online_catalog, works differently from the redo_log_catalog mode. When the strategy is set to online_catalog, the database never flushes the data dictionary to the redo logs. Instead, Oracle LogMiner always uses the most current data dictionary state to perform comparisons. By always using the current dictionary, and eliminating flushing to the redo logs, this strategy requires less overhead, and operates more efficiently. However, these benefits are offset by the inability to parse interwoven schema changes and data changes. As a result, this strategy can sometimes result in event failures.

If LogMiner was unable to reconstruct the SQL reliability after a schema change, check the redo logs for evidence. Look for references to tables with names like OBJ# 123456 (where the number is the table’s object identifier), or for columns with names like COL1 or COL2. When you configure the connector to use the online_catalog strategy, take steps to ensure that the table schema and its indices remain static and free from change. If the Debezium connector is configured to use the online_catalog mode, and you must apply a schema change, perform the following steps:

Wait for the connector to capture all existing data changes (DML).
Perform the schema (DDL) change, and then wait for the connector to capture the change.
Resume data changes (DML) on the table.

Following this procedure helps to ensure that Oracle LogMiner can safely reconstruct the SQL for all data changes.

Hybrid approach

You can enable this strategy by setting the value of the log.mining.strategy configuration property to hybrid. The goal of this strategy is to provide the reliability of the redo_log_catalog strategy with the performance and low overhead of the online_catalog strategy, without incurring the disadvantages of either strategy.

The hybrid strategy works by primarily operating in the online_catalog mode, meaning that the Debezium Oracle connector first delegates event reconstruction to Oracle LogMiner. If Oracle LogMiner successfully reconstructs the SQL, Debezium processes the event normally, as if it were configured to use the online_catalog strategy. If the connector detects that Oracle LogMiner could not reconstruct the SQL, the connector attempts to reconstruct the SQL directly by using the schema history for that table object. The connector reports a failure only if both Oracle LogMiner and the connector are unable to reconstruct the SQL.

You cannot use the hybrid mining strategy if the lob.enabled property is set to true. If you require streaming CLOB, BLOB, or XML data, only the online_catalog or redo_log_catalog strategies can be used.

Query Modes

The Debezium Oracle connector integrates with Oracle LogMiner by default. This integration requires a specialized set of steps which includes generating a complex JDBC SQL query to ingest the changes recorded in the transaction logs as change events. The V$LOGMNR_CONTENTS view used by the JDBC SQL query does not have any indices to improve the query’s performance, and so there are different query modes that can be used that control how the SQL query is generated as a way to improve the query’s execution.

The log.mining.query.filter.mode connector property can be configured with one of the following to influence how the JDBC SQL query is generated:

none: (Default) This mode creates a JDBC query that only filters based on the different operation types, such as inserts, updates, or deletes, at the database level. When filtering the data based on the schema, table, or username include/exclude lists, this is done during the processing loop within the connector.

This mode is often useful when capturing a small number of tables from a database that is not heavily saturated with changes. The generated query is quite simple, and focuses primarily on reading as quickly as possible with low database overhead.
in: This mode creates a JDBC query that filters not only operation types at the database level, but also schema, table, and username include/exclude lists. The query’s predicates are generated using a SQL in-clause based on the values specified in the include/exclude list configuration properties.

This mode is often useful when capturing a large number of tables from a database that is heavily saturated with changes. The generated query is much more complex than the none mode, and focuses on reducing network overhead and performing as much filtering at the database level as possible.

Finally, do not specify regular expressions as part of schema and table include/exclude configuration properties. Using regular expressions will cause the connector to not match changes based on these configuration properties, causing changes to be missed.
regex: This mode creates a JDBC query that filters not only operation types at the database level, but also schema, table, and username include/exclude lists. However, unlike the in mode, this mode generates a SQL query using the Oracle REGEXP_LIKE operator using a conjunction or disjunction depending on whether include or excluded values are specified.

This mode is often useful when capturing a variable number of tables that can be identified using a small number of regular expressions. The generated query is much more complex than any other mode, and focuses on reducing network overhead and performing as much filtering at the database level as possible.

Event buffering

Oracle writes all changes to the redo logs in the order in which they occur, including changes that are later discarded by a rollback. As a result, concurrent changes from separate transactions are intertwined. When the connector first reads the stream of changes, because it cannot immediately determine which changes are committed or rolled back, it temporarily stores the change events in an internal buffer. After a change is committed, the connector writes the change event from the buffer to Kafka. The connector drops change events that are discarded by a rollback.

You can configure the buffering mechanism that the connector uses by setting the property log.mining.buffer.type.

Heap

The default buffer type is configured using memory. Under the default memory setting, the connector uses the heap memory of the JVM process to allocate and manage buffered event records. If you use the memory buffer setting, be sure that the amount of memory that you allocate to the Java process can accommodate long-running and large transactions in your environment.

Infinispan

The Debezium Oracle connector can also be configured to use Infinispan as its cache provider, supporting cache stores both locally with embedded mode or remotely on a server cluster. In order to use Infinispan, the log.mining.buffer.type must be configured using either infinispan_embedded or infinispan_remote.

In order to allow flexibility with Infinispan cache configurations, the connector expects a series of cache configuration properties to be supplied when using Infinispan to buffer event data. See the configuration properties in the log.mining.buffer.infinispan.cache namespace. The contents of these configuration properties depend on whether the connector is to integrate with a remote Infinispan cluster or to use the embedded engine.

For example, the following illustrates what an embedded configuration would look like for the transaction cache property when using Infinispan in embedded mode:

<local-cache name="transactions">
  <persistence passivation="false">
    <file-store read-only="false" preload="true" shared="false">
      <data path="./data"/>
      <index path="./index"/>
    </file-store>
  </persistence>
</local-cache>

Looking at the configuration in-depth, the cache is configured to be persistent. All caches should be configured this way to avoid loss of transaction events across connector restarts if a transaction is in-progress. Additionally, the location where the cache is kept is defined by the path attribute and this should be a shared location accessible all possible runtime environments.

The Infinispan buffer implementation utilizes multiple cache configurations with different names. There should be a cache defined for transactions, events, processed-transactions, and schema-changes. Each configuration can be tuned to your performance needs or be identical other than the cache name.

When supplying XML configuration as a JSON connector property value, line breaks must be omitted or replaced with a \n character.

Another example, the following illustrates the same cache configured with an Infinispan cluster:

<distributed-cache name="transactions" statistics="true">
  <encoding media-type="application/x-protostream" />
  <persistence passivation="false">
   <file-store read-only="false" preload="true" shared="false">
     <data path="./data"/>
     <index path="./index"/>
   </file-store>
  </persistence>
</distributed-cache>

Just like the embedded local-cache configuration from the previous example, this configuration is also defined to be persistent. All caches should be configured this way to avoid loss of transaction events across connector restarts if a transaction is in-progress.

However, there are a few differences with noting. First, the cache is defined as a distributed cache rather than a local-cache. Secondly, the cache is defined to use the application/x-protostream encoding, which is required for all Debezium caches. And lastly, no path attribute is necessary on the file store definition since the Infinispan cluster will handle this automatically.

The Infinispan buffer type is considered incubating; the cache formats may change between versions and may require a re-snapshot. The migration notes will indicate whether this is needed.

Additionally, when removing a Debezium Oracle connector that uses the Infinispan buffer, the persisted cache files are not removed from disk automatically. If the same buffer location will be used by a new connector deployment, the files should be removed manually before deploying the new connector.

Infinispan Hotrod client integration

The Debezium Oracle connector utilizes the Hotrod client to communicate with the Infinispan cluster. Any connector property that is prefixed with log.mining.buffer.infinispan.client. will be passed directly to the Hotrod client using the infinispan.client. namespace, allowing for complete customization of how the client is to interact with the cluster.

There is at least one required configuration property that must be supplied when using this Infinspan mode:

log.mining.buffer.infinispan.client.hotrod.server_list: Specifies the list of Infinispan server hostname and port combinations, using <hostname>:<port> format.

SCN gap detection

When the Debezium Oracle connector is configured to use LogMiner, it collects change events from Oracle by using a start and end range that is based on system change numbers (SCNs). The connector manages this range automatically, increasing or decreasing the range depending on whether the connector is able to stream changes in near real-time, or must process a backlog of changes due to the volume of large or bulk transactions in the database.

Under certain circumstances, the Oracle database advances the SCN by an unusually high amount, rather than increasing the SCN value at a constant rate. Such a jump in the SCN value can occur because of the way that a particular integration interacts with the database, or as a result of events such as hot backups.

The Debezium Oracle connector relies on the following configuration properties to detect the SCN gap and adjust the mining range.

log.mining.scn.gap.detection.gap.size.min: Specifies the minimum gap size.
log.mining.scn.gap.detection.time.interval.max.ms: Specifies the maximum time interval.

The connector first compares the difference in the number of changes between the current SCN and the highest SCN in the current mining range. If the difference between the current SCN value and the highest SCN value is greater than the minimum gap size, then the connector has potentially detected a SCN gap. To confirm whether a gap exists, the connector next compares the timestamps of the current SCN and the SCN at the end of the previous mining range. If the difference between the timestamps is less than the maximum time interval, then the existence of an SCN gap is confirmed.

When an SCN gap occurs, the Debezium connector automatically uses the current SCN as the end point for the range of the current mining session. This allows the connector to quickly catch up to the real-time events without mining smaller ranges in between that return no changes because the SCN value was increased by an unexpectedly large number. When the connector performs the preceding steps in response to an SCN gap, it ignores the value that is specified by the log.mining.batch.size.max property. After the connector finishes the mining session and catches back up to real-time events, it resumes enforcement of the maximum log mining batch size.

SCN gap detection is available only if the large SCN increment occurs while the connector is running and processing near real-time events.

Low change frequency offset management

The Debezium Oracle connector tracks system change numbers in the connector offsets so that when the connector is restarted, it can begin where it left off. These offsets are part of each emitted change event; however, when the frequency of database changes are low (every few hours or days), the offsets can become stale and prevent the connector from successfully restarting if the system change number is no longer available in the transaction logs.

For connectors that use non-CDB mode to connect to Oracle, you can enable heartbeat.interval.ms to force the connector to emit a heartbeat event at regular intervals so that offsets remain synchronized.

For connectors that use CDB mode to connect to Oracle, maintaining synchronization is more complicated. Not only must you set heartbeat.interval.ms, but it’s also necessary to set heartbeat.action.query. Specifying both properties is required, because in CDB mode, the connector specifically tracks changes inside the PDB only. A supplementary mechanism is needed to trigger change events from within the pluggable database. At regular intervals, the heartbeat action query causes the connector to insert a new table row, or update an existing row in the pluggable database. Debezium detects the table changes and emits change events for them, ensuring that offsets remain synchronized, even in pluggable databases that process changes infrequently.

For the connector to use the heartbeat.action.query with tables that are not owned by the connector user account, you must grant the connector user permission to run the necessary INSERT or UPDATE queries on those tables.

Data change events

Every data change event that the Oracle connector emits has a key and a value. The structures of the key and value depend on the table from which the change events originate. For information about how Debezium constructs topic names, see Topic names.

The Debezium Oracle connector ensures that all Kafka Connect schema names are valid Avro schema names. This means that the logical server name must start with alphabetic characters or an underscore ([a-z,A-Z,_]), and the remaining characters in the logical server name and all characters in the schema and table names must be alphanumeric characters or an underscore ([a-z,A-Z,0-9,\_]). The connector automatically replaces invalid characters with an underscore character.

Unexpected naming conflicts can result when the only distinguishing characters between multiple logical server names, schema names, or table names are not valid characters, and those characters are replaced with underscores.

Debezium and Kafka Connect are designed around continuous streams of event messages. However, the structure of these events might change over time, which can be difficult for topic consumers to handle. To facilitate the processing of mutable event structures, each event in Kafka Connect is self-contained. Every message key and value has two parts: a schema and payload. The schema describes the structure of the payload, while the payload contains the actual data.

Changes that are performed by the SYS or SYSTEM user accounts are not captured by the connector.

Change event keys

For each changed table, the change event key is structured such that a field exists for each column in the primary key (or unique key constraint) of the table at the time when the event is created.

For example, a customers table that is defined in the inventory database schema, might have the following change event key:

CREATE TABLE customers (
  id NUMBER(9) GENERATED BY DEFAULT ON NULL AS IDENTITY (START WITH 1001) NOT NULL PRIMARY KEY,
  first_name VARCHAR2(255) NOT NULL,
  last_name VARCHAR2(255) NOT NULL,
  email VARCHAR2(255) NOT NULL UNIQUE
);

If the value of the <topic.prefix>.transaction configuration property is set to server1, the JSON representation for every change event that occurs in the customers table in the database features the following key structure:

{
    "schema": {
        "type": "struct",
        "fields": [
            {
                "type": "int32",
                "optional": false,
                "field": "ID"
            }
        ],
        "optional": false,
        "name": "server1.INVENTORY.CUSTOMERS.Key"
    },
    "payload": {
        "ID": 1004
    }
}

The schema portion of the key contains a Kafka Connect schema that describes the content of the key portion. In the preceding example, the payload value is not optional, the structure is defined by a schema named server1.DEBEZIUM.CUSTOMERS.Key, and there is one required field named id of type int32. The value of the key’s payload field indicates that it is indeed a structure (which in JSON is just an object) with a single id field, whose value is 1004.

Therefore, you can interpret this key as describing the row in the inventory.customers table (output from the connector named server1) whose id primary key column had a value of 1004.

Change event values

The structure of a value in a change event message mirrors the structure of the message key in the change event in the message, and contains both a schema section and a payload section.

Payload of a change event value

An envelope structure in the payload sections of a change event value contains the following fields:

op: A mandatory field that contains a string value describing the type of operation. The op field in the payload of an Oracle connector change event value contains one of the following values: c (create or insert), u (update), d (delete), or r (read, which indicates a snapshot).
before: An optional field that, if present, describes the state of the row before the event occurred. The structure is described by the server1.INVENTORY.CUSTOMERS.Value Kafka Connect schema, which the server1 connector uses for all rows in the inventory.customers table.

after

An optional field that, if present, contains the state of a row after a change occurs. The structure is described by the same server1.INVENTORY.CUSTOMERS.Value Kafka Connect schema that is used for the before field.

source

A mandatory field that contains a structure that describes the source metadata for the event. In the case of the Oracle connector, the structure includes the following fields:

The Debezium version.
The connector name.
Whether the event is part of an ongoing snapshot or not.
The transaction id (not included for snapshots).
The following values related to the SCN (system change number) that the database assigns when a change is committed:

`scn`	The unique identifier that the database uses to track the transaction.
`start_scn`	The SCN when the transaction started.
`start_ts_ms`	The time when the transaction started.
`commit_ts_ms`	The time when the transaction was committed. Timestamps that represent when the connector processed the event, based on the system clock of the JVM running the Kafka Connect task. For snapshots, the timestamp indicates when the snapshot occurred. The connector reports the timestamp value with differing precision in the following fields:

ts_ms

Provides the timestamp in milliseconds.

ts_us

Provides the timestamp in microseconds.

ts_ns

Provides the timestamp in nanoseconds.

Username who made the change

The ROWID associated with the row

The commit_scn field is optional and describes the SCN of the transaction commit that the change event participates within. This field is only present when using the LogMiner connection adapter.

The user_name field is only populated when using the LogMiner connection adapter.

ts_ms

An optional field that, if present, contains the time (based on the system clock in the JVM that runs the Kafka Connect task) at which the connector processed the event.

Schema of a change event value

The schema portion of the event message’s value contains a schema that describes the envelope structure of the payload and the nested fields within it.

create events

The following example shows the value of a create event value from the customers table that is described in the change event keys example:

{
    "schema": {
        "type": "struct",
        "fields": [
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "int32",
                        "optional": false,
                        "field": "ID"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "FIRST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "LAST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "EMAIL"
                    }
                ],
                "optional": true,
                "name": "server1.DEBEZIUM.CUSTOMERS.Value",
                "field": "before"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "int32",
                        "optional": false,
                        "field": "ID"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "FIRST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "LAST_NAME"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "EMAIL"
                    }
                ],
                "optional": true,
                "name": "server1.DEBEZIUM.CUSTOMERS.Value",
                "field": "after"
            },
            {
                "type": "struct",
                "fields": [
                    {
                        "type": "string",
                        "optional": true,
                        "field": "version"
                    },
                    {
                        "type": "string",
                        "optional": false,
                        "field": "name"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_ms"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_us"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ts_ns"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "txId"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "scn"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "commit_scn"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "rs_id"
                    },
                    {
                        "type": "int64",
                        "optional": true,
                        "field": "ssn"
                    },
                    {
                        "type": "int32",
                        "optional": true,
                        "field": "redo_thread"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "user_name"
                    },
                    {
                        "type": "boolean",
                        "optional": true,
                        "field": "snapshot"
                    },
                    {
                        "type": "string",
                        "optional": true,
                        "field": "row_id"
                    }
                ],
                "optional": false,
                "name": "io.debezium.connector.oracle.Source",
                "field": "source"
            },
            {
                "type": "string",
                "optional": false,
                "field": "op"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_ms"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_us"
            },
            {
                "type": "int64",
                "optional": true,
                "field": "ts_ns"
            }
        ],
        "optional": false,
        "name": "server1.DEBEZIUM.CUSTOMERS.Envelope"
    },
    "payload": {
        "before": null,
        "after": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "source": {
            "version": "3.0.8.Final",
            "name": "server1",
            "ts_ms": 1520085154000,
            "ts_us": 1520085154000000,
            "ts_ns": 1520085154000000000,
            "txId": "6.28.807",
            "scn": "2122185",
            "commit_scn": "2122185",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA"
        },
        "op": "c",
        "ts_ms": 1532592105975,
        "ts_us": 1532592105975741,
        "ts_ns": 1532592105975741582
    }
}

In the preceding example, notice how the event defines the following schema:

The envelope (server1.DEBEZIUM.CUSTOMERS.Envelope).
The source structure (io.debezium.connector.oracle.Source, which is specific to the Oracle connector and reused across all events).
The table-specific schemas for the before and after fields.

The names of the schemas for the before and after fields are of the form <logicalName>.<schemaName>.<tableName>.Value, and thus are entirely independent from the schemas for all other tables. As a result, when you use the Avro converter, the Avro schemas for tables in each logical source have their own evolution and history.

The payload portion of this event’s value, provides information about the event. It describes that a row was created (op=c), and shows that the after field value contains the values that were inserted into the ID, FIRST_NAME, LAST_NAME, and EMAIL columns of the row.

By default, the JSON representations of events are much larger than the rows that they describe. The larger size is due to the JSON representation including both the schema and payload portions of a message. You can use the Avro Converter to decrease the size of messages that the connector writes to Kafka topics.

update events

The following example shows an update change event that the connector captures from the same table as the preceding create event.

{
    "schema": { ... },
    "payload": {
        "before": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "after": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "source": {
            "version": "3.0.8.Final",
            "name": "server1",
            "ts_ms": 1520085811000,
            "ts_us": 1520085811000000,
            "ts_ns": 1520085811000000000,
            "txId": "6.9.809",
            "scn": "2125544",
            "commit_scn": "2125544",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA"
        },
        "op": "u",
        "ts_ms": 1532592713485,
        "ts_us": 1532592713485152,
        "ts_ns": 1532592713485152954,
    }
}

The payload has the same structure as the payload of a create (insert) event, but the following values are different:

The value of the op field is u, signifying that this row changed because of an update.
The before field shows the former state of the row with the values that were present before the update database commit.
The after field shows the updated state of the row, with the EMAIL value now set to [email protected].
The structure of the source field includes the same fields as before, but the values are different, because the connector captured the event from a different position in the redo log.
The ts_ms field shows the timestamp that indicates when Debezium processed the event.

The payload section reveals several other useful pieces of information. For example, by comparing the before and after structures, we can determine how a row changed as the result of a commit. The source structure provides information about Oracle’s record of this change, providing traceability. It also gives us insight into when this event occurred in relation to other events in this topic and in other topics. Did it occur before, after, or as part of the same commit as another event?

When the columns for a row’s primary/unique key are updated, the value of the row’s key changes. As a result, Debezium emits three events after such an update:

A DELETE event.
A tombstone event with the old key for the row.
An INSERT event that provides the new key for the row.

delete events

The following example shows a delete event for the table that is shown in the preceding create and update event examples. The schema portion of the delete event is identical to the schema portion for those events.

{
    "schema": { ... },
    "payload": {
        "before": {
            "ID": 1004,
            "FIRST_NAME": "Anne",
            "LAST_NAME": "Kretchmar",
            "EMAIL": "[email protected]"
        },
        "after": null,
        "source": {
            "version": "3.0.8.Final",
            "name": "server1",
            "ts_ms": 1520085153000,
            "ts_us": 1520085153000000,
            "ts_ns": 1520085153000000000,
            "txId": "6.28.807",
            "scn": "2122184",
            "commit_scn": "2122184",
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user",
            "snapshot": false,
            "row_id": "AAASgjAAMAAAACnAAA"
        },
        "op": "d",
        "ts_ms": 1532592105960,
        "ts_us": 1532592105960854,
        "ts_ns": 1532592105960854693
    }
}

The payload portion of the event reveals several differences when compared to the payload of a create or update event:

The value of the op field is d, signifying that the row was deleted.
The before field shows the former state of the row that was deleted with the database commit.
The value of the after field is null, signifying that the row no longer exists.
The structure of the source field includes many of the keys that exist in create or update events, but the values in the ts_ms, scn, and txId fields are different.
The ts_ms shows a timestamp that indicates when Debezium processed this event.

The delete event provides consumers with the information that they require to process the removal of this row.

The Oracle connector’s events are designed to work with Kafka log compaction, which allows for the removal of some older messages as long as at least the most recent message for every key is kept. This allows Kafka to reclaim storage space while ensuring the topic contains a complete dataset and can be used for reloading key-based state.

When a row is deleted, the delete event value shown in the preceding example still works with log compaction, because Kafka is able to remove all earlier messages that use the same key. The message value must be set to null to instruct Kafka to remove all messages that share the same key. To make this possible, by default, Debezium’s Oracle connector always follows a delete event with a special tombstone event that has the same key but null value. You can change the default behavior by setting the connector property tombstones.on.delete.

truncate events

A truncate change event signals that a table has been truncated. The message key is null in this case, the message value looks like this:

{
    "schema": { ... },
    "payload": {
        "before": null,
        "after": null,
        "source": { (1)
            "version": "3.0.8.Final",
            "connector": "oracle",
            "name": "oracle_server",
            "ts_ms": 1638974535000,
            "ts_us": 1638974535000000,
            "ts_ns": 1638974535000000000,
            "snapshot": "false",
            "db": "ORCLPDB1",
            "sequence": null,
            "schema": "DEBEZIUM",
            "table": "TEST_TABLE",
            "txId": "02000a0037030000",
            "scn": "13234397",
            "commit_scn": "13271102",
            "lcr_position": null,
            "rs_id": "001234.00012345.0124",
            "ssn": 1,
            "redo_thread": 1,
            "user_name": "user"
        },
        "op": "t", (2)
        "ts_ms": 1638974558961, (3)
        "ts_us": 1638974558961987, (3)
        "ts_ns": 1638974558961987251, (3)
        "transaction": null
    }
}

Table 9. Descriptions of *truncate* event value fields
Item	Field name	Description
1	`source`	Mandatory field that describes the source metadata for the event. In a truncate event value, the `source` field structure is the same as for create, update, and delete events for the same table, provides this metadata: Debezium version Connector type and name Database and table that contains the new row Schema name If the event was part of a snapshot (always `false` for truncate events) ID of the transaction in which the operation was performed SCN of the operation Timestamp for when the change was made in the database Username who performed the change
2	`op`	Mandatory string that describes the type of operation. The `op` field value is `t`, signifying that this table was truncated.
3	`ts_ms`, `ts_us`, `ts_ns`	Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the `source` object, `ts_ms` indicates the time that the change was made in the database. By comparing the value for `payload.source.ts_ms` with the value for `payload.ts_ms`, you can determine the lag between the source database update and Debezium.

If a single TRUNCATE operation affects multiple tables, the connector emits one truncate change event record for each truncated table.

A truncate event represents a change that is made to an entire table and it has no message key. As a result, for topics with multiple partitions, there is no ordering guarantee for the change events (create, update, and so forth), or truncate events that pertain to a table. For example, if a consumer reads events for a table from multiple partitions, it might receive an update event for a table from one partition after it receives a truncate event that deletes all of the data in the table from another partition. Ordering is guaranteed only for topics that use a single partition.

If you do not want the connector to capture truncate events, use the skipped.operations option to filter them out.

Data type mappings

When the Debezium Oracle connector detects a change in the value of a table row, it emits a change event that represents the change. Each change event record is structured in the same way as the original table, with the event record containing a field for each column value. The data type of a table column determines how the connector represents the column’s values in change event fields, as shown in the tables in the following sections.

For each column in a table, Debezium maps the source data type to a literal type and, and in some cases, a semantic type, in the corresponding event field.

Literal types: Describe how the value is literally represented, using one of the following Kafka Connect schema types: INT8, INT16, INT32, INT64, FLOAT32, FLOAT64, BOOLEAN, STRING, BYTES, ARRAY, MAP, and STRUCT.
Semantic types: Describe how the Kafka Connect schema captures the meaning of the field, by using the name of the Kafka Connect schema for the field.

If the default data type conversions do not meet your needs, you can create a custom converter for the connector.

For some Oracle large object (CLOB, NCLOB, and BLOB) and numeric data types, you can manipulate the way that the connector performs the type mapping by changing default configuration property settings. For more information about how Debezium properties control mappings for these data types, see Binary and Character LOB types and Numeric types.

Support for further data types is planned for subsequent releases. Please file a JIRA issue for any specific types that might be missing.

Character types

The following table describes how the connector maps basic character types.

Table 10. Mappings for Oracle basic character types
Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`CHAR[(M)]`	`STRING`	n/a
`NCHAR[(M)]`	`STRING`	n/a
`NVARCHAR2[(M)]`	`STRING`	n/a
`VARCHAR[(M)]`	`STRING`	n/a
`VARCHAR2[(M)]`	`STRING`	n/a

Binary and Character LOB types

The following table describes how the connector maps binary and character large object (LOB) data types.

Table 11. Mappings for Oracle binary and character LOB types
Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`BFILE`	n/a	This data type is not supported
`BLOB`	`BYTES`	Depending on the setting of the `binary.handling.mode` property in the connector configuration, the connector maps LOB values of this type to one of the following semantic types: Raw bytes (the default) A base64-encoded string A base64-url-safe-encoded string A hex-encoded string
`CLOB`	`STRING`	n/a
`LONG`	n/a	This data type is not supported.
`LONG RAW`	n/a	This data type is not supported.
`NCLOB`	`STRING`	n/a
`RAW`	n/a	Depending on the setting of the `binary.handling.mode` property in the connector configuration, the connector maps LOB values of this type to one of the following semantic types: Raw bytes (the default) A base64-encoded string A base64-url-safe-encoded string A hex-encoded string

Oracle only supplies column values for CLOB, NCLOB, and BLOB data types if they’re explicitly set or changed in a SQL statement. As a result, change events never contain the value of an unchanged CLOB, NCLOB, or BLOB column. Instead, they contain placeholders as defined by the connector property, unavailable.value.placeholder.

If the value of a CLOB, NCLOB, or BLOB column is updated, the new value is placed in the after element of the corresponding update change event. The before element contains the unavailable value placeholder.

Numeric types

The following table describes how the Debezium Oracle connector maps numeric types.

You can modify the way that the connector maps the Oracle DECIMAL, NUMBER, NUMERIC, and REAL data types by changing the value of the connector’s decimal.handling.mode configuration property. When the property is set to its default value of precise, the connector maps these Oracle data types to the Kafka Connect org.apache.kafka.connect.data.Decimal logical type, as indicated in the table. When the value of the property is set to double or string, the connector uses alternate mappings for some Oracle data types. For more information, see the Semantic type and Notes column in the following table.

Table 12. Mappings for Oracle numeric data types
Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`BINARY_FLOAT`	`FLOAT32`	n/a
`BINARY_DOUBLE`	`FLOAT64`	n/a
`DECIMAL[(P, S)]`	`BYTES` / `INT8` / `INT16` / `INT32` / `INT64`	`org.apache.kafka.connect.data.Decimal` if using `BYTES` Handled equivalently to `NUMBER` (note that S defaults to 0 for `DECIMAL`). When the `decimal.handling.mode` property is set to `double`, the connector represents `DECIMAL` values as Java `double` values with schema type `FLOAT64`. When the `decimal.handling.mode` property is set to `string`, the connector represents DECIMAL values as their formatted string representation with schema type `STRING`.
`DOUBLE PRECISION`	`STRUCT`	`io.debezium.data.VariableScaleDecimal` Contains a structure with two fields: `scale` of type `INT32` that contains the scale of the transferred value and `value` of type `BYTES` containing the original value in an unscaled form.
`FLOAT[(P)]`	`STRUCT`	`io.debezium.data.VariableScaleDecimal` Contains a structure with two fields: `scale` of type `INT32` that contains the scale of the transferred value and `value` of type `BYTES` containing the original value in an unscaled form.
`INTEGER`, `INT`	`BYTES`	`org.apache.kafka.connect.data.Decimal` `INTEGER` is mapped in Oracle to NUMBER(38,0) and hence can hold values larger than any of the `INT` types could store
`NUMBER[(P[, *])]`	`STRUCT`	`io.debezium.data.VariableScaleDecimal` Contains a structure with two fields: `scale` of type `INT32` that contains the scale of the transferred value and `value` of type `BYTES` containing the original value in an unscaled form. When the `decimal.handling.mode` property is set to `double`, the connector represents `NUMBER` values as Java `double` values with schema type `FLOAT64`. When the `decimal.handling.mode` property is set to `string`, the connector represents `NUMBER` values as their formatted string representation with schema type `STRING`.
`NUMBER(P, S <= 0)`	`INT8` / `INT16` / `INT32` / `INT64`	`NUMBER` columns with a scale of 0 represent integer numbers. A negative scale indicates rounding in Oracle, for example, a scale of -2 causes rounding to hundreds. Depending on the precision and scale, one of the following matching Kafka Connect integer type is chosen: P - S < 3, `INT8` P - S < 5, `INT16` P - S < 10, `INT32` P - S < 19, `INT64` P - S >= 19, `BYTES` (`org.apache.kafka.connect.data.Decimal`) When the `decimal.handling.mode` property is set to `double`, the connector represents `NUMBER` values as Java `double` values with schema type `FLOAT64`. When the `decimal.handling.mode` property is set to `string`, the connector represents `NUMBER` values as their formatted string representation with schema type `STRING`.
`NUMBER(P, S > 0)`	`BYTES`	`org.apache.kafka.connect.data.Decimal`
`NUMERIC[(P, S)]`	`BYTES` / `INT8` / `INT16` / `INT32` / `INT64`	`org.apache.kafka.connect.data.Decimal` if using `BYTES` Handled equivalently to `NUMBER` (note that S defaults to 0 for `NUMERIC`). When the `decimal.handling.mode` property is set to `double`, the connector represents `NUMERIC` values as Java `double` values with schema type `FLOAT64`. When the `decimal.handling.mode` property is set to `string`, the connector represents `NUMERIC` values as their formatted string representation with schema type `STRING`.
`SMALLINT`	`BYTES`	`org.apache.kafka.connect.data.Decimal` `SMALLINT` is mapped in Oracle to NUMBER(38,0) and hence can hold values larger than any of the `INT` types could store
`REAL`	`STRUCT`	`io.debezium.data.VariableScaleDecimal` Contains a structure with two fields: `scale` of type `INT32` that contains the scale of the transferred value and `value` of type `BYTES` containing the original value in an unscaled form. When the `decimal.handling.mode` property is set to `double`, the connector represents `REAL` values as Java `double` values with schema type `FLOAT64`. When the `decimal.handling.mode` property is set to `string`, the connector represents `REAL` values as their formatted string representation with schema type `STRING`.

As mention above, Oracle allows negative scales in NUMBER type. This can cause an issue during conversion to the Avro format when the number is represented as the Decimal. Decimal type includes scale information, but Avro specification allows only positive values for the scale. Depending on the schema registry used, it may result into Avro serialization failure. To avoid this issue, you can use NumberToZeroScaleConverter, which converts sufficiently high numbers (P - S >= 19) with negative scale into Decimal type with zero scale. It can be configured as follows:

converters=zero_scale
zero_scale.type=io.debezium.connector.oracle.converters.NumberToZeroScaleConverter
zero_scale.decimal.mode=precise

By default, the number is converted to Decimal type (zero_scale.decimal.mode=precise), but for completeness remaining two supported types (double and string) are supported as well.

Boolean types

Oracle does not provide native support for a BOOLEAN data type. However, it is common practice to use other data types with certain semantics to simulate the concept of a logical BOOLEAN data type.

To enable you to convert source columns to Boolean data types, Debezium provides a NumberOneToBooleanConverter custom converter that you can use in one of the following ways:

Map all NUMBER(1) columns to a BOOLEAN type.
Enumerate a subset of columns by using a comma-separated list of regular expressions.
To use this type of conversion, you must set the converters configuration property with the selector parameter, as shown in the following example:
```
converters=boolean
boolean.type=io.debezium.connector.oracle.converters.NumberOneToBooleanConverter
boolean.selector=.*MYTABLE.FLAG,.*.IS_ARCHIVED
```

Temporal types

Other than the Oracle INTERVAL, TIMESTAMP WITH TIME ZONE, and TIMESTAMP WITH LOCAL TIME ZONE data types, the way that the connector converts temporal types depends on the value of the time.precision.mode configuration property.

When the time.precision.mode configuration property is set to adaptive (the default), then the connector determines the literal and semantic type for the temporal types based on the column’s data type definition so that events exactly represent the values in the database:

Oracle data type Literal type (schema type) Semantic type (schema name) and Notes

Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`DATE`	`INT64`	`io.debezium.time.Timestamp` Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.
`INTERVAL DAY[(M)] TO SECOND`	`FLOAT64`	`io.debezium.time.MicroDuration` The number of micro seconds for a time interval using the `365.25 / 12.0` formula for days per month average. `io.debezium.time.Interval` (when `interval.handling.mode` is set to `string`) The string representation of the interval value that follows the pattern `P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S`, for example, `P1Y2M3DT4H5M6.78S`.
`INTERVAL YEAR[(M)] TO MONTH`	`FLOAT64`	`io.debezium.time.MicroDuration` The number of micro seconds for a time interval using the `365.25 / 12.0` formula for days per month average. `io.debezium.time.Interval` (when `interval.handling.mode` is set to `string`) The string representation of the interval value that follows the pattern `P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S`, for example, `P1Y2M3DT4H5M6.78S`.
`TIMESTAMP(0 - 3)`	`INT64`	`io.debezium.time.Timestamp` Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP, TIMESTAMP(4 - 6)`	`INT64`	`io.debezium.time.MicroTimestamp` Represents the number of microseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP(7 - 9)`	`INT64`	`io.debezium.time.NanoTimestamp` Represents the number of nanoseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP WITH TIME ZONE`	`STRING`	`io.debezium.time.ZonedTimestamp` A string representation of a timestamp with timezone information.
`TIMESTAMP WITH LOCAL TIME ZONE`	`STRING`	`io.debezium.time.ZonedTimestamp` A string representation of a timestamp in UTC.

DATE

INT64

io.debezium.time.Timestamp

Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.

INTERVAL DAY[(M)] TO SECOND

FLOAT64

io.debezium.time.MicroDuration

The number of micro seconds for a time interval using the 365.25 / 12.0 formula for days per month average.

io.debezium.time.Interval (when interval.handling.mode is set to string)

The string representation of the interval value that follows the pattern P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S, for example, P1Y2M3DT4H5M6.78S.

INTERVAL YEAR[(M)] TO MONTH

FLOAT64

TIMESTAMP(0 - 3)

INT64

io.debezium.time.Timestamp

Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP, TIMESTAMP(4 - 6)

INT64

io.debezium.time.MicroTimestamp

Represents the number of microseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP(7 - 9)

INT64

io.debezium.time.NanoTimestamp

Represents the number of nanoseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP WITH TIME ZONE

STRING

io.debezium.time.ZonedTimestamp

A string representation of a timestamp with timezone information.

TIMESTAMP WITH LOCAL TIME ZONE

STRING

io.debezium.time.ZonedTimestamp

A string representation of a timestamp in UTC.

When the time.precision.mode configuration property is set to connect, then the connector uses the predefined Kafka Connect logical types. This can be useful when consumers only know about the built-in Kafka Connect logical types and are unable to handle variable-precision time values. Because the level of precision that Oracle supports exceeds the level that the logical types in Kafka Connect support, if you set time.precision.mode to connect, a loss of precision results when the fractional second precision value of a database column is greater than 3:

Oracle data type Literal type (schema type) Semantic type (schema name) and Notes

Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`DATE`	`INT32`	`org.apache.kafka.connect.data.Date` Represents the number of days since the UNIX epoch.
`INTERVAL DAY[(M)] TO SECOND`	`FLOAT64`	`io.debezium.time.MicroDuration` The number of micro seconds for a time interval using the `365.25 / 12.0` formula for days per month average. `io.debezium.time.Interval` (when `interval.handling.mode` is set to `string`) The string representation of the interval value that follows the pattern `P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S`, for example, `P1Y2M3DT4H5M6.78S`.
`INTERVAL YEAR[(M)] TO MONTH`	`FLOAT64`	`io.debezium.time.MicroDuration` The number of micro seconds for a time interval using the `365.25 / 12.0` formula for days per month average. `io.debezium.time.Interval` (when `interval.handling.mode` is set to `string`) The string representation of the interval value that follows the pattern `P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S`, for example, `P1Y2M3DT4H5M6.78S`.
`TIMESTAMP(0 - 3)`	`INT64`	`org.apache.kafka.connect.data.Timestamp` Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP(4 - 6)`	`INT64`	`org.apache.kafka.connect.data.Timestamp` Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP(7 - 9)`	`INT64`	`org.apache.kafka.connect.data.Timestamp` Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.
`TIMESTAMP WITH TIME ZONE`	`STRING`	`io.debezium.time.ZonedTimestamp` A string representation of a timestamp with timezone information.
`TIMESTAMP WITH LOCAL TIME ZONE`	`STRING`	`io.debezium.time.ZonedTimestamp` A string representation of a timestamp in UTC.

DATE

INT32

org.apache.kafka.connect.data.Date

Represents the number of days since the UNIX epoch.

INTERVAL DAY[(M)] TO SECOND

FLOAT64

INTERVAL YEAR[(M)] TO MONTH

FLOAT64

TIMESTAMP(0 - 3)

INT64

org.apache.kafka.connect.data.Timestamp

Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP(4 - 6)

INT64

org.apache.kafka.connect.data.Timestamp

Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP(7 - 9)

INT64

org.apache.kafka.connect.data.Timestamp

Represents the number of milliseconds since the UNIX epoch, and does not include timezone information.

TIMESTAMP WITH TIME ZONE

STRING

io.debezium.time.ZonedTimestamp

A string representation of a timestamp with timezone information.

TIMESTAMP WITH LOCAL TIME ZONE

STRING

io.debezium.time.ZonedTimestamp

A string representation of a timestamp in UTC.

ROWID types

The following table describes how the connector maps ROWID (row address) data types.

Table 13. Mappings for Oracle ROWID data types
Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`ROWID`	`STRING`	This data type is not supported when using Oracle XStream.
`UROWID`	n/a	This data type is not supported.

XML types

Support for XMLTYPE is currently in an incubating state. Depending on the feedback that we receive, the exact semantics, configuration options, and so forth might change in future revisions. Please let us know if you encounter any problems while using these data types.

The following table describes how the connector maps XMLTYPE data types.

Table 14. Mappings for Oracle XMLTYPE data types
Oracle data type	Literal type (schema type)	Semantic type (schema name) and Notes
`XMLTYPE`	`STRING`	`io.debezium.data.Xml`

User-defined types

Oracle enables you to define custom data types to provide flexibility when the built-in data types do not satisfy your requirements. There are a several user-defined types such as Object types, REF data types, Varrays, and Nested Tables. At this time, you cannot use the Debezium Oracle connector with any of these user-defined types.

Oracle-supplied types

Oracle provides SQL-based interfaces that you can use to define new types when the built-in or ANSI-supported types are insufficient. Oracle offers several commonly used data types to serve a broad array of purposes such as Any or Spatial types. At this time, you cannot use the Debezium Oracle connector with any of these data types.

Default Values

If a default value is specified for a column in the database schema, the Oracle connector will attempt to propagate this value to the schema of the corresponding Kafka record field. Most common data types are supported, including:

Character types (CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR, NVARCHAR2)
Numeric types (INTEGER, NUMERIC, etc.)
Temporal types (DATE, TIMESTAMP, INTERVAL, etc.)

If a temporal type uses a function call such as TO_TIMESTAMP or TO_DATE to represent the default value, the connector will resolve the default value by making an additional database call to evaluate the function. For example, if a DATE column is defined with the default value of TO_DATE('2021-01-02', 'YYYY-MM-DD'), the column’s default value will be the number of days since the UNIX epoch for that date or 18629 in this case.

If a temporal type uses the SYSDATE constant to represent the default value, the connector will resolve this based on whether the column is defined as NOT NULL or NULL. If the column is nullable, no default value will be set; however, if the column isn’t nullable then the default value will be resolved as either 0 (for DATE or TIMESTAMP(n) data types) or 1970-01-01T00:00:00Z (for TIMESTAMP WITH TIME ZONE or TIMESTAMP WITH LOCAL TIME ZONE data types). The default value type will be numeric except if the column is a TIMESTAMP WITH TIME ZONE or TIMESTAMP WITH LOCAL TIME ZONE in which case its emitted as a string.

Custom converters

By default, the Debezium Oracle connector provides several CustomConverter implementations specific to Oracle data types. These custom converters provide alternative mappings for specific data types based on the connector configuration. To add a CustomConverter to the connector, follow the instructions in the Custom Converters documentation.

The Debezium Oracle connector provides the following custom converters:

`NUMBER(1)` to Boolean

Beginning with version 23, Oracle database provides a BOOLEAN logical data type. In earlier versions, the database simulates a BOOLEAN type by using a NUMBER(1) data type, constrained with a value of 0 for false, or a value of 1 for true.

By default, when Debezium emits change events for source columns that use the NUMBER(1) data type, it converts the data to the INT8 literal type. If the default mapping for NUMBER(1) data types does not meet your needs, you can configure the connector to use the logical BOOL type when it emits these columns by configuring the NumberOneToBooleanConverter, as shown in the following example:

Example: NumberOneToBooleanConverter configuration

converters=number-to-boolean
number-to-boolean.type=io.debezium.connector.oracle.converters.NumberOneToBooleanConverter
number-to-boolean.selector=.*.MY_TABLE.DATA

In the preceding example, the selector property is optional. The selector property specifies a regular expression that designates which tables or columns the converter applies to. If you omit the selector property, when Debezium emits an event, every column with the NUMBER(1) data type is converted to a field that uses the logical BOOL type.

`NUMBER` To Zero Scale

Oracle supports creating NUMBER based columns with negative scale, that is, NUMBER(-2). Not all systems can process negative scale values, so these values can result in processing problems in your pipeline. For example, because Apache Avro does not support these values, problems can occur if Debezium converts events to Avro format. Similarly, downstream consumers that do not support these values can also encounter errors.

Example configuration

converters=number-zero-scale
number-zero-scale.type=io.debezium.connector.oracle.converters.NumberToZeroScaleConverter
number-zero-scale.decimal.mode=precise

In the preceding example, the decimal.mode property specifies how the connector emits decimal values. This property is optional. If you omit the decimal.mode property, the converter defaults to using the PRECISE decimal handling mode.

`RAW` to String

Although Oracle recommends against the use of certain data types, such as RAW, legacy systems might continue to use such types. By default, Debezium emits RAW column types as logical BYTES, a type that enables the storage of binary or text-based data.

In some cases, RAW columns might store character data as a series of bytes. To facilitate consumption by consumers, you can configure Debezium to use the RawToStringConverter. The RawToStringConverter provides a way to easily target such RAW columns and emit values as strings, rather than bytes. The following example shows how to add the RawToStringConverter to the connector configuration:

Example: RawToStringConverter configuration

converters=raw-to-string
raw-to-string.type=io.debezium.connector.oracle.converters.RawToStringConverter
raw-to-string.selector=.*.MY_TABLE.DATA

In the preceding example, the selector property enables you to define a regular expression that specifies the tables or columns that the converter processes. If you omit the selector property, the converter maps all RAW column types to logical STRING field types.

Setting up Oracle

The following steps are necessary to set up Oracle for use with the Debezium Oracle connector. These steps assume the use of the multi-tenancy configuration with a container database and at least one pluggable database. If you do not intend to use a multi-tenant configuration, it might be necessary to adjust the following steps.

For information about using Vagrant to set up Oracle in a virtual machine, see the Debezium Vagrant Box for Oracle database GitHub repository.

Compatibility with Oracle installation types

An Oracle database can be installed either as a standalone instance or using Oracle Real Application Cluster (RAC). The Debezium Oracle connector is compatible with both types of installation.

Schemas excluded from capture

When the Debezium Oracle connector captures tables, it automatically excludes tables from the following schemas:

appqossys
audsys
ctxsys
dvsys
dbsfwuser
dbsnmp
qsmadmin_internal
lbacsys
mdsys
ojvmsys
olapsys
orddata
ordsys
outln
sys
system
vecsys (Oracle 23+)
wmsys
xdb

To enable the connector to capture changes from a table, the table must use a schema that is not named in the preceding list.

Tables excluded from capture

When the Debezium Oracle connector captures tables, it automatically excludes tables that match the following rules:

Compression Advisor tables matching the pattern CMP[3|4]$[0-9]+.
Index-organized tables matching the pattern SYS_IOT_OVER_%.
Spatial tables matching the patterns MDRT_%, MDRS_%, or MDXT_%.
Nested tables

To enable the connector to capture a table with a name that matches any of the preceding rules, you must rename the table.

Preparing the database

Configuration needed for Oracle LogMiner

ORACLE_SID=ORACLCDB dbz_oracle sqlplus /nolog

CONNECT sys/top_secret AS SYSDBA
alter system set db_recovery_file_dest_size = 10G;
alter system set db_recovery_file_dest = '/opt/oracle/oradata/recovery_area' scope=spfile;
shutdown immediate
startup mount
alter database archivelog;
alter database open;
-- Should now "Database log mode: Archive Mode"
archive log list

exit;

Oracle AWS RDS does not allow you to execute the commands above nor does it allow you to log in as sysdba. AWS provides these alternative commands to configure LogMiner. Before executing these commands, ensure that your Oracle AWS RDS instance is enabled for backups.

To confirm that Oracle has backups enabled, execute the command below first. The LOG_MODE should say ARCHIVELOG. If it does not, you may need to reboot your Oracle AWS RDS instance.

Configuration needed for Oracle AWS RDS LogMiner

SQL> SELECT LOG_MODE FROM V$DATABASE;

LOG_MODE
------------
ARCHIVELOG

Once LOG_MODE is set to ARCHIVELOG, execute the commands to complete LogMiner configuration. The first command set the database to archivelogs and the second adds supplemental logging.

Configuration needed for Oracle AWS RDS LogMiner

exec rdsadmin.rdsadmin_util.set_configuration('archivelog retention hours',24);

exec rdsadmin.rdsadmin_util.alter_supplemental_logging('ADD');

To enable Debezium to capture the before state of changed database rows, you must also enable supplemental logging for captured tables or for the entire database. The following example illustrates how to configure supplemental logging for all columns in a single inventory.customers table.

ALTER TABLE inventory.customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Enabling supplemental logging for all table columns increases the volume of the Oracle redo logs. To prevent excessive growth in the size of the logs, apply the preceding configuration selectively.

Minimal supplemental logging must be enabled at the database level and can be configured as follows.

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

Redo log sizing

Depending on the database configuration, the size and number of redo logs might not be sufficient to achieve acceptable performance. Before you set up the Debezium Oracle connector, ensure that the capacity of the redo logs is sufficient to support the database.

The capacity of the redo logs for a database must be sufficient to store its data dictionary. In general, the size of the data dictionary increases with the number of tables and columns in the database. If the redo log lacks sufficient capacity, both the database and the Debezium connector might experience performance problems.

Consult with your database administrator to evaluate whether the database might require increased log capacity.

Archive log destinations

Oracle database administrators can configure up to 31 different destinations for archive logs. Administrators can set parameters for each destination to designate it for a specific use, for example, log shipping for physical standbys, or external storage to allow for extended log retention. Oracle reports details about archive log destinations in the V$ARCHIVE_DEST_STATUS view.

The Debezium Oracle connector only uses destinations that have a status of VALID and a type of LOCAL. If your Oracle environment includes multiple destinations that satisfy that criteria, consult with your Oracle administrator to determine which archive log destination Debezium should use.

Procedure

To specify the archive log destination that you want Debezium to use, set the archive.destination.name property in the connector configuration.

For example, suppose that a database is configured with two archive destination paths, /path/one and /path/two, and that the V$ARCHIVE_DEST_STATUS table associates these paths with destination names that are specified in the column DEST_NAME. If both destinations satisfy the criteria for Debezium — that is, their status is VALID and their type is LOCAL — to configure the connector to use the archive logs that the database writes to /path/two, set the value of archive.destination.name to the value in the DEST_NAME column that is associated with /path/two in the V$ARCHIVE_DEST_STATUS table. For example, if the DEST_NAME is LOG_ARCHIVE_DEST_3 for /path/two, you would configure Debezium as follows:

{
  "archive.destination.name": "LOG_ARCHIVE_DEST_3"
}

Do not set the value of archive.destination.name to the path that the database uses for the archive logs. Set the property to the name of an archive log destination in the DEST_NAME column for a row in the V$ARCHIVE_DEST_STATUS table that satisfies your archive log retention policy.

If your Oracle environment includes multiple destinations that satisfy that criteria, and you fail to specify the preferred destination, the Debezium Oracle connector selects the destination path at random. Because the retention policy that is configured for each destination might differ, this can lead to errors if the connector selects a path from which the requested log data was deleted.

Creating users for the connector

For the Debezium Oracle connector to capture change events, it must run as an Oracle LogMiner user that has specific permissions. The following example shows the SQL for creating an Oracle user account for the connector in a multi-tenant database model.

The connector captures database changes that are made by its own Oracle user account. However, it does not capture changes that are made by the SYS or SYSTEM user accounts.

Creating the connector’s LogMiner user

sqlplus sys/top_secret@//localhost:1521/ORCLCDB as sysdba
  CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf'
    SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
  exit;

sqlplus sys/top_secret@//localhost:1521/ORCLPDB1 as sysdba
  CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/logminer_tbs.dbf'
    SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
  exit;

sqlplus sys/top_secret@//localhost:1521/ORCLCDB as sysdba

  CREATE USER c##dbzuser IDENTIFIED BY dbz
    DEFAULT TABLESPACE logminer_tbs
    QUOTA UNLIMITED ON logminer_tbs
    CONTAINER=ALL;

  GRANT CREATE SESSION TO c##dbzuser CONTAINER=ALL; (1)
  GRANT SET CONTAINER TO c##dbzuser CONTAINER=ALL; (2)
  GRANT SELECT ON V_$DATABASE to c##dbzuser CONTAINER=ALL; (3)
  GRANT FLASHBACK ANY TABLE TO c##dbzuser CONTAINER=ALL; (4)
  GRANT SELECT ANY TABLE TO c##dbzuser CONTAINER=ALL; (5)
  GRANT SELECT_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL; (6)
  GRANT EXECUTE_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL; (7)
  GRANT SELECT ANY TRANSACTION TO c##dbzuser CONTAINER=ALL; (8)
  GRANT LOGMINING TO c##dbzuser CONTAINER=ALL; (9)

  GRANT CREATE TABLE TO c##dbzuser CONTAINER=ALL; (10)
  GRANT LOCK ANY TABLE TO c##dbzuser CONTAINER=ALL; (11)
  GRANT CREATE SEQUENCE TO c##dbzuser CONTAINER=ALL; (12)

  GRANT EXECUTE ON DBMS_LOGMNR TO c##dbzuser CONTAINER=ALL; (13)
  GRANT EXECUTE ON DBMS_LOGMNR_D TO c##dbzuser CONTAINER=ALL; (14)

  GRANT SELECT ON V_$LOG TO c##dbzuser CONTAINER=ALL; (15)
  GRANT SELECT ON V_$LOG_HISTORY TO c##dbzuser CONTAINER=ALL; (16)
  GRANT SELECT ON V_$LOGMNR_LOGS TO c##dbzuser CONTAINER=ALL; (17)
  GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##dbzuser CONTAINER=ALL; (18)
  GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##dbzuser CONTAINER=ALL; (19)
  GRANT SELECT ON V_$LOGFILE TO c##dbzuser CONTAINER=ALL; (20)
  GRANT SELECT ON V_$ARCHIVED_LOG TO c##dbzuser CONTAINER=ALL; (21)
  GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##dbzuser CONTAINER=ALL; (22)
  GRANT SELECT ON V_$TRANSACTION TO c##dbzuser CONTAINER=ALL; (23)

  GRANT SELECT ON V_$MYSTAT TO c##dbzuser CONTAINER=ALL; (24)
  GRANT SELECT ON V_$STATNAME TO c##dbzuser CONTAINER=ALL; (25)

  exit;

Table 15. Descriptions of permissions / grants
Item	Role name	Description
1	CREATE SESSION	Enables the connector to connect to Oracle.
2	SET CONTAINER	Enables the connector to switch between pluggable databases. This is only required when the Oracle installation has container database support (CDB) enabled.
3	SELECT ON V_$DATABASE	Enables the connector to read the `V$DATABASE` table.
4	FLASHBACK ANY TABLE	Enables the connector to perform Flashback queries, which is how the connector performs the initial snapshot of data. Optionally, rather than granting `FLASHBACK` permission on all tables, you can grant the FLASHBACK privilege for specific tables only.
5	SELECT ANY TABLE	Enables the connector to read any table. Optionally, rather than granting `SELECT` permission on all tables, you can grant the `SELECT` privilege for specific tables only.
6	SELECT_CATALOG_ROLE	Enables the connector to read the data dictionary, which is needed by Oracle LogMiner sessions.
7	EXECUTE_CATALOG_ROLE	Enables the connector to write the data dictionary into the Oracle redo logs, which is needed to track schema changes.
8	SELECT ANY TRANSACTION	Enables the snapshot process to perform a Flashback snapshot query against any transaction. When `FLASHBACK ANY TABLE` is granted, this should also be granted.
9	LOGMINING	This role was added in newer versions of Oracle as a way to grant full access to Oracle LogMiner and its packages. On older versions of Oracle that don’t have this role, you can ignore this grant.
10	CREATE TABLE	Enables the connector to create its flush table in its default tablespace. The flush table allows the connector to explicitly control flushing of the LGWR internal buffers to disk.
11	LOCK ANY TABLE	Enables the connector to lock tables during schema snapshot. If snapshot locks are explicitly disabled via configuration, this grant can be safely ignored.
12	CREATE SEQUENCE	Enables the connector to create a sequence in its default tablespace.
13	EXECUTE ON DBMS_LOGMNR	Enables the connector to run methods in the `DBMS_LOGMNR` package. This is required to interact with Oracle LogMiner. On newer versions of Oracle this is granted via the `LOGMINING` role but on older versions, this must be explicitly granted.
14	EXECUTE ON DBMS_LOGMNR_D	Enables the connector to run methods in the `DBMS_LOGMNR_D` package. This is required to interact with Oracle LogMiner. On newer versions of Oracle this is granted via the `LOGMINING` role but on older versions, this must be explicitly granted.
15 to 25	SELECT ON V_$….	Enables the connector to read these tables. The connector must be able to read information about the Oracle redo and archive logs, and the current transaction state, to prepare the Oracle LogMiner session. Without these grants, the connector cannot operate.

Standby databases

A standby database provides a synchronized copy of the primary instance. In the event of a primary database failure, standby databases provide for continuous availability, and disaster recovery. Oracle makes use of both physical and logical standby databases.

Physical standbys

A physical standby is an exact, block-for-block copy of the primary production database, and its system change number (SCN) values are identical to those of the primary. The Debezium Oracle connector cannot capture change events directly from a physical standby database, because physical standbys do not accept external connections. The connector can capture events from a physical standby only after the standby is converted to the primary database. The connector then connects to the former standby as if it were any primary database.

Logical standbys

A logical standby contains the same logical data as the primary, but data might be stored in a different physical manner. SCN offsets in a logical standby differ from the offsets in the primary database. You can configure the Debezium Oracle connector to capture changes from a logical standby database.

Failover databases

When you set up a failover database, it is generally best practice to use a physical standby database rather than a logical standby database. A physical standby maintains a more consistent state with the primary database than does a logical standby. Physical standbys contain an exact replica of the primary data, and the system change number (SCN) values of the standby are identical to those of the primary. In a Debezium environment, after the database fails over to physical standby, the presence of consistent SCN values ensure that the connector can find the last processed SCN value.

A physical standby is locked in a read-only mode, with managed recovery running to maintain synchronization. When a database is in standby mode, it does not accept external JDBC connections from clients, and it cannot be accessed by external application.

After a failure event, to permit Debezium to connect to the former physical standby,a DBA must perform several actions to enable failover to the standby, and promote it the the primary database. The following list identifies some of the key actions:

Cancel managed recovery on the standby.
Complete the active recovery process.
Convert the standby to the primary role.
Open the new primary to client read and write operations.

After the former physical standby is available for normal use, you can configure the Debezium Oracle connector to connect to it. To enable the connector to capture from the new primary, edit the database hostname in the connector configuration, replacing the hostname of the original primary with the hostname of the new primary.

Configuring the Debezium Oracle connector to capture events from a logical standby

When the Debezium connector for Oracle connects to a primary database, it uses an internal flush table to manage the flush cycles of the Oracle Log Writer Buffer (LGWR) process. The flush process requires that the user account through which the connector accesses the database has permission to create and write to this flush table. However, a logical stand-by database typically permits read-only access, preventing the connector from writing to the database. You can modify the connector configuration to enable the connector to capture events from a logical standby, or the DBA can create a new writable tablespace in which the connector can store the flush table.

The ability to use the connector with a logical standby in an incubating state and can change without notice. There is an open Jira issue to investigate support for capturing changes from a physical standby.

Procedure

To enable Debezium to capture events from an Oracle read-only logical standby database, add the following property to the connector configuration, to disable creation and management of the flush table:
```
internal.log.mining.read.only=true
```
The preceding setting prevents the database from creating and updating the LOG_MINING_FLUSH table. You can use the internal.log.mining.read.only property with an Oracle Standalone database, or with an Oracle RAC installation.

Extended max string size

The database parameter max_string_size controls how the Oracle database, and by extension, Debezium, interprets values for VARCHAR2, NVARCHAR2, and RAW fields. The default, STANDARD, means the lengths for these data types align with the same limits with releases prior to Oracle 12c (4000 bytes for VARCHAR2 and NVARCHAR2 and 2000 bytes for RAW). When configured as EXTENDED, these columns now allow up to 32767 bytes of data to be stored.

While a database administrator can change the max_string_size from STANDARD to EXTENDED, the opposite isn’t allowed. Once the database is updated to EXTENDED string support, that cannot be undone.

For the Debezium Oracle connector, when the database parameter max_string_size is EXTENDED, the lob.enabled connector configuration option should be set to true to capture changes made to VARCHAR2 and NVARCHAR2 fields that have string lengths that exceed 4000 bytes or RAW fields with more than 2000 bytes.

When set to EXTENDED, Oracle performs an implicit conversion of the character data in-flight when the byte length of the string data exceeds the legacy maximums. This implicit conversion means that Oracle internally treats the string data as though it is a CLOB, so you get the benefit of treating the field as a regular string to the outside world, but with all the pitfalls and concerns about storage at the database tier level.

Since Oracle treats these strings as CLOB internally, the redo logs also reflect several unique operation types that the Debezium Oracle connector needs to be aware that it should mine. And because these operation types very closely resemble CLOB operations, the redo log entries must be mined in the same way as any other LOB type, which is why lob.enabled must be set to true to capture changes from the redo logs regardless of the string data’s byte length.

When Oracle is configured to use EXTENDED string sizes, LogMiner sometimes fails to escape single quote characters (') when it reconstructs the SQL for an extended string field. This problem can occur if the byte length of the extended string field does not exceed the legacy maximum length. As a result, values in these fields can be truncated, resulting in invalid SQL statements, which the Oracle connector is unable to parse.

To help resolve some instances of the problem, you can configure the connector to relax single-quote detection by setting the following property to true:

internal.log.mining.sql.relaxed.quote.detection

Note that the use of this internal setting is not currently a supported feature.
For more information, see DBZ-8034.

Deployment

To deploy a Debezium Oracle connector, you install the Debezium Oracle connector archive, configure the connector, and start the connector by adding its configuration to Kafka Connect.

Prerequisites

Apache ZooKeeper, Apache Kafka, and Kafka Connect are installed.
Oracle Database is installed, and is configured to work with the Debezium connector.

Procedure

Download the Debezium Oracle connector plug-in archive.
Extract the files into your Kafka Connect environment.

Download the JDBC driver for Oracle from Maven Central and extract the downloaded driver file to the directory that contains the Debezium Oracle connector JAR file.

If you use the Debezium Oracle connector with Oracle XStream, obtain the JDBC driver as part of the Oracle Instant Client package. For more information, see Obtaining the Oracle JDBC driver and XStream API files.

Download the XDB library for Oracle from Maven Central and extract the downloaded file to the directory that contains the Debezium Oracle connector JAR file.
Add the directory with the JAR files to Kafka Connect’s plugin.path.
Restart your Kafka Connect process to pick up the new JAR files.

Next steps

Configure the connector and add the configuration to your Kafka Connect cluster.

Debezium Oracle connector configuration

Typically, you register a Debezium Oracle connector by submitting a JSON request that specifies the configuration properties for the connector. The following example shows a JSON request for registering an instance of the Debezium Oracle connector with logical name server1 at port 1521:

You can choose to produce events for a subset of the schemas and tables in a database. Optionally, you can ignore, mask, or truncate columns that contain sensitive data, that are larger than a specified size, or that you do not need.

Example: Debezium Oracle connector configuration

{
    "name": "inventory-connector",  (1)
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",  (2)
        "database.hostname" : "<ORACLE_IP_ADDRESS>",  (3)
        "database.port" : "1521",  (4)
        "database.user" : "c##dbzuser",  (5)
        "database.password" : "dbz",   (6)
        "database.dbname" : "ORCLCDB",  (7)
        "topic.prefix" : "server1",  (8)
        "tasks.max" : "1",  (9)
        "database.pdb.name" : "ORCLPDB1",  (10)
        "schema.history.internal.kafka.bootstrap.servers" : "kafka:9092", (11)
        "schema.history.internal.kafka.topic": "schema-changes.inventory"  (12)
    }
}

1	The name that is assigned to the connector when you register it with a Kafka Connect service.
2	The name of this Oracle connector class.
3	The address of the Oracle instance.
4	The port number of the Oracle instance.
5	The name of the Oracle user, as specified in Creating users for the connector.
6	The password for the Oracle user, as specified in Creating users for the connector.
7	The name of the database to capture changes from.
8	Topic prefix that identifies and provides a namespace for the Oracle database server from which the connector captures changes.
9	The maximum number of tasks to create for this connector.
10	The name of the Oracle pluggable database that the connector captures changes from. Used in container database (CDB) installations only.
11	The list of Kafka brokers that this connector uses to write and recover DDL statements to the database schema history topic.
12	The name of the database schema history topic where the connector writes and recovers DDL statements. This topic is for internal use only and should not be used by consumers.

In the previous example, the database.hostname and database.port properties are used to define the connection to the database host. However, in more complex Oracle deployments, or in deployments that use Transparent Network Substrate (TNS) names, you can use an alternative method in which you specify a JDBC URL.

The following JSON example shows the same configuration as in the preceding example, except that it uses a JDBC URL to connect to the database.

Example: Debezium Oracle connector configuration that uses a JDBC URL to connect to the database

{
    "name": "inventory-connector",
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",
        "tasks.max" : "1",
        "topic.prefix" : "server1",
        "database.user" : "c##dbzuser",
        "database.password" : "dbz",
        "database.url": "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=OFF)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=<oracle ip 1>)(PORT=1521))(ADDRESS=(PROTOCOL=TCP)(HOST=<oracle ip 2>)(PORT=1521)))(CONNECT_DATA=SERVICE_NAME=)(SERVER=DEDICATED)))",
        "database.dbname" : "ORCLCDB",
        "database.pdb.name" : "ORCLPDB1",
        "schema.history.internal.kafka.bootstrap.servers" : "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory"
    }
}

For the complete list of the configuration properties that you can set for the Debezium Oracle connector, see Oracle connector properties.

You can send this configuration with a POST command to a running Kafka Connect service. The service records the configuration and starts a connector task that performs the following operations:

Connects to the Oracle database.
Reads the redo log.
Records change events to Kafka topics.

Adding connector configuration

To start running a Debezium Oracle connector, create a connector configuration, and add the configuration to your Kafka Connect cluster.

Prerequisites

Oracle is configured for use with Debezium.
The Debezium Oracle connector is installed.

Procedure

Create a configuration for the Oracle connector.
Use the Kafka Connect REST API to add that connector configuration to your Kafka Connect cluster.

Results

After the connector starts, it performs a consistent snapshot of the Oracle databases that the connector is configured for. The connector then starts generating data change events for row-level operations and streaming the change event records to Kafka topics.

Pluggable vs Non-Pluggable databases

Oracle Database supports the following deployment types:

Container database (CDB): A database that can contain multiple pluggable databases (PDBs). Database clients connect to each PDB as if it were a standard, non-CDB database.
Non-container database (non-CDB): A standard Oracle database, which does not support the creation of pluggable databases.

Example: Debezium connector configuration for CDB deployments

{
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",
        "tasks.max" : "1",
        "topic.prefix" : "server1",
        "database.hostname" : "<oracle ip>",
        "database.port" : "1521",
        "database.user" : "c##dbzuser",
        "database.password" : "dbz",
        "database.dbname" : "ORCLCDB",
        "database.pdb.name" : "ORCLPDB1",
        "schema.history.internal.kafka.bootstrap.servers" : "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory"
    }
}

When you configure a Debezium Oracle connector for use with an Oracle CDB, you must specify a value for the property database.pdb.name, which names the PDB that you want the connector to capture changes from. For non-CDB installation, do not specify the database.pdb.name property.

Example: Debezium Oracle connector configuration for non-CDB deployments

{
    "config": {
        "connector.class" : "io.debezium.connector.oracle.OracleConnector",
        "tasks.max" : "1",
        "topic.prefix" : "server1",
        "database.hostname" : "<oracle ip>",
        "database.port" : "1521",
        "database.user" : "c##dbzuser",
        "database.password" : "dbz",
        "database.dbname" : "ORCLCDB",
        "schema.history.internal.kafka.bootstrap.servers" : "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.inventory"
    }
}

Using mTLS secure connections

When connecting to Oracle using mutual TLS authentication (mTLS), this involves both the connector and the database service proving their identities. The Debezium for Oracle connector relies on the Oracle JDBC driver’s built-in functionality to support mTLS authentication.

You can use either of the following methods to establish mTLS connections between Debezium and Oracle:

Java key/trust stores

Prerequisites

Verify connector can access the configured key and trust store files.
Verify that the database TNS (Transparent Network Substrate) listener supports TCPS secure connections.

Procedure

Configure the Debezium Oracle connector, as illustrated in the following example.

Example: Debezium Oracle connector TLS configuration

{
  "database.url": "jdbc:oracle:thin@(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=<host>)(PORT=<port>))(CONNECT_DATA(SERVICE_NAME=<service>)))",
  "driver.javax.net.ssl.keyStore": "<path-to-jks-keystore>",
  "driver.javax.net.ssl.keyStorePassword": "<keystore-password>",
  "driver.javax.net.ssl.keyStoreType": "JKS",
  "driver.javax.net.ssl.trustStore": "<path-to-jks-truststore>",
  "driver.javax.net.ssl.trustStorePassword": "<truststore-password>",
  "driver.javax.net.ssl.trustStoreType": "JKS"
}

For Debezium to use TLS encryption to communicate with Oracle, requires a TCPS connection with the server. To establish a TCPS connection, you must configure the connector to use the database.url property, and not the database.host property. Unlike the database.host property, the database.url property permits you to define an Oracle TNS (Transparent Network Substrate) connection string that explicitly requires use of the TCPS protocol.

Oracle Wallet

Prerequisites

Verify Oracle Wallet is configured on the Oracle database server with your database administrators
Locate the oraclepki.jar inside the Oracle JDBC driver archive

Procedure

Install oraclepki.jar in the same location where $Debezium Debezium Oracle connector jars exist. If you have downloaded and installed the Oracle JDBC driver, the same location is where you would also place oraclepki.jar.
Use the database.url configuration property rather than database.hostname to configure the connector. By using database.url, a TNS-based configuration is provided to interact with Oracle Wallet. A sample configuration is shown below.
Set the Oracle JDBC driver property oracle.net.wallet_location to explicitly set the Oracle Wallet configuration to be used by the Oracle JDBC driver.

Example mTLS configuration

{
  "database.url": "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCPS)(PORT=xxxx)(HOST=xx.xx.xx.xx))(CONNECT_DATA=(SID=xxxx)))",
  "driver.oracle.net.wallet_location": "(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=/opt/kafka/external-configuration/oracle_wallet/)))"
}

Be sure that you set the correct host, port, and service identifier (sid) in the database.url. Additionally, be sure that the directory in the driver.oracle.net.wallet_location is readable.

Connector properties

The Debezium Oracle connector has numerous configuration properties that you can use to achieve the right connector behavior for your application. Many properties have default values. Information about the properties is organized as follows:

Required Debezium Oracle connector configuration properties
Database schema history connector configuration properties that control how Debezium processes events that it reads from the database schema history topic.
Pass-through Db2 connector configuration properties

Required Debezium Oracle connector configuration properties

The following configuration properties are required unless a default value is available.

Property

Default

Description

name

No default

Unique name for the connector. Attempting to register again with the same name will fail. (This property is required by all Kafka Connect connectors.)

connector.class

No default

The name of the Java class for the connector. Always use a value of io.debezium.connector.oracle.OracleConnector for the Oracle connector.

converters

No default

Enumerates a comma-separated list of the symbolic names of the custom converter instances that the connector can use.
For example, boolean.
This property is required to enable the connector to use a custom converter.

For each converter that you configure for a connector, you must also add a .type property, which specifies the fully-qualified name of the class that implements the converter interface. The .type property uses the following format:

<converterSymbolicName>.type

For example,

boolean.type: io.debezium.connector.oracle.converters.NumberOneToBooleanConverter

If you want to further control the behavior of a configured converter, you can add one or more configuration parameters to pass values to the converter. To associate any additional configuration parameters with a converter, prefix the parameter names with the symbolic name of the converter.

For example, to define a selector parameter that specifies the subset of columns that the boolean converter processes, add the following property:

boolean.selector: .*MYTABLE.FLAG,.*.IS_ARCHIVED

tasks.max

1

The maximum number of tasks to create for this connector. The Oracle connector always uses a single task and therefore does not use this value, so the default is always acceptable.

database.hostname

No default

IP address or hostname of the Oracle database server.

database.port

No default

Integer port number of the Oracle database server.

database.user

No default

Name of the Oracle user account that the connector uses to connect to the Oracle database server.

database.password

No default

Password to use when connecting to the Oracle database server.

database.dbname

No default

Name of the database to connect to. In a container database environment, specify the name of the root container database (CDB), not the name of an included pluggable database (PDB).

database.url

No default

Specifies the raw database JDBC URL. Use this property to provide flexibility in defining that database connection. Valid values include raw TNS names and RAC connection strings.

database.pdb.name

No default

Name of the Oracle pluggable database to connect to. Use this property with container database (CDB) installations only.

topic.prefix

No default

Topic prefix that provides a namespace for the Oracle database server from which the connector captures changes. The value that you set is used as a prefix for all Kafka topic names that the connector emits. Specify a topic prefix that is unique among all connectors in your Debezium environment. The following characters are valid: alphanumeric characters, hyphens, dots, and underscores.

Do not change the value of this property. If you change the name value, after a restart, instead of continuing to emit events to the original topics, the connector emits subsequent events to topics whose names are based on the new value. The connector is also unable to recover its database schema history topic.

database.connection.adapter

logminer

The adapter implementation that the connector uses when it streams database changes. You can set the following values:

logminer (default): The connector uses the native Oracle LogMiner API.
olr: The connector uses OpenLogReplicator.
xstream: The connector uses the Oracle XStream API.

snapshot.mode

initial

Specifies the mode that the connector uses to take snapshots of a captured table. You can set the following values:

always: The snapshot includes the structure and data of the captured tables. Specify this value to populate topics with a complete representation of the data from the captured tables on each connector start.
initial: The snapshot includes the structure and data of the captured tables. Specify this value to populate topics with a complete representation of the data from the captured tables. If the snapshot completes successfully, upon next connector start snapshot is not executed again.
initial_only: The snapshot includes the structure and data of the captured tables. The connector performs an initial snapshot and then stops, without processing any subsequent changes.
schema_only: Deprecated, see no_data
no_data: The snapshot includes only the structure of captured tables. Specify this value if you want the connector to capture data only for changes that occur after the snapshot.
schema_only_recovery: Deprecated, see recovery.
recovery: This is a recovery setting for a connector that has already been capturing changes. When you restart the connector, this setting enables recovery of a corrupted or lost database schema history topic. You might set it periodically to "clean up" a database schema history topic that has been growing unexpectedly. Database schema history topics require infinite retention. Note this mode is only safe to be used when it is guaranteed that no schema changes happened since the point in time the connector was shut down before and the point in time the snapshot is taken.

After the snapshot is complete, the connector continues to read change events from the database’s redo logs except when snapshot.mode is configured as initial_only.

when_needed

After the connector starts, it performs a snapshot only if it detects one of the following circumstances:

It cannot detect any topic offsets.
A previously recorded offset specifies a log position that is not available on the server.

configuration_based

With this option, you control snapshot behavior through a set of connector properties that have the prefix 'snapshot.mode.configuration.based'.

custom

The connector performs a snapshot according to the implementation specified by the snapshot.mode.custom.name property, which defines a custom implementation of the io.debezium.spi.snapshot.Snapshotter interface.

For more information, see the table of snapshot.mode options.

snapshot.mode.configuration.based.snapshot.data

false

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes table data when it performs a snapshot.

snapshot.mode.configuration.based.snapshot.schema

false

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes the table schema when it performs a snapshot.

snapshot.mode.configuration.based.start.stream

false

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector begins to stream change events after a snapshot completes.

snapshot.mode.configuration.based.snapshot.on.schema.error

false

If the snapshot.mode is set to configuration_based, set this property to specify whether the connector includes table schema in a snapshot if the schema history topic is not available.

snapshot.mode.configuration.based.snapshot.on.data.error

false

If the snapshot.mode is set to configuration_based, this property specifies whether the connector attempts to snapshot table data if it does not find the last committed offset in the transaction log.
Set the value to true to instruct the connector to perform a new snapshot.

snapshot.mode.custom.name

No default

If snapshot.mode is set to custom, use this setting to specify the name of the custom implementation that is provided in the name() method that is defined in the 'io.debezium.spi.snapshot.Snapshotter' interface. After a connector restart, Debezium calls the specified custom implementation to determine whether to perform a snapshot. For more information, see custom snapshotter SPI.

snapshot.locking.mode

shared

Controls whether and for how long the connector holds a table lock. Table locks prevent certain types of changes table operations from occurring while the connector performs a snapshot. You can set the following values:

shared: Enables concurrent access to the table, but prevents any session from acquiring an exclusive table lock. The connector acquires a ROW SHARE level lock while it captures table schema.
none: Prevents the connector from acquiring any table locks during the snapshot. Use this setting only if no schema changes might occur during the creation of the snapshot.
custom: The connector performs a snapshot according to the implementation specified by the snapshot.locking.mode.custom.name property, which is a custom implementation of the io.debezium.spi.snapshot.SnapshotLock interface.

snapshot.locking.mode.custom.name

No default

When snapshot.locking.mode is set as custom, use this setting to specify the name of the custom implementation provided in the name() method that is defined by the 'io.debezium.spi.snapshot.SnapshotLock' interface. For more information, see custom snapshotter SPI.

snapshot.query.mode

select_all

Specifies how the connector queries data while performing a snapshot.
Set one of the following options:

select_all: The connector performs a select all query by default, optionally adjusting the columns selected based on the column include and exclude list configurations.
custom: The connector performs a snapshot query according to the implementation specified by the snapshot.query.mode.custom.name property, which defines a custom implementation of the io.debezium.spi.snapshot.SnapshotQuery interface.

This setting enables you to manage snapshot content in a more flexible manner compared to using the snapshot.select.statement.overrides property.

snapshot.query.mode.custom.name

No default

When snapshot.query.mode is set to custom, use this setting to specify the name of the custom implementation provided in the name() method that is defined by the 'io.debezium.spi.snapshot.SnapshotQuery' interface. For more information, see custom snapshotter SPI.

snapshot.include.collection.list

All tables specified in the connector’s table.include.list property.

An optional, comma-separated list of regular expressions that match the fully-qualified names (<databaseName>.<schemaName>.<tableName>) of the tables to include in a snapshot.

In a multitenant container database (CDB) environment, the regular expression must include the pluggable database (PDB) name, using the format <pdbName>.<schemaName>.<tableName>.

To match the name of a table, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table; it does not match substrings that might be present in a table name.
In environments that use the LogMiner implementation, you must use POSIX regular expressions only.

A snapshot can only include tables that are named in the connector’s table.include.list property.

This property takes effect only if the connector’s snapshot.mode property is set to a value other than never.
This property does not affect the behavior of incremental snapshots.

snapshot.select.statement.overrides

No default

Specifies the table rows to include in a snapshot. Use the property if you want a snapshot to include only a subset of the rows in a table. This property affects snapshots only. It does not apply to events that the connector reads from the log.

The property contains a comma-separated list of fully-qualified table names in the form <schemaName>.<tableName>. For example,

"snapshot.select.statement.overrides": "inventory.products,customers.orders"

For each table in the list, add a further configuration property that specifies the SELECT statement for the connector to run on the table when it takes a snapshot. The specified SELECT statement determines the subset of table rows to include in the snapshot. Use the following format to specify the name of this SELECT statement property:

snapshot.select.statement.overrides.<schemaName>.<tableName>

For example, snapshot.select.statement.overrides.customers.orders

Example:

From a customers.orders table that includes the soft-delete column, delete_flag, add the following properties if you want a snapshot to include only those records that are not soft-deleted:

"snapshot.select.statement.overrides": "customer.orders",
"snapshot.select.statement.overrides.customer.orders": "SELECT * FROM customers.orders WHERE delete_flag = 0 ORDER BY id DESC"

In the resulting snapshot, the connector includes only the records for which delete_flag = 0.

schema.include.list

No default

An optional, comma-separated list of regular expressions that match names of schemas for which you want to capture changes. In environments that use the LogMiner implementation, you must use POSIX regular expressions only. Any schema name not included in schema.include.list is excluded from having its changes captured. By default, all non-system schemas have their changes captured.

To match the name of a schema, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the schema; it does not match substrings that might be present in a schema name.
If you include this property in the configuration, do not also set the schema.exclude.list property.

include.schema.comments

false

Boolean value that specifies whether the connector should parse and publish table and column comments on metadata objects. Enabling this option will bring the implications on memory usage. The number and size of logical schema objects is what largely impacts how much memory is consumed by the Debezium connectors, and adding potentially large string data to each of them can potentially be quite expensive.

schema.exclude.list

No default

An optional, comma-separated list of regular expressions that match names of schemas for which you do not want to capture changes. In environments that use the LogMiner implementation, you must use POSIX regular expressions only.
Any schema whose name is not included in schema.exclude.list has its changes captured, with the exception of system schemas.

table.include.list

No default

An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be captured. If you use the LogMiner implementation, use only POSIX regular expressions with this property. When this property is set, the connector captures changes only from the specified tables. Each table identifier uses the following format:

<schema_name>.<table_name>

By default, the connector monitors every non-system table in each captured database.

To match the name of a table, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table; it does not match substrings that might be present in a table name.
If you include this property in the configuration, do not also set the table.exclude.list property.

table.exclude.list

No default

An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring. If you use the LogMiner implementation, use only POSIX regular expressions with this property. The connector captures change events from any table that is not specified in the exclude list. Specify the identifier for each table using the following format:

<schemaName>.<tableName>.

To match the name of a table, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the table; it does not match substrings that might be present in a table name.
If you include this property in the configuration, do not also set the table.include.list property.

column.include.list

No default

An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that want to include in the change event message values. In environments that use the LogMiner implementation, you must use POSIX regular expressions only. Fully-qualified names for columns use the following format:

<Schema_name>.<table_name>.<column_name>

The primary key column is always included in an event’s key, even if you do not use this property to explicitly include its value.

To match the name of a column, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column it does not match substrings that might be present in a column name.
If you include this property in the configuration, do not also set the column.exclude.list property.

column.exclude.list

No default

An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that you want to exclude from change event message values. In environments that use the LogMiner implementation, you must use POSIX regular expressions only. Fully-qualified column names use the following format:

<schema_name>.<table_name>.<column_name>

The primary key column is always included in an event’s key, even if you use this property to explicitly exclude its value.

skip.messages.without.change

false

Specifies whether to skip publishing messages when there is no change in included columns. This would essentially filter messages if there is no change in columns included as per column.include.list or column.exclude.list properties.

column.mask.hash.hashAlgorithm.with.salt.salt; column.mask.hash.v2.hashAlgorithm.with.salt.salt

n/a

An optional, comma-separated list of regular expressions that match the fully-qualified names of character-based columns. Fully-qualified names for columns are of the form <schemaName>.<tableName>.<columnName>.
To match the name of a column Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column; the expression does not match substrings that might be present in a column name.
In the resulting change event record, the values for the specified columns are replaced with pseudonyms.

A pseudonym consists of the hashed value that results from applying the specified hashAlgorithm and salt. Based on the hash function that is used, referential integrity is maintained, while column values are replaced with pseudonyms. Supported hash functions are described in the MessageDigest section of the Java Cryptography Architecture Standard Algorithm Name Documentation.

In the following example, CzQMA0cB5K is a randomly selected salt.

column.mask.hash.SHA-256.with.salt.CzQMA0cB5K = inventory.orders.customerName, inventory.shipment.customerName

If necessary, the pseudonym is automatically shortened to the length of the column. The connector configuration can include multiple properties that specify different hash algorithms and salts.

Depending on the hashAlgorithm used, the salt selected, and the actual data set, the resulting data set might not be completely masked.

Hashing strategy version 2 should be used to ensure fidelity if the value is being hashed in different places or systems.

binary.handling.mode

bytes

Specifies how binary (blob) columns should be represented in change events, including: bytes represents binary data as byte array (default), base64 represents binary data as base64-encoded String, base64-url-safe represents binary data as base64-url-safe-encoded String, hex represents binary data as hex-encoded (base16) String

schema.name.adjustment.mode

none

Specifies how schema names should be adjusted for compatibility with the message converter used by the connector. Possible settings:

none does not apply any adjustment.
avro replaces the characters that cannot be used in the Avro type name with underscore.
avro_unicode replaces the underscore or characters that cannot be used in the Avro type name with corresponding unicode like _uxxxx. Note: _ is an escape sequence like backslash in Java

field.name.adjustment.mode

none

Specifies how field names should be adjusted for compatibility with the message converter used by the connector. Possible settings:

none does not apply any adjustment.
avro replaces the characters that cannot be used in the Avro type name with underscore.
avro_unicode replaces the underscore or characters that cannot be used in the Avro type name with corresponding unicode like _uxxxx. Note: _ is an escape sequence like backslash in Java

See Avro naming for more details.

decimal.handling.mode

precise

Specifies how the connector should handle floating point values for NUMBER, DECIMAL and NUMERIC columns. You can set one of the following options:

precise (default): Represents values precisely by using java.math.BigDecimal values represented in change events in a binary form.
double: Represents values by using double values. Using double values is easier, but can result in a loss of precision.
string: Encodes values as formatted strings. Using the string option is easier to consume, but results in a loss of semantic information about the real type. For more information, see Numeric types.

interval.handling.mode

numeric

Specifies how the connector should handle values for interval columns:

numeric represents intervals using approximate number of microseconds.

string represents intervals exactly by using the string pattern representation P<years>Y<months>M<days>DT<hours>H<minutes>M<seconds>S. For example: P1Y2M3DT4H5M6.78S.

event.processing.failure.handling.mode

fail

Specifies how the connector should react to exceptions during processing of events. You can set one of the following options:

fail: Propagates the exception (indicating the offset of the problematic event), causing the connector to stop.
warn: Causes the problematic event to be skipped. The offset of the problematic event is then logged.
skip: Causes the problematic event to be skipped.

max.batch.size

2048

A positive integer value that specifies the maximum size of each batch of events to process during each iteration of this connector.

max.queue.size

8192

Positive integer value that specifies the maximum number of records that the blocking queue can hold. When Debezium reads events streamed from the database, it places the events in the blocking queue before it writes them to Kafka. The blocking queue can provide backpressure for reading change events from the database in cases where the connector ingests messages faster than it can write them to Kafka, or when Kafka becomes unavailable. Events that are held in the queue are disregarded when the connector periodically records offsets. Always set the value of max.queue.size to be larger than the value of max.batch.size.

max.queue.size.in.bytes

0 (disabled)

A long integer value that specifies the maximum volume of the blocking queue in bytes. By default, volume limits are not specified for the blocking queue. To specify the number of bytes that the queue can consume, set this property to a positive long value.
If max.queue.size is also set, writing to the queue is blocked when the size of the queue reaches the limit specified by either property. For example, if you set max.queue.size=1000, and max.queue.size.in.bytes=5000, writing to the queue is blocked after the queue contains 1000 records, or after the volume of the records in the queue reaches 5000 bytes.

poll.interval.ms

500 (0.5 second)

Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear.

include.schema.changes

true

Boolean value that specifies whether the connector publishes changes in the database schema to a Kafka topic with the same name as the topic prefix. The connector records each schema change with a key that contains the database name, and a value that is a JSON structure that describes the schema update. This mechanism for recording schema changes is independent of the connector’s internal recording of changes to the database schema history.

tombstones.on.delete

true

Controls whether a delete event is followed by a tombstone event. The following values are possible:

true: For each delete operation, the connector emits a delete event and a subsequent tombstone event.
false: For each delete operation, the connector emits only a delete event.

After a source record is deleted, a tombstone event (the default behavior) enables Kafka to completely delete all events that share the key of the deleted row in topics that have log compaction enabled.

message.key.columns

No default

A list of expressions that specify the columns that the connector uses to form custom message keys for change event records that it publishes to the Kafka topics for specified tables.

By default, Debezium uses the primary key column of a table as the message key for records that it emits. In place of the default, or to specify a key for tables that lack a primary key, you can configure custom message keys based on one or more columns.
To establish a custom message key for a table, list the table, followed by the columns to use as the message key. Each list entry takes the following format:

<fullyQualifiedTableName>:<keyColumn>,<keyColumn>

To base a table key on multiple column names, insert commas between the column names.
Each fully-qualified table name is a regular expression in the following format:

<schemaName>.<tableName>

The property can include entries for multiple tables. Use a semicolon to separate table entries in the list.
The following example sets the message key for the tables inventory.customers and purchase.orders:

inventory.customers:pk1,pk2;(.*).purchaseorders:pk3,pk4

For the table inventory.customer, the columns pk1 and pk2 are specified as the message key. For the purchaseorders tables in any schema, the columns pk3 and pk4 server as the message key.
There is no limit to the number of columns that you use to create custom message keys. However, it’s best to use the minimum number that are required to specify a unique key.

column.truncate.to.length.chars

No default

An optional, comma-separated list of regular expressions that match the fully-qualified names of character-based columns. Set this property if you want the connector to mask the values for a set of columns, for example, if they contain sensitive data. Set length to a positive integer to replace data in the specified columns with the number of asterisk (*) characters specified by the length in the property name. Set length to 0 (zero) to replace data in the specified columns with an empty string.

The fully-qualified name of a column observes the following format: <schemaName>.<tableName>.<columnName>. To match the name of a column, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column; the expression does not match substrings that might be present in a column name.

You can specify multiple properties with different lengths in a single configuration.

column.mask.with.length.chars

No default

An optional comma-separated list of regular expressions for masking column names in change event messages by replacing characters with asterisks (*).
Specify the number of characters to replace in the name of the property, for example, column.mask.with.8.chars.
Specify length as a positive integer or zero. Then add regular expressions to the list for each character-based column name where you want to apply a mask.
Use the following format to specify fully-qualified column names: <schemaName>.<tableName>.<columnName>.

The connector configuration can include multiple properties that specify different lengths.

column.propagate.source.type

No default

An optional, comma-separated list of regular expressions that match the fully-qualified names of columns for which you want the connector to emit extra parameters that represent column metadata. When this property is set, the connector adds the following fields to the schema of event records:

__debezium.source.column.type
__debezium.source.column.length
__debezium.source.column.scale

These parameters propagate a column’s original type name and length (for variable-width types), respectively.
Enabling the connector to emit this extra data can assist in properly sizing specific numeric or character-based columns in sink databases.

The fully-qualified name of a column observes one of the following formats: <tableName>.<columnName>, or <schemaName>.<tableName>.<columnName>.
To match the name of a column, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the column; the expression does not match substrings that might be present in a column name.

datatype.propagate.source.type

No default

An optional, comma-separated list of regular expressions that specify the fully-qualified names of data types that are defined for columns in a database. When this property is set, for columns with matching data types, the connector emits event records that include the following extra fields in their schema:

__debezium.source.column.type
__debezium.source.column.length
__debezium.source.column.scale

The fully-qualified name of a column observes one of the following formats: <tableName>.<typeName>, or <schemaName>.<tableName>.<typeName>.
To match the name of a data type, Debezium applies the regular expression that you specify as an anchored regular expression. That is, the specified expression is matched against the entire name string of the data type; the expression does not match substrings that might be present in a type name.

For the list of Oracle-specific data type names, see the Oracle data type mappings.

heartbeat.interval.ms

0

Specifies, in milliseconds, how frequently the connector sends messages to a heartbeat topic.
Use this property to determine whether the connector continues to receive change events from the source database.
It can also be useful to set the property in situations where no change events occur in captured tables for an extended period.
In such a case, although the connector continues to read the redo log, it emits no change event messages, so that the offset in the Kafka topic remains unchanged. Because the connector does not flush the latest system change number (SCN) that it read from the database, the database might retain the redo log files for longer than necessary. If the connector restarts, the extended retention period could result in the connector redundantly sending some change events.
The default value of 0 prevents the connector from sending any heartbeat messages.

heartbeat.action.query

No default

Specifies a query that the connector executes on the source database when the connector sends a heartbeat message.

For example:

INSERT INTO test_heartbeat_table (text) VALUES ('test_heartbeat')

The connector runs the query after it emits a heartbeat message.

Set this property and create a heartbeat table to receive the heartbeat messages to resolve situations in which Debezium fails to synchronize offsets on low-traffic databases that are on the same host as a high-traffic database. After the connector inserts records into the configured table, it is able to receive changes from the low-traffic database and acknowledge SCN changes in the database, so that offsets can be synchronized with the broker.

snapshot.delay.ms

No default

Specifies an interval in milliseconds that the connector waits after it starts before it takes a snapshot.
Use this property to prevent snapshot interruptions when you start multiple connectors in a cluster, which might cause re-balancing of connectors.

streaming.delay.ms

Specifies the time, in milliseconds, that the connector delays the start of the streaming process after it completes a snapshot. Setting a delay interval helps to prevent the connector from restarting snapshots in the event that a failure occurs immediately after the snapshot completes, but before the streaming process begins. Set a delay value that is higher than the value of the offset.flush.interval.ms property that is set for the Kafka Connect worker.

snapshot.fetch.size

10000

Specifies the maximum number of rows that should be read in one go from each table while taking a snapshot. The connector reads table contents in multiple batches of the specified size.

query.fetch.size

10000

Specifies the number of rows that will be fetched for each database round-trip of a given query. Using a value of 0 will use the JDBC driver’s default fetch size.

provide.transaction.metadata

false

Set the property to true if you want Debezium to generate events with transaction boundaries and enriches data events envelope with transaction metadata.

See Transaction Metadata for additional details.

log.mining.strategy

online_catalog

Specifies the mining strategy that controls how Oracle LogMiner builds and uses a given data dictionary for resolving table and column ids to names.
Set one of the following options:

redo_log_catalog: Writes the data dictionary to the online redo logs causing more archive logs to be generated over time. This also enables tracking DDL changes against captured tables, so if the schema changes frequently this is the ideal choice.
online_catalog: Uses the database’s current data dictionary to resolve object ids and does not write any extra information to the online redo logs. This allows LogMiner to mine substantially faster but at the expense that DDL changes cannot be tracked. If the captured table(s) schema changes infrequently or never, this is the ideal choice.
hybrid: Uses a combination of the database’s current data dictionary and the Debezium in-memory schema model to resolve table and column names seamlessly. This mode performs at the level of the online_catalog LogMiner strategy with the schema tracking resilience of the redo_log_catalog strategy while not incurring the overhead of archive log generation and performance costs of the redo_log_catalog strategy.

Debezium Connector for Oracle

Overview

How the Oracle connector works

Snapshots

Default workflow that the Oracle connector uses to perform an initial snapshot

Understanding why initial snapshots capture the schema history for all tables

Capturing data from tables not captured by the initial snapshot (no schema change)

Capturing data from tables not captured by the initial snapshot (schema change)

Ad hoc snapshots

Incremental snapshots

Triggering an incremental snapshot

Running an ad hoc incremental snapshots with additional-conditions

Using the Kafka signaling channel to trigger an incremental snapshot

Stopping an incremental snapshot

Using the Kafka signaling channel to stop an incremental snapshot

Custom snapshotter SPI

Blocking snapshots

Topic names

Schema history topic

Schema change topic

Transaction Metadata

Change data event enrichment

LogMiner Mining Strategies

Writing the data dictionary to redo logs

Using the online catalog directly

Hybrid approach

Query Modes

Event buffering

Heap

Infinispan

Infinispan Hotrod client integration

SCN gap detection

Low change frequency offset management

Data change events

Change event keys

Change event values

create events

update events

delete events

truncate events

Data type mappings

Character types

Binary and Character LOB types

Numeric types

Boolean types

Temporal types

ROWID types

XML types

User-defined types

Oracle-supplied types

Default Values

Custom converters

NUMBER(1) to Boolean

NUMBER To Zero Scale

RAW to String

Setting up Oracle

Compatibility with Oracle installation types

Schemas excluded from capture

Tables excluded from capture

Preparing the database

Redo log sizing

Archive log destinations

Creating users for the connector

Standby databases

Failover databases

Configuring the Debezium Oracle connector to capture events from a logical standby

Extended max string size

Deployment

Debezium Oracle connector configuration

Adding connector configuration

Pluggable vs Non-Pluggable databases

Using mTLS secure connections

Java key/trust stores

Oracle Wallet

Connector properties

Required Debezium Oracle connector configuration properties

Running an ad hoc incremental snapshots with `additional-conditions`

`NUMBER(1)` to Boolean

`NUMBER` To Zero Scale

`RAW` to String