Debezium Connector for Oracle
- Overview
- How the Oracle connector works
- Adapter Modes
- Snapshots
- Ad hoc snapshots
- Incremental snapshots
- Custom snapshotter SPI
- Blocking snapshots
- Topic names
- Schema history topic
- Schema change topic
- Transaction Metadata
- LogMiner Mining Strategies
- Using a downstream mining database
- Query Modes
- Event buffering
- SCN gap detection
- Low change frequency offset management
- Data change events
- Data type mappings
- Custom converters
- Setting up Oracle
- Deployment
- Connector properties
- Monitoring
- Surrogate schema evolution
- OpenLogReplicator support
- How OpenLogReplicator works
- Preparing the database
- Creating connector users
- Configuring the OpenLogReplicator adapter
- Building OpenLogReplicator
- Obtaining the Oracle JDBC driver for OpenLogReplicator
- OpenLogReplicator configuration
- OpenLogReplicator connector properties
- OpenLogReplicator ROWID support
- OpenLogReplicator XML support
- XStream support
- Frequently Asked Questions
Overview
Debezium’s Oracle connector captures and records row-level changes that occur in databases on an Oracle server, including tables that are added while the connector is running. You can configure the connector to emit change events for specific subsets of schemas and tables, or to ignore, mask, or truncate values in specific columns.
For information about the Oracle Database versions that are compatible with this connector, see the Debezium release overview.
Debezium can ingest change events from Oracle by using the native LogMiner database package, the XStream API, or OpenLogReplicator.
How the Oracle connector works
To optimally configure and run a Debezium Oracle connector, it is helpful to understand how the connector works.
Adapter Modes
The Debezium Oracle connector supports multiple adapters for capturing changes from the Oracle database transaction logs.
You can configure the connector to use a specific adapter by setting the database.connection.adapter
configuration property.
The connector supports the following adapters:
- LogMiner
-
By default, the Debezium Oracle connector uses the native Oracle LogMiner API to read and stream changes from database transaction logs. You can configure Debezium to use LogMiner in either of the following distinct modes:
- Uncommitted changes mode
-
In uncommitted changes mode, the Debezium Oracle connector receives a continuous stream of change events for both committed and in-flight uncommitted transactions. Because Oracle interleaves operations from multiple transactions, the connector must buffer events until a corresponding commit or rollback is detected.
Because this the default mode for the connector, it’s important to allocate enough memory to the connector to handle the largest transactions and highest concurrency levels across all captured tables.
Uncommitted changes mode minimizes the load on the source database by delegating most processing to the connector. Oracle LogMiner is responsible only for reading redo and archive logs from disk and streaming the raw changes.
Enable this mode by setting the
database.connection.adapter
property tologminer
(default). - Committed changes mode
-
In committed changes mode, the Debezium Oracle connector instructs Oracle LogMiner to stream only committed changes. Because the connector receives only committed events, it can immediately forward them to destination topics without maintaining an internal transaction buffer.
This mode requires significantly less connector memory, but shifts a much greater processing and memory burden to the source database. For large transactions, this can easily exceed the database’s available PGA memory, potentially leading to
PGA_AGGREGATE_LIMIT
errors.This mode is recommended only if you can guarantee that all transactions fit within the database’s PGA memory space. Otherwise, use the uncommitted changes mode.
Enable this mode by setting the
database.connection.adapter
property tologminer_unbuffered
.This feature is currently incubating and might change in future releases.
- OpenLogReplicator
-
You can configure the Debezium Oracle connector to use OpenLogReplicator, an open-source application that reads Oracle changes directly from the redo and archive logs with minimal impact on the database.
Enable OpenLogReplicator by setting the
database.connection.adapter
property toolr
. For more information, see the OpenLogReplicator support section. - Oracle XStream
-
The Debezium Oracle connector can also use Oracle XStream, a commercial component of Oracle GoldenGate.
Enable Oracle XStream by setting the
database.connection.adapter
property toxstream
. For more details, see the XStream section.
Snapshots
Typically, the redo logs on an Oracle server are configured to not retain the complete history of the database. As a result, the Debezium Oracle connector cannot retrieve the entire history of the database from the logs. To enable the connector to establish a baseline for the current state of the database, the first time that the connector starts, it performs an initial consistent snapshot of the database.
If the time needed to complete the initial snapshot exceeds the |
During a table’s snapshot, it’s possible for Oracle to raise an ORA-01466 exception. This happens when a user modifies the schema of the table or adds, changes, or drops an index or related object associated with the table being snapshot. In the event this happens, the connector will stop and the initial snapshot will need to be taken from the beginning. To remediate the problem, you can configure the |
Default workflow that the Oracle connector uses to perform an initial snapshot
The following workflow lists the steps that Debezium takes to create a snapshot.
These steps describe the process for a snapshot when the snapshot.mode
configuration property is set to its default value, which is initial
.
You can customize the way that the connector creates snapshots by changing the value of the snapshot.mode
property.
If you configure a different snapshot mode, the connector completes the snapshot by using a modified version of this workflow.
When the snapshot mode is set to the default, the connector completes the following tasks to create a snapshot:
-
Establish a connection to the database.
-
Determine the tables to be captured. By default, the connector captures all tables except those with schemas that exclude them from capture. After the snapshot completes, the connector continues to stream data for the specified tables. If you want the connector to capture data only from specific tables you can direct the connector to capture the data for only a subset of tables or table elements by setting properties such as
table.include.list
ortable.exclude.list
. -
Obtain a
ROW SHARE MODE
lock on each of the captured tables to prevent structural changes from occurring during creation of the snapshot. Debezium holds the locks for only a short time. -
Read the current system change number (SCN) position from the server’s redo log.
-
Capture the structure of all database tables, or all tables that are designated for capture. The connector persists schema information in its internal database schema history topic. The schema history provides information about the structure that is in effect when a change event occurs.
By default, the connector captures the schema of every table in the database that is in capture mode, including tables that are not configured for capture. If tables are not configured for capture, the initial snapshot captures only their structure; it does not capture any table data. For more information about why snapshots persist schema information for tables that you did not include in the initial snapshot, see Understanding why initial snapshots capture the schema for all tables.
-
Release the locks obtained in Step 3. Other database clients can now write to any previously locked tables.
-
At the SCN position that was read in Step 4, the connector scans the tables that are designated for capture (
SELECT * FROM … AS OF SCN 123
). During the scan, the connector completes the following tasks:-
Confirms that the table was created before the snapshot began. If the table was created after the snapshot began, the connector skips the table. After the snapshot is complete, and the connector transitions to streaming, it emits change events for any tables that were created after the snapshot began.
-
Produces a
read
event for each row that is captured from a table. Allread
events contain the same SCN position, which is the SCN position that was obtained in step 4. -
Emits each
read
event to the Kafka topic for the source table. -
Releases data table locks, if applicable.
-
-
Record the successful completion of the snapshot in the connector offsets.
The resulting initial snapshot captures the current state of each row in the captured tables. From this baseline state, the connector captures subsequent changes as they occur.
After the snapshot process begins, if the process is interrupted due to connector failure, rebalancing, or other reasons, the process restarts after the connector restarts. After the connector completes the initial snapshot, it continues streaming from the position that it read in Step 3 so that it does not miss any updates. If the connector stops again for any reason, after it restarts, it resumes streaming changes from where it previously left off.
Setting | Description |
---|---|
|
Perform snapshot on each connector start. After the snapshot completes, the connector begins to stream event records for subsequent database changes. |
|
The connector performs a database snapshot as described in the default workflow for creating an initial snapshot. After the snapshot completes, the connector begins to stream event records for subsequent database changes. |
|
The connector performs a database snapshot and stops before streaming any change event records, not allowing any subsequent change events to be captured. |
|
Deprecated, see |
|
The connector captures the structure of all relevant tables, performing all of the steps described in the default snapshot workflow, except that it does not create |
|
Deprecated, see |
|
Set this option to restore a database schema history topic that is lost or corrupted.
After a restart, the connector runs a snapshot that rebuilds the topic from the source tables.
You can also set the property to periodically prune a database schema history topic that experiences unexpected growth. |
|
After the connector starts, it performs a snapshot only if it detects one of the following circumstances:
|
|
Set the snapshot mode to |
|
The |
For more information, see snapshot.mode
in the table of connector configuration properties.
Understanding why initial snapshots capture the schema history for all tables
The initial snapshot that a connector runs captures two types of information:
- Table data
-
Information about
INSERT
,UPDATE
, andDELETE
operations in tables that are named in the connector’stable.include.list
property. - Schema data
-
DDL statements that describe the structural changes that are applied to tables. Schema data is persisted to both the internal schema history topic, and to the connector’s schema change topic, if one is configured.
After you run an initial snapshot, you might notice that the snapshot captures schema information for tables that are not designated for capture. By default, initial snapshots are designed to capture schema information for every table that is present in the database, not only from tables that are designated for capture. Connectors require that the table’s schema is present in the schema history topic before they can capture a table. By enabling the initial snapshot to capture schema data for tables that are not part of the original capture set, Debezium prepares the connector to readily capture event data from these tables should that later become necessary. If the initial snapshot does not capture a table’s schema, you must add the schema to the history topic before the connector can capture data from the table.
In some cases, you might want to limit schema capture in the initial snapshot. This can be useful when you want to reduce the time required to complete a snapshot. Or when Debezium connects to the database instance through a user account that has access to multiple logical databases, but you want the connector to capture changes only from tables in a specific logic database.
-
Capturing data from tables not captured by the initial snapshot (no schema change)
-
Capturing data from tables not captured by the initial snapshot (schema change)
-
Setting the
schema.history.internal.store.only.captured.tables.ddl
property to specify the tables from which to capture schema information. -
Setting the
schema.history.internal.store.only.captured.databases.ddl
property to specify the logical databases from which to capture schema changes.
Capturing data from tables not captured by the initial snapshot (no schema change)
In some cases, you might want the connector to capture data from a table whose schema was not captured by the initial snapshot. Depending on the connector configuration, the initial snapshot might capture the table schema only for specific tables in the database. If the table schema is not present in the history topic, the connector fails to capture the table, and reports a missing schema error.
You might still be able to capture data from the table, but you must perform additional steps to add the table schema.
-
You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
-
All entries for the table in the transaction log use the same schema. For information about capturing data from a new table that has undergone structural changes, see Capturing data from tables not captured by the initial snapshot (schema change).
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
In the connector configuration:
-
Set the
snapshot.mode
torecovery
. -
(Optional) Set the value of
schema.history.internal.store.only.captured.tables.ddl
tofalse
to ensure that in the future the connector can readily capture data for tables that are not currently designated for capture. Connectors can capture data from a table only if the table’s schema history is present in the history topic. -
Add the tables that you want the connector to capture to
table.include.list
.
-
-
Restart the connector. The snapshot recovery process rebuilds the schema history based on the current structure of the tables.
-
(Optional) After the snapshot completes, initiate an incremental snapshot on the newly added tables. The incremental snapshot first streams the historical data of the newly added tables, and then resumes reading changes from the redo and archive logs for previously configured tables, including changes that occur while that connector was off-line.
-
(Optional) Reset the
snapshot.mode
back tono_data
to prevent the connector from initiating recovery after a future restart.
Capturing data from tables not captured by the initial snapshot (schema change)
If a schema change is applied to a table, records that are committed before the schema change have different structures than those that were committed after the change. When Debezium captures data from a table, it reads the schema history to ensure that it applies the correct schema to each event. If the schema is not present in the schema history topic, the connector is unable to capture the table, and an error results.
If you want to capture data from a table that was not captured by the initial snapshot, and the schema of the table was modified, you must add the schema to the history topic, if it is not already available. You can add the schema by running a new schema snapshot, or by running an initial snapshot for the table.
-
You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
-
A schema change was applied to the table so that the records to be captured do not have a uniform structure.
- Initial snapshot captured the schema for all tables (
store.only.captured.tables.ddl
was set tofalse
) -
-
Edit the
table.include.list
property to specify the tables that you want to capture. -
Restart the connector.
-
Initiate an incremental snapshot if you want to capture existing data from the newly added tables.
-
- Initial snapshot did not capture the schema for all tables (
store.only.captured.tables.ddl
was set totrue
) -
If the initial snapshot did not save the schema of the table that you want to capture, complete one of the following procedures:
- Procedure 1: Schema snapshot, followed by incremental snapshot
-
In this procedure, the connector first performs a schema snapshot. You can then initiate an incremental snapshot to enable the connector to synchronize data.
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
Clear the offsets in the configured Kafka Connect
offset.storage.topic
. For more information about how to remove offsets, see the Debezium community FAQ.Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.
-
Set values for properties in the connector configuration as described in the following steps:
-
Set the value of the
snapshot.mode
property tono_data
. -
Edit the
table.include.list
to add the tables that you want to capture.
-
-
Restart the connector.
-
Wait for Debezium to capture the schema of the new and existing tables. Data changes that occurred any tables after the connector stopped are not captured.
-
To ensure that no data is lost, initiate an incremental snapshot.
-
- Procedure 2: Initial snapshot, followed by optional incremental snapshot
-
In this procedure the connector performs a full initial snapshot of the database. As with any initial snapshot, in a database with many large tables, running an initial snapshot can be a time-consuming operation. After the snapshot completes, you can optionally trigger an incremental snapshot to capture any changes that occur while the connector is off-line.
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
Clear the offsets in the configured Kafka Connect
offset.storage.topic
. For more information about how to remove offsets, see the Debezium community FAQ.Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.
-
Edit the
table.include.list
to add the tables that you want to capture. -
Set values for properties in the connector configuration as described in the following steps:
-
Set the value of the
snapshot.mode
property toinitial
. -
(Optional) Set
schema.history.internal.store.only.captured.tables.ddl
tofalse
.
-
-
Restart the connector. The connector takes a full database snapshot. After the snapshot completes, the connector transitions to streaming.
-
(Optional) To capture any data that changed while the connector was off-line, initiate an incremental snapshot.
-
Ad hoc snapshots
By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.
However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:
-
The connector configuration is modified to capture a different set of tables.
-
Kafka topics are deleted and must be rebuilt.
-
Data corruption occurs due to a configuration error or some other problem.
You can re-run a snapshot for a table for which you previously captured a snapshot by initiating a so-called ad-hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.
When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.
Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.
You specify the tables to capture by sending an execute-snapshot
message to the signaling table.
Set the type of the execute-snapshot
signal to incremental
or blocking
, and provide the names of the tables to include in the snapshot, as described in the following table:
Field | Default | Value |
---|---|---|
|
|
Specifies the type of snapshot that you want to run. |
|
N/A |
An array that contains regular expressions matching the fully-qualified names of the tables to include in the snapshot. |
|
N/A |
An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
|
|
N/A |
An optional string that specifies the column name that the connector uses as the primary key of a table during the snapshot process. |
You initiate an ad hoc incremental snapshot by adding an entry with the execute-snapshot
signal type to the signaling table, or by sending a signal message to a Kafka signaling topic.
After the connector processes the message, it begins the snapshot operation.
The snapshot process reads the first and last primary key values and uses those values as the start and end point for each table.
Based on the number of entries in the table, and the configured chunk size, Debezium divides the table into chunks, and proceeds to snapshot each chunk, in succession, one at a time.
For more information, see Incremental snapshots.
You initiate an ad hoc blocking snapshot by adding an entry with the execute-snapshot
signal type to the signaling table or signaling topic.
After the connector processes the message, it begins the snapshot operation.
The connector temporarily stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot.
After the snapshot completes, the connector resumes streaming.
For more information, see Blocking snapshots.
Incremental snapshots
To provide flexibility in managing snapshots, Debezium includes a supplementary snapshot mechanism, known as incremental snapshotting. Incremental snapshots rely on the Debezium mechanism for sending signals to a Debezium connector. Incremental snapshots are based on the DDD-3 design document.
In an incremental snapshot, instead of capturing the full state of a database all at once, as in an initial snapshot, Debezium captures each table in phases, in a series of configurable chunks. You can specify the tables that you want the snapshot to capture and the size of each chunk. The chunk size determines the number of rows that the snapshot collects during each fetch operation on the database. The default chunk size for incremental snapshots is 1024 rows.
As an incremental snapshot proceeds, Debezium uses watermarks to track its progress, maintaining a record of each table row that it captures. This phased approach to capturing data provides the following advantages over the standard initial snapshot process:
-
You can run incremental snapshots in parallel with streamed data capture, instead of postponing streaming until the snapshot completes. The connector continues to capture near real-time events from the change log throughout the snapshot process, and neither operation blocks the other.
-
If the progress of an incremental snapshot is interrupted, you can resume it without losing any data. After the process resumes, the snapshot begins at the point where it stopped, rather than recapturing the table from the beginning.
-
You can run an incremental snapshot on demand at any time, and repeat the process as needed to adapt to database updates. For example, you might re-run a snapshot after you modify the connector configuration to add a table to its
table.include.list
property.
When you run an incremental snapshot, Debezium sorts each table by primary key and then splits the table into chunks based on the configured chunk size.
Working chunk by chunk, it then captures each table row in a chunk.
For each row that it captures, the snapshot emits a READ
event.
That event represents the value of the row when the snapshot for the chunk began.
As a snapshot proceeds, it’s likely that other processes continue to access the database, potentially modifying table records.
To reflect such changes, INSERT
, UPDATE
, or DELETE
operations are committed to the transaction log as per usual.
Similarly, the ongoing Debezium streaming process continues to detect these change events and emits corresponding change event records to Kafka.
In some cases, the UPDATE
or DELETE
events that the streaming process emits are received out of sequence.
That is, the streaming process might emit an event that modifies a table row before the snapshot captures the chunk that contains the READ
event for that row.
When the snapshot eventually emits the corresponding READ
event for the row, its value is already superseded.
To ensure that incremental snapshot events that arrive out of sequence are processed in the correct logical order, Debezium employs a buffering scheme for resolving collisions.
Only after collisions between the snapshot events and the streamed events are resolved does Debezium emit an event record to Kafka.
To assist in resolving collisions between late-arriving READ
events and streamed events that modify the same table row, Debezium employs a so-called snapshot window.
The snapshot window demarcates the interval during which an incremental snapshot captures data for a specified table chunk.
Before the snapshot window for a chunk opens, Debezium follows its usual behavior and emits events from the transaction log directly downstream to the target Kafka topic.
But from the moment that the snapshot for a particular chunk opens, until it closes, Debezium performs a de-duplication step to resolve collisions between events that have the same primary key..
For each data collection, the Debezium emits two types of events, and stores the records for them both in a single destination Kafka topic.
The snapshot records that it captures directly from a table are emitted as READ
operations.
Meanwhile, as users continue to update records in the data collection, and the transaction log is updated to reflect each commit, Debezium emits UPDATE
or DELETE
operations for each change.
As the snapshot window opens, and Debezium begins processing a snapshot chunk, it delivers snapshot records to a memory buffer.
During the snapshot windows, the primary keys of the READ
events in the buffer are compared to the primary keys of the incoming streamed events.
If no match is found, the streamed event record is sent directly to Kafka.
If Debezium detects a match, it discards the buffered READ
event, and writes the streamed record to the destination topic, because the streamed event logically supersede the static snapshot event.
After the snapshot window for the chunk closes, the buffer contains only READ
events for which no related transaction log events exist.
Debezium emits these remaining READ
events to the table’s Kafka topic.
The connector repeats the process for each snapshot chunk.
To enable Debezium to perform incremental snapshots, you must grant the connector permission to write to the signaling table. Write permission is unnecessary only for connectors that can be configured to perform read-only incrementals snapshots (MariaDB, MySQL, or PostgreSQL). |
Currently, you can use either of the following methods to initiate an incremental snapshot:
The Debezium connector for Oracle does not support schema changes while an incremental snapshot is running. |
Triggering an incremental snapshot
To initiate an incremental snapshot, you can send an ad hoc snapshot signal to the signaling table on the source database.
You submit snapshot signals as SQL INSERT
queries.
After Debezium detects the change in the signaling table, it reads the signal, and runs the requested snapshot operation.
The query that you submit specifies the tables to include in the snapshot, and, optionally, specifies the type of snapshot operation.
Debezium currently supports the incremental
and blocking
snapshot types.
To specify the tables to include in the snapshot, provide a data-collections
array that lists the tables, or an array of regular expressions used to match tables, for example,
{"data-collections": ["public.MyFirstTable", "public.MySecondTable"]}
The data-collections
array for an incremental snapshot signal has no default value.
If the data-collections
array is empty, Debezium interprets the empty array to mean that no action is required, and it does not perform a snapshot.
If the name of a table that you want to include in a snapshot contains a dot ( |
-
-
A signaling data collection exists on the source database.
-
The signaling data collection is specified in the
signal.data.collection
property.
-
-
Send a SQL query to add the ad hoc incremental snapshot request to the signaling table:
INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');
For example,
INSERT INTO db1.myschema.debezium_signal (id, type, data) (1) values ('ad-hoc-1', (2) 'execute-snapshot', (3) '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4) "type":"incremental", (5) "additional-conditions":[{"data-collection": "db1.schema1.table1" ,"filter":"color=\'blue\'"}]}'); (6)
The values of the
id
,type
, anddata
parameters in the command correspond to the fields of the signaling table.
The following table describes the parameters in the example:Table 3. Descriptions of fields in a SQL command for sending an incremental snapshot signal to the signaling table Item Value Description 1
database.schema.debezium_signal
Specifies the fully-qualified name of the signaling table on the source database.
2
ad-hoc-1
The
id
parameter specifies an arbitrary string that is assigned as theid
identifier for the signal request.
Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string. Rather, during the snapshot, Debezium generates its ownid
string as a watermarking signal.3
execute-snapshot
The
type
parameter specifies the operation that the signal is intended to trigger.4
data-collections
A required component of the
data
field of a signal that specifies an array of table names or regular expressions to match table names to include in the snapshot.
The array lists regular expressions that use the formatdatabase.schema.table
to match the fully-qualified names of the tables. This format is the same as the one that you use to specify the name of the connector’s signaling table.5
incremental
An optional
type
component of thedata
field of a signal that specifies the type of snapshot operation to run.
Valid values areincremental
andblocking
.
If you do not specify a value, the connector defaults to performing an incremental snapshot.6
additional-conditions
An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
Each additional condition is an object withdata-collection
andfilter
properties. You can specify different filters for each data collection.
* Thedata-collection
property is the fully-qualified name of the data collection that the filter applies to. For more information about theadditional-conditions
parameter, see Running an ad hoc incremental snapshots withadditional-conditions
.
Running an ad hoc incremental snapshots with additional-conditions
If you want a snapshot to include only a subset of the content in a table, you can modify the signal request by appending an additional-conditions
parameter to the snapshot signal.
The SQL query for a typical snapshot takes the following form:
SELECT * FROM <tableName> ....
By adding an additional-conditions
parameter, you append a WHERE
condition to the SQL query, as in the following example:
SELECT * FROM <data-collection> WHERE <filter> ....
The following example shows a SQL query to send an ad hoc incremental snapshot request with an additional condition to the signaling table:
INSERT INTO <signalTable> (id, type, data) VALUES ('<id>', '<snapshotType>', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"<snapshotType>","additional-conditions":[{"data-collection": "<fullyQualfiedTableName>", "filter": "<additional-condition>"}]}');
For example, suppose you have a products
table that contains the following columns:
-
id
(primary key) -
color
-
quantity
If you want an incremental snapshot of the products
table to include only the data items where color=blue
, you can use the following SQL statement to trigger the snapshot:
INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue"}]}');
The additional-conditions
parameter also enables you to pass conditions that are based on more than one column.
For example, using the products
table from the previous example, you can submit a query that triggers an incremental snapshot that includes the data of only those items for which color=blue
and quantity>10
:
INSERT INTO db1.myschema.debezium_signal (id, type, data) VALUES('ad-hoc-1', 'execute-snapshot', '{"data-collections": ["db1.schema1.products"],"type":"incremental", "additional-conditions":[{"data-collection": "db1.schema1.products", "filter": "color=blue AND quantity>10"}]}');
The following example, shows the JSON for an incremental snapshot event that is captured by a connector.
{
"before":null,
"after": {
"pk":"1",
"value":"New data"
},
"source": {
...
"snapshot":"incremental" (1)
},
"op":"r", (2)
"ts_ms":"1620393591654",
"ts_us":"1620393591654547",
"ts_ns":"1620393591654547920",
"transaction":null
}
Item | Field name | Description |
---|---|---|
1 |
|
Specifies the type of snapshot operation to run. |
2 |
|
Specifies the event type. |
Using the Kafka signaling channel to trigger an incremental snapshot
You can send a message to the configured Kafka topic to request the connector to run an ad hoc incremental snapshot.
The key of the Kafka message must match the value of the topic.prefix
connector configuration option.
The value of the message is a JSON object with type
and data
fields.
The signal type is execute-snapshot
, and the data
field must have the following fields:
Field | Default | Value |
---|---|---|
|
|
The type of the snapshot to be executed.
Currently Debezium supports the |
|
N/A |
An array of comma-separated regular expressions that match the fully-qualified names of tables to include in the snapshot. |
|
N/A |
An optional array of additional conditions that specifies criteria that the connector evaluates to designate a subset of records to include in a snapshot. |
execute-snapshot
Kafka messageKey = `test_connector` Value = `{"type":"execute-snapshot","data": {"data-collections": ["{collection-container}.table1", "{collection-container}.table2"], "type": "INCREMENTAL"}}`
Debezium uses the additional-conditions
field to select a subset of a table’s content.
Typically, when Debezium runs a snapshot, it runs a SQL query such as:
SELECT * FROM <tableName> ….
When the snapshot request includes an additional-conditions
property, the data-collection
and filter
parameters of the property are appended to the SQL query, for example:
SELECT * FROM <data-collection> WHERE <filter> ….
For example, given a products
table with the columns id
(primary key), color
, and brand
, if you want a snapshot to include only content for which color='blue'
, when you request the snapshot, you could add the additional-conditions
property to filter the content:
:leveloffset: +1
Key = `test_connector`
Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue'"}]}}`
You can also use the additional-conditions
property to pass conditions based on multiple columns.
For example, using the same products
table as in the previous example, if you want a snapshot to include only the content from the products
table for which color='blue'
, and brand='MyBrand'
, you could send the following request:
:leveloffset: +1
Key = `test_connector`
Value = `{"type":"execute-snapshot","data": {"data-collections": ["db1.schema1.products"], "type": "INCREMENTAL", "additional-conditions": [{"data-collection": "db1.schema1.products" ,"filter":"color='blue' AND brand='MyBrand'"}]}}`
Stopping an incremental snapshot
In some situations, it might be necessary to stop an incremental snapshot. For example, you might realize that snapshot was not configured correctly, or maybe you want to ensure that resources are available for other database operations. You can stop a snapshot that is already running by sending a signal to the signaling table on the source database.
You submit a stop snapshot signal to the signaling table by sending it in a SQL INSERT
query.
The stop-snapshot signal specifies the type
of the snapshot operation as incremental
, and optionally specifies the tables that you want to omit from the currently running snapshot.
After Debezium detects the change in the signaling table, it reads the signal, and stops the incremental snapshot operation if it’s in progress.
You can also stop an incremental snapshot by sending a JSON message to the Kafka signaling topic.
-
-
A signaling data collection exists on the source database.
-
The signaling data collection is specified in the
signal.data.collection
property.
-
-
Send a SQL query to stop the ad hoc incremental snapshot to the signaling table:
INSERT INTO <signalTable> (id, type, data) values ('<id>', 'stop-snapshot', '{"data-collections": ["<fullyQualfiedTableName>","<fullyQualfiedTableName>"],"type":"incremental"}');
For example,
INSERT INTO db1.myschema.debezium_signal (id, type, data) (1) values ('ad-hoc-1', (2) 'stop-snapshot', (3) '{"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], (4) "type":"incremental"}'); (5)
The values of the
id
,type
, anddata
parameters in the signal command correspond to the fields of the signaling table.
The following table describes the parameters in the example:Table 6. Descriptions of fields in a SQL command for sending a stop incremental snapshot signal to the signaling table Item Value Description 1
database.schema.debezium_signal
Specifies the fully-qualified name of the signaling table on the source database.
2
ad-hoc-1
The
id
parameter specifies an arbitrary string that is assigned as theid
identifier for the signal request.
Use this string to identify logging messages to entries in the signaling table. Debezium does not use this string.3
stop-snapshot
Specifies
type
parameter specifies the operation that the signal is intended to trigger.4
data-collections
An optional component of the
data
field of a signal that specifies an array of table names or regular expressions to match table names to remove from the snapshot.
The array lists regular expressions which match tables by their fully-qualified names in the formatdatabase.schema.table
If you omit this component from the
data
field, the signal stops the entire incremental snapshot that is in progress.5
incremental
A required component of the
data
field of a signal that specifies the type of snapshot operation that is to be stopped.
Currently, the only valid option isincremental
.
If you do not specify atype
value, the signal fails to stop the incremental snapshot.
Using the Kafka signaling channel to stop an incremental snapshot
You can send a signal message to the configured Kafka signaling topic to stop an ad hoc incremental snapshot.
The key of the Kafka message must match the value of the topic.prefix
connector configuration option.
The value of the message is a JSON object with type
and data
fields.
The signal type is stop-snapshot
, and the data
field must have the following fields:
Field | Default | Value |
---|---|---|
|
|
The type of the snapshot to be executed.
Currently Debezium supports only the |
|
N/A |
An optional array of comma-separated regular expressions that match the fully-qualified names of the tables an array of table names or regular expressions to match table names to remove from the snapshot. |
The following example shows a typical stop-snapshot
Kafka message:
Key = `test_connector`
Value = `{"type":"stop-snapshot","data": {"data-collections": ["db1.schema1.table1", "db1.schema1.table2"], "type": "INCREMENTAL"}}`
Custom snapshotter SPI
For more advanced uses, you can fine-tune control of the snapshot by implementing one of the following interfaces:
io.debezium.snapshot.spi.Snapshotter
-
Controls whether the connector takes a snapshot.
io.debezium.snapshot.spi.SnapshotQuery
-
Controls how data is queried during a snapshot.
io.debezium.snapshot.spi.SnapshotLock
-
Controls whether the connector locks tables when taking a snapshot.
/**
* {@link Snapshotter} is used to determine the following details about the snapshot process:
* <p>
* - Whether a snapshot occurs. <br>
* - Whether streaming continues during the snapshot. <br>
* - Whether the snapshot includes schema (if supported). <br>
* - Whether to snapshot data or schema following an error.
* <p>
* Although Debezium provides many default snapshot modes,
* to provide more advanced functionality, such as partial snapshots,
* you can customize implementation of the interface.
* For more information, see the documentation.
*
*
*
*/
@Incubating
public interface Snapshotter extends Configurable {
/**
* @return the name of the snapshotter.
*
*
*/
String name();
/**
* @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
* @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
*
* @return {@code true} if the snapshotter should take a data snapshot
*/
boolean shouldSnapshotData(boolean offsetExists, boolean snapshotInProgress);
/**
* @param offsetExists is {@code true} when the connector has an offset context (i.e. restarted)
* @param snapshotInProgress is {@code true} when the connector is started, but a snapshot is already in progress
*
* @return {@code true} if the snapshotter should take a schema snapshot
*/
boolean shouldSnapshotSchema(boolean offsetExists, boolean snapshotInProgress);
/**
* @return {@code true} if the snapshotter should stream after taking a snapshot
*/
boolean shouldStream();
/**
* @return {@code true} whether the schema can be recovered if database schema history is corrupted.
*/
boolean shouldSnapshotOnSchemaError();
/**
* @return {@code true} whether the snapshot should be re-executed when there is a gap in data stream.
*/
boolean shouldSnapshotOnDataError();
/**
*
* @return {@code true} if streaming should resume from the start of the snapshot
* transaction, or {@code false} for when a connector resumes and takes a snapshot,
* streaming should resume from where streaming previously left off.
*/
default boolean shouldStreamEventsStartingFromSnapshot() {
return true;
}
/**
* Lifecycle hook called after the snapshot phase is successful.
*/
default void snapshotCompleted() {
// no operation
}
/**
* Lifecycle hook called after the snapshot phase is aborted.
*/
default void snapshotAborted() {
// no operation
}
}
/**
* {@link SnapshotQuery} is used to determine the query used during a data snapshot
*
*
*/
public interface SnapshotQuery extends Configurable, Service {
/**
* @return the name of the snapshot lock.
*
*
*/
String name();
/**
* Generate a valid query string for the specified table, or an empty {@link Optional}
* to skip snapshotting this table (but that table will still be streamed from)
*
* @param tableId the table to generate a query for
* @param snapshotSelectColumns the columns to be used in the snapshot select based on the column
* include/exclude filters
* @return a valid query string, or none to skip snapshotting this table
*/
Optional<String> snapshotQuery(String tableId, List<String> snapshotSelectColumns);
}
/**
* {@link SnapshotLock} is used to determine the table lock mode used during schema snapshot
*
*
*/
public interface SnapshotLock extends Configurable, Service {
/**
* @return the name of the snapshot lock.
*
*
*/
String name();
/**
* Returns a SQL statement for locking the given table during snapshotting, if required by the specific snapshotter
* implementation.
*/
Optional<String> tableLockingStatement(Duration lockTimeout, String tableId);
}
Blocking snapshots
To provide more flexibility in managing snapshots, Debezium includes a supplementary ad hoc snapshot mechanism, known as a blocking snapshot. Blocking snapshots rely on the Debezium mechanism for sending signals to a Debezium connector.
A blocking snapshot behaves just like an initial snapshot, except that you can trigger it at run time.
You might want to run a blocking snapshot rather than use the standard initial snapshot process in the following situations:
-
You add a new table and you want to complete the snapshot while the connector is running.
-
You add a large table, and you want the snapshot to complete in less time than is possible with an incremental snapshot.
When you run a blocking snapshot, Debezium stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot. After the snapshot completes, the streaming is resumed.
You can set the following properties in the data
component of a signal:
-
data-collections: to specify which tables must be snapshot.
-
data-collections: Specifies the tables that you want the snapshot to include.
This property accepts a comma-separated list of regular expressions that match fully-qualified table names. The behavior of the property is similar to the behavior of thetable.include.list
property, which specifies the tables to capture in a blocking snapshot. -
additional-conditions: You can specify different filters for different table.
-
The
data-collection
property is the fully-qualified name of the table for which the filter will be applied, and can be case-sensitive or case-insensitive depending on the database. -
The
filter
property will have the same value used in thesnapshot.select.statement.overrides
, the fully-qualified name of the table that should match by case.
-
For example:
{"type": "blocking", "data-collections": ["schema1.table1", "schema1.table2"], "additional-conditions": [{"data-collection": "schema1.table1", "filter": "SELECT * FROM [schema1].[table1] WHERE column1 = 0 ORDER BY column2 DESC"}, {"data-collection": "schema1.table2", "filter": "SELECT * FROM [schema1].[table2] WHERE column2 > 0"}]}
A delay might exist between the time that you send the signal to trigger the snapshot, and the time when streaming stops and the snapshot starts. As a result of this delay, after the snapshot completes, the connector might emit some event records that duplicate records captured by the snapshot.
Topic names
By default, the Oracle connector writes change events for all INSERT
, UPDATE
, and DELETE
operations that occur in a table to a single Apache Kafka topic that is specific to that table.
The connector uses the following convention to name change event topics:
topicPrefix.schemaName.tableName
The following list provides definitions for the components of the default name:
- topicPrefix
-
The topic prefix as specified by the
topic.prefix
connector configuration property. - schemaName
-
The name of the schema in which the operation occurred.
- tableName
-
The name of the table in which the operation occurred.
For example, if fulfillment
is the server name, inventory
is the schema name, and the database contains tables with the names orders
, customers
, and products
,
the Debezium Oracle connector emits events to the following Kafka topics, one for each table in the database:
fulfillment.inventory.orders fulfillment.inventory.customers fulfillment.inventory.products
The connector applies similar naming conventions to label its internal database schema history topics, schema change topics, and transaction metadata topics.
If the default topic name do not meet your requirements, you can configure custom topic names. To configure custom topic names, you specify regular expressions in the logical topic routing SMT. For more information about using the logical topic routing SMT to customize topic naming, see Topic routing.
Schema history topic
When a database client queries a database, the client uses the database’s current schema. However, the database schema can be changed at any time, which means that the connector must be able to identify what the schema was at the time each insert, update, or delete operation was recorded. Also, a connector cannot necessarily apply the current schema to every event. If an event is relatively old, it’s possible that it was recorded before the current schema was applied.
To ensure correct processing of events that occur after a schema change, Oracle includes in the redo log not only the row-level changes that affect the data, but also the DDL statements that are applied to the database. As the connector encounters these DDL statements in the redo log, it parses them and updates an in-memory representation of each table’s schema. The connector uses this schema representation to identify the structure of the tables at the time of each insert, update, or delete operation and to produce the appropriate change event. In a separate database schema history Kafka topic, the connector records all DDL statements along with the position in the redo log where each DDL statement appeared.
When the connector restarts after either a crash or a graceful stop, it starts reading the redo log from a specific position, that is, from a specific point in time. The connector rebuilds the table structures that existed at this point in time by reading the database schema history Kafka topic and parsing all DDL statements up to the point in the redo log where the connector is starting.
This database schema history topic is internal for internal connector use only. Optionally, the connector can also emit schema change events to a different topic that is intended for consumer applications.
-
Default names for topics that receive Debezium event records.
Schema change topic
You can configure a Debezium Oracle connector to produce schema change events that describe structural changes that are applied to tables in the database.
The connector writes schema change events to a Kafka topic named <serverName>
, where serverName
is the namespace that is specified in the topic.prefix
configuration property.
Debezium emits a new message to the schema change topic whenever it streams data from a new table, or when the structure of the table is altered.
Following a change in table structure, you must follow (the schema evolution procedure). |
Messages that the connector sends to the schema change topic contain a payload, and, optionally, also contain the schema of the change event message.
The schema for the schema change event has the following elements:
name
-
The name of the schema change event message.
type
-
The type of the change event message.
version
-
The version of the schema. The version is an integer that is incremented each time the schema is changed.
fields
-
The fields that are included in the change event message.
The following example shows a typical schema in JSON format.
{
"schema": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "databaseName"
}
],
"optional": false,
"name": "io.debezium.connector.oracle.SchemaChangeKey",
"version": 1
},
"payload": {
"databaseName": "inventory"
}
}
The payload of a schema change event message includes the following elements:
ddl
-
Provides the SQL
CREATE
,ALTER
, orDROP
statement that results in the schema change. databaseName
-
The name of the database to which the statements are applied. The value of
databaseName
serves as the message key. tableChanges
-
A structured representation of the entire table schema after the schema change. The
tableChanges
field contains an array that includes entries for each column of the table. Because the structured representation presents data in JSON or Avro format, consumers can easily read messages without first processing them through a DDL parser.
By default, the connector uses the You can modify settings so that the schema history topic stores a different subset of tables. Use one of the following methods to alter the set of tables that the topic stores:
|
When the connector is configured to capture a table, it stores the history of the table’s schema changes not only in the schema change topic, but also in an internal database schema history topic. The internal database schema history topic is for connector use only and it is not intended for direct use by consuming applications. Ensure that applications that require notifications about schema changes consume that information only from the schema change topic. |
Never partition the database schema history topic. For the database schema history topic to function correctly, it must maintain a consistent, global order of the event records that the connector emits to it. To ensure that the topic is not split among partitions, set the partition count for the topic by using one of the following methods:
|
The schema change topic message format is in an incubating state and might change without notice. |
The following example shows a typical schema change message in JSON format. The message contains a logical representation of the table schema.
{
"schema": {
...
},
"payload": {
"source": {
"version": "3.3.0.Final",
"connector": "oracle",
"name": "server1",
"ts_ms": 1588252618953,
"ts_us": 1588252618953000,
"ts_ns": 1588252618953000000,
"snapshot": "true",
"db": "ORCLPDB1",
"schema": "DEBEZIUM",
"table": "CUSTOMERS",
"txId" : null,
"scn" : "1513734",
"commit_scn": "1513754",
"lcr_position" : null,
"rs_id": "001234.00012345.0124",
"ssn": 1,
"redo_thread": 1,
"user_name": "user",
"row_id": "AAASgjAAMAAAACnAAA",
"commit_ts_ms": 1588252619012,
"start_scn": "1513734",
"start_ts_ms": 1588252618953
},
"ts_ms": 1588252618953, (1)
"ts_us": 1588252618953987, (1)
"ts_ns": 1588252618953987512, (1)
"databaseName": "ORCLPDB1", (2)
"schemaName": "DEBEZIUM", //
"ddl": "CREATE TABLE \"DEBEZIUM\".\"CUSTOMERS\" \n ( \"ID\" NUMBER(9,0) NOT NULL ENABLE, \n \"FIRST_NAME\" VARCHAR2(255), \n \"LAST_NAME" VARCHAR2(255), \n \"EMAIL\" VARCHAR2(255), \n PRIMARY KEY (\"ID\") ENABLE, \n SUPPLEMENTAL LOG DATA (ALL) COLUMNS\n ) SEGMENT CREATION IMMEDIATE \n PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 \n NOCOMPRESS LOGGING\n STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645\n PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1\n BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)\n TABLESPACE \"USERS\" ", (3)
"tableChanges": [ (4)
{
"type": "CREATE", (5)
"id": "\"ORCLPDB1\".\"DEBEZIUM\".\"CUSTOMERS\"", (6)
"table": { (7)
"defaultCharsetName": null,
"primaryKeyColumnNames": [ (8)
"ID"
],
"columns": [ (9)
{
"name": "ID",
"jdbcType": 2,
"nativeType": null,
"typeName": "NUMBER",
"typeExpression": "NUMBER",
"charsetName": null,
"length": 9,
"scale": 0,
"position": 1,
"optional": false,
"autoIncremented": false,
"generated": false
},
{
"name": "FIRST_NAME",
"jdbcType": 12,
"nativeType": null,
"typeName": "VARCHAR2",
"typeExpression": "VARCHAR2",
"charsetName": null,
"length": 255,
"scale": null,
"position": 2,
"optional": false,
"autoIncremented": false,
"generated": false
},
{
"name": "LAST_NAME",
"jdbcType": 12,
"nativeType": null,
"typeName": "VARCHAR2",
"typeExpression": "VARCHAR2",
"charsetName": null,
"length": 255,
"scale": null,
"position": 3,
"optional": false,
"autoIncremented": false,
"generated": false
},
{
"name": "EMAIL",
"jdbcType": 12,
"nativeType": null,
"typeName": "VARCHAR2",
"typeExpression": "VARCHAR2",
"charsetName": null,
"length": 255,
"scale": null,
"position": 4,
"optional": false,
"autoIncremented": false,
"generated": false
}
],
"attributes": [ (10)
{
"customAttribute": "attributeValue"
}
]
}
}
]
}
}
Item | Field name | Description |
---|---|---|
1 |
|
Optional field that displays the time at which the connector processed the event. The time is based on the system clock in the JVM running the Kafka Connect task. In the source object, ts_ms indicates the time that the change was made in the database. By comparing the value for payload.source.ts_ms with the value for payload.ts_ms, you can determine the lag between the source database update and Debezium. |
2 |
|
Identifies the database and the schema that contains the change. |
3 |
|
This field contains the DDL that is responsible for the schema change. |
4 |
|
An array of one or more items that contain the schema changes generated by a DDL command. |
5 |
|
Describes the kind of change. The
|
6 |
|
Full identifier of the table that was created, altered, or dropped.
In the case of a table rename, this identifier is a concatenation of |
7 |
|
Represents table metadata after the applied change. |
8 |
|
List of columns that compose the table’s primary key. |
9 |
|
Metadata for each column in the changed table. |
10 |
|
Custom attribute metadata for each table change. |
In messages that the connector sends to the schema change topic, the message key is the name of the database that contains the schema change.
In the following example, the payload
field contains the databaseName
key:
{
"schema": {
"type": "struct",
"fields": [
{
"type": "string",
"optional": false,
"field": "databaseName"
}
],
"optional": false,
"name": "io.debezium.connector.oracle.SchemaChangeKey",
"version": 1
},
"payload": {
"databaseName": "ORCLPDB1"
}
}
Transaction Metadata
Debezium can generate events that represent transaction metadata boundaries and that enrich data change event messages.
Limits on when Debezium receives transaction metadata
Debezium registers and receives metadata only for transactions that occur after you deploy the connector. Metadata for transactions that occur before you deploy the connector is not available. |
Database transactions are represented by a statement block that is enclosed between the BEGIN
and END
keywords.
Debezium generates transaction boundary events for the BEGIN
and END
delimiters in every transaction.
Transaction boundary events contain the following fields:
status
-
BEGIN
orEND
. id
-
String representation of the unique transaction identifier.
ts_ms
-
The time of a transaction boundary event (
BEGIN
orEND
event) at the data source. If the data source does not provide Debezium with the event time, then the field instead represents the time at which Debezium processes the event. event_count
(forEND
events)-
Total number of events emmitted by the transaction.
data_collections
(forEND
events)-
An array of pairs of
data_collection
andevent_count
elements that indicates the number of events that the connector emits for changes that originate from a data collection.
The following example shows a typical transaction boundary message:
{
"status": "BEGIN",
"id": "5.6.641",
"ts_ms": 1486500577125,
"event_count": null,
"data_collections": null
}
{
"status": "END",
"id": "5.6.641",
"ts_ms": 1486500577691,
"event_count": 2,
"data_collections": [
{
"data_collection": "ORCLPDB1.DEBEZIUM.CUSTOMER",
"event_count": 1
},
{
"data_collection": "ORCLPDB1.DEBEZIUM.ORDER",
"event_count": 1
}
]
}
Unless overridden via the