Debezium Connector for Oracle
- Overview
- How the Oracle connector works
- Data change events
- Data type mappings
- Custom converters
- Setting up Oracle
- Deployment
- Connector properties
- Monitoring
- Surrogate schema evolution
- OpenLogReplicator support
- How OpenLogReplicator works
- Preparing the database
- Creating connector users
- Configuring the OpenLogReplicator adapter
- Building OpenLogReplicator
- Obtaining the Oracle JDBC driver for OpenLogReplicator
- OpenLogReplicator configuration
- OpenLogReplicator connector properties
- OpenLogReplicator ROWID support
- OpenLogReplicator XML support
- XStream support
- Frequently Asked Questions
Overview
Debezium’s Oracle connector captures and records row-level changes that occur in databases on an Oracle server, including tables that are added while the connector is running. You can configure the connector to emit change events for specific subsets of schemas and tables, or to ignore, mask, or truncate values in specific columns.
For information about the Oracle Database versions that are compatible with this connector, see the Debezium release overview.
Debezium can ingest change events from Oracle by using the native LogMiner database package, the XStream API, or OpenLogReplicator.
How the Oracle connector works
To optimally configure and run a Debezium Oracle connector, it is helpful to understand how the connector performs snapshots, streams change events, determines Kafka topic names, uses metadata, and implements event buffering.
Snapshots
Typically, the redo logs on an Oracle server are configured to not retain the complete history of the database. As a result, the Debezium Oracle connector cannot retrieve the entire history of the database from the logs. To enable the connector to establish a baseline for the current state of the database, the first time that the connector starts, it performs an initial consistent snapshot of the database.
If the time needed to complete the initial snapshot exceeds the |
During a table’s snapshot, it’s possible for Oracle to raise an ORA-01466 exception. This happens when a user modifies the schema of the table or adds, changes, or drops an index or related object associated with the table being snapshot. In the event this happens, the connector will stop and the initial snapshot will need to be taken from the beginning. To remediate the problem, you can configure the |
Default workflow that the Oracle connector uses to perform an initial snapshot
The following workflow lists the steps that Debezium takes to create a snapshot.
These steps describe the process for a snapshot when the snapshot.mode
configuration property is set to its default value, which is initial
.
You can customize the way that the connector creates snapshots by changing the value of the snapshot.mode
property.
If you configure a different snapshot mode, the connector completes the snapshot by using a modified version of this workflow.
When the snapshot mode is set to the default, the connector completes the following tasks to create a snapshot:
-
Establish a connection to the database.
-
Determine the tables to be captured. By default, the connector captures all tables except those with schemas that exclude them from capture. After the snapshot completes, the connector continues to stream data for the specified tables. If you want the connector to capture data only from specific tables you can direct the connector to capture the data for only a subset of tables or table elements by setting properties such as
table.include.list
ortable.exclude.list
. -
Obtain a
ROW SHARE MODE
lock on each of the captured tables to prevent structural changes from occurring during creation of the snapshot. Debezium holds the locks for only a short time. -
Read the current system change number (SCN) position from the server’s redo log.
-
Capture the structure of all database tables, or all tables that are designated for capture. The connector persists schema information in its internal database schema history topic. The schema history provides information about the structure that is in effect when a change event occurs.
By default, the connector captures the schema of every table in the database that is in capture mode, including tables that are not configured for capture. If tables are not configured for capture, the initial snapshot captures only their structure; it does not capture any table data. For more information about why snapshots persist schema information for tables that you did not include in the initial snapshot, see Understanding why initial snapshots capture the schema for all tables.
-
Release the locks obtained in Step 3. Other database clients can now write to any previously locked tables.
-
At the SCN position that was read in Step 4, the connector scans the tables that are designated for capture (
SELECT * FROM … AS OF SCN 123
). During the scan, the connector completes the following tasks:-
Confirms that the table was created before the snapshot began. If the table was created after the snapshot began, the connector skips the table. After the snapshot is complete, and the connector transitions to streaming, it emits change events for any tables that were created after the snapshot began.
-
Produces a
read
event for each row that is captured from a table. Allread
events contain the same SCN position, which is the SCN position that was obtained in step 4. -
Emits each
read
event to the Kafka topic for the source table. -
Releases data table locks, if applicable.
-
-
Record the successful completion of the snapshot in the connector offsets.
The resulting initial snapshot captures the current state of each row in the captured tables. From this baseline state, the connector captures subsequent changes as they occur.
After the snapshot process begins, if the process is interrupted due to connector failure, rebalancing, or other reasons, the process restarts after the connector restarts. After the connector completes the initial snapshot, it continues streaming from the position that it read in Step 3 so that it does not miss any updates. If the connector stops again for any reason, after it restarts, it resumes streaming changes from where it previously left off.
Setting | Description |
---|---|
|
Perform snapshot on each connector start. After the snapshot completes, the connector begins to stream event records for subsequent database changes. |
|
The connector performs a database snapshot as described in the default workflow for creating an initial snapshot. After the snapshot completes, the connector begins to stream event records for subsequent database changes. |
|
The connector performs a database snapshot and stops before streaming any change event records, not allowing any subsequent change events to be captured. |
|
Deprecated, see |
|
The connector captures the structure of all relevant tables, performing all of the steps described in the default snapshot workflow, except that it does not create |
|
Deprecated, see |
|
Set this option to restore a database schema history topic that is lost or corrupted.
After a restart, the connector runs a snapshot that rebuilds the topic from the source tables.
You can also set the property to periodically prune a database schema history topic that experiences unexpected growth. |
|
After the connector starts, it performs a snapshot only if it detects one of the following circumstances:
|
|
Set the snapshot mode to |
|
The |
For more information, see snapshot.mode
in the table of connector configuration properties.
Understanding why initial snapshots capture the schema history for all tables
The initial snapshot that a connector runs captures two types of information:
- Table data
-
Information about
INSERT
,UPDATE
, andDELETE
operations in tables that are named in the connector’stable.include.list
property. - Schema data
-
DDL statements that describe the structural changes that are applied to tables. Schema data is persisted to both the internal schema history topic, and to the connector’s schema change topic, if one is configured.
After you run an initial snapshot, you might notice that the snapshot captures schema information for tables that are not designated for capture. By default, initial snapshots are designed to capture schema information for every table that is present in the database, not only from tables that are designated for capture. Connectors require that the table’s schema is present in the schema history topic before they can capture a table. By enabling the initial snapshot to capture schema data for tables that are not part of the original capture set, Debezium prepares the connector to readily capture event data from these tables should that later become necessary. If the initial snapshot does not capture a table’s schema, you must add the schema to the history topic before the connector can capture data from the table.
In some cases, you might want to limit schema capture in the initial snapshot. This can be useful when you want to reduce the time required to complete a snapshot. Or when Debezium connects to the database instance through a user account that has access to multiple logical databases, but you want the connector to capture changes only from tables in a specific logic database.
-
Capturing data from tables not captured by the initial snapshot (no schema change)
-
Capturing data from tables not captured by the initial snapshot (schema change)
-
Setting the
schema.history.internal.store.only.captured.tables.ddl
property to specify the tables from which to capture schema information. -
Setting the
schema.history.internal.store.only.captured.databases.ddl
property to specify the logical databases from which to capture schema changes.
Capturing data from tables not captured by the initial snapshot (no schema change)
In some cases, you might want the connector to capture data from a table whose schema was not captured by the initial snapshot. Depending on the connector configuration, the initial snapshot might capture the table schema only for specific tables in the database. If the table schema is not present in the history topic, the connector fails to capture the table, and reports a missing schema error.
You might still be able to capture data from the table, but you must perform additional steps to add the table schema.
-
You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
-
All entries for the table in the transaction log use the same schema. For information about capturing data from a new table that has undergone structural changes, see Capturing data from tables not captured by the initial snapshot (schema change).
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
In the connector configuration:
-
Set the
snapshot.mode
torecovery
. -
(Optional) Set the value of
schema.history.internal.store.only.captured.tables.ddl
tofalse
to ensure that in the future the connector can readily capture data for tables that are not currently designated for capture. Connectors can capture data from a table only if the table’s schema history is present in the history topic. -
Add the tables that you want the connector to capture to
table.include.list
.
-
-
Restart the connector. The snapshot recovery process rebuilds the schema history based on the current structure of the tables.
-
(Optional) After the snapshot completes, initiate an incremental snapshot on the newly added tables. The incremental snapshot first streams the historical data of the newly added tables, and then resumes reading changes from the redo and archive logs for previously configured tables, including changes that occur while that connector was off-line.
-
(Optional) Reset the
snapshot.mode
back tono_data
to prevent the connector from initiating recovery after a future restart.
Capturing data from tables not captured by the initial snapshot (schema change)
If a schema change is applied to a table, records that are committed before the schema change have different structures than those that were committed after the change. When Debezium captures data from a table, it reads the schema history to ensure that it applies the correct schema to each event. If the schema is not present in the schema history topic, the connector is unable to capture the table, and an error results.
If you want to capture data from a table that was not captured by the initial snapshot, and the schema of the table was modified, you must add the schema to the history topic, if it is not already available. You can add the schema by running a new schema snapshot, or by running an initial snapshot for the table.
-
You want to capture data from a table with a schema that the connector did not capture during the initial snapshot.
-
A schema change was applied to the table so that the records to be captured do not have a uniform structure.
- Initial snapshot captured the schema for all tables (
store.only.captured.tables.ddl
was set tofalse
) -
-
Edit the
table.include.list
property to specify the tables that you want to capture. -
Restart the connector.
-
Initiate an incremental snapshot if you want to capture existing data from the newly added tables.
-
- Initial snapshot did not capture the schema for all tables (
store.only.captured.tables.ddl
was set totrue
) -
If the initial snapshot did not save the schema of the table that you want to capture, complete one of the following procedures:
- Procedure 1: Schema snapshot, followed by incremental snapshot
-
In this procedure, the connector first performs a schema snapshot. You can then initiate an incremental snapshot to enable the connector to synchronize data.
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
Clear the offsets in the configured Kafka Connect
offset.storage.topic
. For more information about how to remove offsets, see the Debezium community FAQ.Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.
-
Set values for properties in the connector configuration as described in the following steps:
-
Set the value of the
snapshot.mode
property tono_data
. -
Edit the
table.include.list
to add the tables that you want to capture.
-
-
Restart the connector.
-
Wait for Debezium to capture the schema of the new and existing tables. Data changes that occurred any tables after the connector stopped are not captured.
-
To ensure that no data is lost, initiate an incremental snapshot.
-
- Procedure 2: Initial snapshot, followed by optional incremental snapshot
-
In this procedure the connector performs a full initial snapshot of the database. As with any initial snapshot, in a database with many large tables, running an initial snapshot can be a time-consuming operation. After the snapshot completes, you can optionally trigger an incremental snapshot to capture any changes that occur while the connector is off-line.
-
Stop the connector.
-
Remove the internal database schema history topic that is specified by the
schema.history.internal.kafka.topic property
. -
Clear the offsets in the configured Kafka Connect
offset.storage.topic
. For more information about how to remove offsets, see the Debezium community FAQ.Removing offsets should be performed only by advanced users who have experience in manipulating internal Kafka Connect data. This operation is potentially destructive, and should be performed only as a last resort.
-
Edit the
table.include.list
to add the tables that you want to capture. -
Set values for properties in the connector configuration as described in the following steps:
-
Set the value of the
snapshot.mode
property toinitial
. -
(Optional) Set
schema.history.internal.store.only.captured.tables.ddl
tofalse
.
-
-
Restart the connector. The connector takes a full database snapshot. After the snapshot completes, the connector transitions to streaming.
-
(Optional) To capture any data that changed while the connector was off-line, initiate an incremental snapshot.
-
Ad hoc snapshots
By default, a connector runs an initial snapshot operation only after it starts for the first time. Following this initial snapshot, under normal circumstances, the connector does not repeat the snapshot process. Any future change event data that the connector captures comes in through the streaming process only.
However, in some situations the data that the connector obtained during the initial snapshot might become stale, lost, or incomplete. To provide a mechanism for recapturing table data, Debezium includes an option to perform ad hoc snapshots. You might want to perform an ad hoc snapshot after any of the following changes occur in your Debezium environment:
-
The connector configuration is modified to capture a different set of tables.
-
Kafka topics are deleted and must be rebuilt.
-
Data corruption occurs due to a configuration error or some other problem.
You can re-run a snapshot for a table for which you previously captured a snapshot by initiating a so-called ad-hoc snapshot. Ad hoc snapshots require the use of signaling tables. You initiate an ad hoc snapshot by sending a signal request to the Debezium signaling table.
When you initiate an ad hoc snapshot of an existing table, the connector appends content to the topic that already exists for the table. If a previously existing topic was removed, Debezium can create a topic automatically if automatic topic creation is enabled.
Ad hoc snapshot signals specify the tables to include in the snapshot. The snapshot can capture the entire contents of the database, or capture only a subset of the tables in the database. Also, the snapshot can capture a subset of the contents of the table(s) in the database.
You specify the tables to capture by sending an execute-snapshot
message to the signaling table.
Set the type of the execute-snapshot
signal to incremental
or blocking
, and provide the names of the tables to include in the snapshot, as described in the following table:
Field | Default | Value |
---|---|---|
|
|
Specifies the type of snapshot that you want to run. |
|
N/A |
An array that contains regular expressions matching the fully-qualified names of the tables to include in the snapshot. |
|
N/A |
An optional array that specifies a set of additional conditions that the connector evaluates to determine the subset of records to include in a snapshot.
|
|
N/A |
An optional string that specifies the column name that the connector uses as the primary key of a table during the snapshot process. |
You initiate an ad hoc incremental snapshot by adding an entry with the execute-snapshot
signal type to the signaling table, or by sending a signal message to a Kafka signaling topic.
After the connector processes the message, it begins the snapshot operation.
The snapshot process reads the first and last primary key values and uses those values as the start and end point for each table.
Based on the number of entries in the table, and the configured chunk size, Debezium divides the table into chunks, and proceeds to snapshot each chunk, in succession, one at a time.
For more information, see Incremental snapshots.
You initiate an ad hoc blocking snapshot by adding an entry with the execute-snapshot
signal type to the signaling table or signaling topic.
After the connector processes the message, it begins the snapshot operation.
The connector temporarily stops streaming, and then initiates a snapshot of the specified table, following the same process that it uses during an initial snapshot.
After the snapshot completes, the connector resumes streaming.
For more information, see Blocking snapshots.