Add idempotency support to XADD via IDMPAUTO and IDMP parameters #14615

sggeorgiev · 2025-12-09T09:05:56Z

Overview

This PR introduces idempotency support to Redis Streams' XADD command, enabling automatic deduplication of duplicate message submissions through optional IDMPAUTO and IDMP parameters with producer identification. This eliminates the need for external deduplication logic and prevents duplicate entries in streams.

Problem Statement

Current Redis Streams implementations lack built-in idempotency mechanisms, forcing developers to implement external deduplication:

Application-level tracking: Developers must maintain separate data structures to track submitted messages
Race conditions: Network failures and retries can result in duplicate stream entries
Complexity overhead: Each producer must implement custom deduplication logic
Memory inefficiency: External deduplication systems duplicate Redis's storage capabilities

This lack of native idempotency support creates reliability challenges in distributed systems where at-least-once delivery semantics are required but exactly-once processing is desired.

Solution

Extends XADD with optional idempotency parameters that include producer identification:

XADD key [NOMKSTREAM] [KEEPREF | DELREF | ACKED] [IDMPAUTO pid | IDMP pid iid] [MAXLEN | MINID [= | ~] threshold [LIMIT count]] <* | id> field value [field value ...]

Producer ID (pid)

pid (producer id): A unique identifier for each producer
Must be unique per producer instance
Producers must use the same pid after restart to maintain idempotency tracking
Enables per-producer idempotency tracking, isolating duplicate detection between different producers

Idempotency Modes

IDMPAUTO pid (Automatic Idempotency):

Producer specifies its pid, Redis automatically calculates a unique idempotent ID (iid) based on entry content
Hash calculation: iid = XOR(XXH128(fieldName || fieldValue)) for all fields
16-byte binary iid ensures collision probability of ~1.47×10⁻³¹ for 10K tracked entries
Order-independent: field ordering does not affect the calculated iid
If (pid, iid) pair exists in producer's IDMP map: returns existing entry ID without creating duplicate entry
Generally slower than manual mode due to hash calculation overhead

IDMP pid iid (Manual Idempotency):

Caller provides explicit producer id (pid) and idempotent ID (iid) for deduplication
iid must be unique per message (either globally or per pid)
Faster processing than IDMPAUTO (no hash calculation overhead)
Enables shorter iids for reduced memory footprint
If (pid, iid) pair exists in producer's IDMP map: returns existing entry ID without comparing field contents
Caller responsible for iid uniqueness and consistency across retries

Both modes can only be specified when entry ID is * (auto-generated).

Deduplication Logic

When XADD is called with idempotency parameters:

Redis checks if the message was recently added to the stream based on the (pid, iid) pair
If the (pid, iid) pair matches a recently-seen pair for that producer, the message is assumed to be identical
No duplicate message is added to the stream; the existing entry ID is returned
With IDMP pid iid: Redis does not compare the specified fields and their values—two messages with the same (pid, iid) are assumed identical
With IDMPAUTO pid: Redis calculates the iid from message content and checks for duplicates

IDMP Map: Per-Producer Time and Capacity-Based Expiration

Each producer with idempotency enabled maintains its own isolated IDMP map (iid → entry_id) with dual expiration criteria:

Time-based expiration (duration):

Each iid expires automatically after duration seconds from insertion
Provides operational guarantee: Redis will not forget an iid before duration elapses (unless capacity reached)
Configurable per-stream via XIDMP CFGSET

Capacity-based expiration (maxsize):

Each producer's map enforces maximum capacity of maxsize entries
When capacity reached, oldest iids for that producer are evicted regardless of remaining duration
Prevents unbounded memory growth during extended usage

Configuration Commands

XIDMP CFGGET: Retrieve current configuration

XIDMP CFGGET key [DURATION] [MAXSIZE]

XIDMP CFGSET: Configure expiration parameters

XIDMP CFGSET key [DURATION duration] [MAXSIZE maxsize]

duration: Seconds to retain each iid (range: 1- 86400 seconds)
maxsize: Maximum iids to track per producer (range: 1-10,000 entries)
Calling CFGSET clears all existing producer IDMP maps for the stream

Default Configuration (when XIDMP CFGSET not called):

Duration: 100 seconds
Maxsize: 100 iids per producer
Runtime configurable via: stream-idmp-duration and stream-idmp-maxsize

Response Behavior

On first submission (pid, iid) pair not in producer's map:

Entry added to stream with generated entry ID
(pid, iid) pair stored in producer's IDMP map with current timestamp
Returns new entry ID

On duplicate submission (pid, iid) pair exists in producer's map:

No entry added to stream
Returns existing entry ID from producer's IDMP map
Identical response to original submission (client cannot distinguish)

Stream Metadata

XINFO STREAM extended with idempotency metrics:

iids-tracked: Current total number of iids across all producers' IDMP maps
iids-added: Lifetime count of entries added with idempotency parameters
producers-tracked: Current number of producers with active IDMP maps

Key Benefits

Simplified Architecture: Eliminates need for external deduplication systems
Automatic Retry Safety: Network failures and retries cannot create duplicate entries
Producer Isolation: Each producer maintains independent idempotency tracking
Memory Efficient: Time and capacity-based expiration per producer prevents unbounded growth
Flexible Implementation: Choose automatic (IDMPAUTO) or manual (IDMP) based on performance needs
Backward Compatible: Fully optional parameters with zero impact on existing XADD behavior
Collision Resistant: XXH128 provides cryptographically strong uniqueness for IDMPAUTO

Performance

Comprehensive benchmarks demonstrate that idempotency features introduce minimal overhead to XADD operations while maintaining predictable memory usage across various message sizes.

Benchmark Methodology

Test Configuration:

Tool: memtier_benchmark for Redis performance testing
Test Parameters:
- Requests: 2,000,000 operations per test
- Clients: 1 (single-threaded to isolate performance characteristics)
- Threads: 1
- Pipeline: 1 (no pipelining to measure per-operation overhead)
- Data: Random data of varying sizes (8-512 bytes)
Redis Configuration:
- Persistence disabled (RDB and AOF off) for pure in-memory performance
- IDMP Configuration: MAXSIZE 10000 (tracking up to 10K idempotent IDs per producer)
Iterations: 10 runs per configuration, averaging middle 8 results (excluding highest and lowest to reduce variance)
Memory Measurement: RSS (Resident Set Size) delta between pre-benchmark and post-benchmark states

Test Commands:

XADD: XADD test * f __data__ (baseline, no idempotency)
XIDMP: XADD test IDMP <pid> <iid> * f __data__ (manual idempotency)
XIDMPAUTO: XADD test IDMPAUTO <pid> * f __data__ (automatic content-based idempotency)

Throughput Performance

Data Size (bytes)	XADD (ops/sec)	XIDMP (ops/sec)	XIDMPAUTO (ops/sec)	XIDMP Overhead	XIDMPAUTO Overhead
8	69,066	67,242	67,033	-2.6%	-2.9%
21	68,915	67,331	67,044	-2.3%	-2.7%
24	68,928	67,245	66,826	-2.4%	-3.0%
26	69,029	67,186	66,708	-2.7%	-3.4%
36	68,833	66,942	66,679	-2.7%	-3.1%
64	68,675	66,727	66,301	-2.8%	-3.5%
128	68,367	66,460	65,738	-2.8%	-3.8%
256	67,867	65,664	65,169	-3.2%	-4.0%
512	67,188	63,841	63,676	-5.0%	-5.2%

Memory Overhead

Data Size (bytes)	XADD (MB)	XIDMP (MB)	XIDMPAUTO (MB)	XIDMP Overhead	XIDMPAUTO Overhead
8	40.73	41.13	41.18	+1.0%	+1.1%
21	69.07	69.84	69.73	+1.1%	+1.0%
24	69.19	69.84	69.87	+0.9%	+1.0%
26	81.02	81.89	81.63	+1.1%	+0.8%
36	96.26	97.64	97.01	+1.4%	+0.8%
64	155.55	156.80	156.54	+0.8%	+0.6%
128	289.43	291.42	290.34	+0.7%	+0.3%
256	541.78	545.16	542.51	+0.6%	+0.1%
512	1160.58	1167.16	1161.19	+0.6%	+0.1%

Key Performance Insights

Throughput Impact:

XIDMP (manual mode): 2.3-5.0% throughput reduction compared to baseline XADD
XIDMPAUTO (automatic mode): 2.7-5.2% throughput reduction compared to baseline XADD
Performance overhead increases slightly with message size due to larger content hashing in XIDMPAUTO
Both modes maintain >63K ops/sec even with 512-byte messages, demonstrating production-ready performance
Single-threaded performance ensures no contention effects mask true per-operation cost

Memory Efficiency:

Minimal overhead: 0.1-1.4% additional memory consumption across all message sizes
Memory overhead decreases proportionally as message size increases (stream entry storage dominates)
At 512 bytes per message: only 0.6% additional memory for XIDMP, 0.1% for XIDMPAUTO
Per-producer IDMP map memory is negligible compared to stream entry storage
With 10K tracked IIDs: IDMP overhead remains under 1MB even for small messages

Performance Characteristics by Data Size:

Small messages (8-26 bytes): Idempotency overhead most visible (2.3-3.4%) as fixed costs dominate
Medium messages (36-128 bytes): Balanced overhead (2.7-3.8%) with good throughput (>65K ops/sec)
Large messages (256-512 bytes): Higher overhead (4.0-5.2%) but memory efficiency improves significantly

Technical Implementation

Per-Producer IDMP Architecture

Each stream maintains a radix tree (rax) mapping producer IDs (pid) to independent idmpProducer structures:

typedef struct idmpProducer {
    dict *idmp_dict;       /* IDMP IID tracking hash table */
    idmpEntry *idmp_head;  /* Head of the IDMP entries linked list */
    idmpEntry *idmp_tail;  /* Tail of the IDMP entries linked list */
} idmpProducer;

Design Rationale:

Producer Isolation: Each producer's idempotency tracking is completely independent
Radix Tree Mapping: Stream-level rax tree maps pid → idmpProducer* for O(log n) producer lookup
Per-Producer Limits: Each producer respects the configured maxsize independently
Scalable: Supports arbitrary number of producers without interference

IDMP Map Data Structure

Each producer's IDMP map uses a hybrid data structure combining a hash table with an ordered linked list.

Core Data Structure: idmpEntry

typedef struct idmpEntry {
    struct idmpEntry *next;  /* Pointer to next entry in insertion order (linked list) */
    streamID id;             /* Associated stream ID */
    size_t iid_len;          /* Length of the IID */
    char iid[];              /* Flexible array member for inline IID storage */
} idmpEntry;

Design Rationale:

Flexible Array Member (FAM): The iid[] flexible array member enables inline storage of variable-length idempotent IDs directly within the structure, eliminating separate heap allocations and reducing memory fragmentation. This design is particularly beneficial since:

IDMPAUTO generates fixed 16-byte iids
Manual IDMP allows shorter iids for memory optimization
Per-entry memory overhead adapts to actual iid length

List Pointer: The next pointer maintains the insertion-ordered linked list structure, enabling efficient traversal for time-based expiration and capacity-based eviction operations.

Embedded Stream ID: The streamID id field serves dual purposes:

Maps the iid to its corresponding stream entry ID (required for deduplication responses)
Provides timestamp information via the ms (milliseconds) field for time-based expiration

Dual-Index Architecture (Per Producer)

Each producer's IDMP map maintains two concurrent access patterns using a single set of idmpEntry structures:

Hash Table (dict): iid → idmpEntry*
- Purpose: Fast lookup and duplicate detection
- Implementation: Redis server internal dict structure
- Performance: O(1) insertion and O(1) lookup
- Key: The idempotent ID (iid)
- Value: Pointer to the idmpEntry structure
Insertion-Ordered Linked List:
- Purpose: Time-based expiration and capacity management
- Implementation: Custom linked list using the next field in idmpEntry
- Performance: O(1) removal of oldest entry, O(1) append of new entry
- Ordering: Maintains entries in insertion order (oldest to newest)
- Head pointer: Points to oldest entry (first inserted)
- Tail pointer: Points to newest entry (most recently inserted)
- Links: The next field in each idmpEntry connects entries in insertion order

Relationship Between Structures:

The dict stores pointers to idmpEntry structures for fast iid-based lookups
The same idmpEntry structures are simultaneously linked via their next pointers to form the ordered list
This avoids duplicate allocations—each idmpEntry serves both indexing purposes

Natural Time Ordering

New messages are naturally ordered by insertion time due to Redis Streams' ID structure:

Stream entry IDs are composed of timestamp-sequence (stored in streamID.ms and streamID.seq)
IDMPAUTO and IDMP parameters are only valid with * (auto-generated IDs)
Auto-generated IDs guarantee monotonically increasing timestamps
Therefore, insertion order directly corresponds to chronological order

This natural ordering eliminates the need for explicit timestamp storage—the streamID.ms field already contains the entry creation timestamp, and the linked list's insertion order inherently represents time-based ordering.

Expiration Operations

Time-Based Expiration (duration):

Executed periodically via Redis cron job (every second):

Iterate through all streams with active IDMP maps
For each stream, iterate through all producers in the rax tree
For each producer, traverse linked list from head (oldest entries)
Extract timestamp from streamID.ms field of each entry
Check if current_time_ms - streamID.ms > duration_ms
Remove expired entries from both dict and list until reaching unexpired entry
Stop processing producer when first unexpired entry is found (leverages time-ordered list)
Remove empty idmpProducer structures from rax tree when all entries expire
Complexity: O(p × k) per stream, where p is the number of producers and k is the average number of expired entries per producer

Capacity-Based Eviction (maxsize):

Executed inline during XADD operations (per producer):

Check if producer's dict size exceeds maxsize after insertion
Remove head entry (oldest) from producer's linked list
Remove corresponding entry from producer's dict using iid as key
Update head pointer to next entry
Complexity: O(1) per eviction

Combined Expiration Strategy:

Background expiration: Cron job handles time-based expiration across all streams and producers every second
Inline eviction: XADD operations enforce maxsize capacity limits per producer immediately

This dual approach ensures:

Expired entries are removed promptly without waiting for new writes
Capacity limits are enforced strictly at write time per producer
Memory usage remains bounded even during idle periods
Producers cannot interfere with each other's idempotency tracking

Memory Efficiency

Per-Producer Overhead:

Each producer adds:

Radix tree node: ~40 bytes for pid storage and pointers
idmpProducer structure: 24 bytes (3 pointers)
Total: ~64 bytes per active producer

Per-Entry Memory Breakdown:

Each idempotency entry requires storage in both the Redis dict (hash table) and the idmpEntry structure:

Redis dictEntry overhead: 24 bytes
- Hash table pointers (key, value, and next)
idmpEntry fixed overhead: 32 bytes
- next pointer for linked list: 8 bytes
- streamID (millisecond timestamp and sequence number): 16 bytes
- iid_len field: 8 bytes
Variable-length iid storage: Inline within idmpEntry (size depends on mode)

Total Per-Entry Overhead:

IDMPAUTO pid mode (16-byte idempotent IDs):

dictEntry: 24 bytes
idmpEntry fixed fields: 32 bytes
iid data: 16 bytes
Total: 72 bytes per entry

IDMP pid iid mode (variable-length idempotent IDs):

dictEntry: 24 bytes
idmpEntry fixed fields: 32 bytes
iid data: iid_len bytes
Total: 56 + iid_len bytes per entry
Examples:
- 4-byte iid: 60 bytes total
- 8-byte iid: 64 bytes total
- 16-byte iid: 72 bytes total (equivalent to IDMPAUTO)

Total Memory Example:

For a stream with 5 producers, each tracking 1,000 entries with IDMPAUTO:

Producer overhead: 5 × 64 bytes = 320 bytes
Entry overhead: 5 × 1,000 × 72 bytes = 360,000 bytes
Total: ~360 KB

This hybrid architecture provides optimal performance for both lookup operations (O(1) via dict per producer) and expiration operations (O(1) for capacity-based, O(k) for time-based per producer), while maintaining predictable memory overhead based on iid length choice and number of active producers. The reuse of streamID.ms for time-based expiration eliminates the need for separate timestamp storage, and the cron-based background expiration ensures timely cleanup without impacting write path performance.

src/commands/xidmp-cfgget.json

src/defrag.c

src/rdb.h

src/defrag.c

sundb · 2025-12-12T08:41:06Z

src/server.c

        server.db[j].blocking_keys = dictCreate(&keylistDictType);
        server.db[j].blocking_keys_unblock_on_nokey = dictCreate(&objectKeyPointerValueDictType);
        server.db[j].stream_claim_pending_keys = dictCreate(&objectKeyPointerValueDictType);
+        server.db[j].stream_idmp_keys = dictCreate(&objectKeyPointerValueDictType);


i don't see that stream_idmp_keys was updated when the key was deleted, defragged, am i missing something?

The key will be deleted in handleExpiredIdmpEntries t_stream.c:5661.

what if the key was defragged? That is to say, when defrag the keys of db->keys, it will not be updated simultaneously here.

The key in stream_idmp_keys comes from the command's argv argument. I don't think we're defragging it—note the call to trackStreamIdmpEntries(c, c->argv[1]);

i wonder why not store the key as sds? This way can save some memory.

src/t_stream.c

src/defrag.c

src/stream.h

src/defrag.c

sggeorgiev added 30 commits November 19, 2025 09:11

added: ring buffer

5654946

added: avl tree

c850356

chnaged: improve avl tree

2bdd282

added: IDMP option to XADD

fd81f8b

changed: use robj instead to copy the uid

1c552bd

added: tring

06759fc

changed: XADD to use tring with IDMP option

22b0071

changed: improve tringPop

aeec5ed

changed: improve tring

e938305

changed: improve tring

f811890

changed: improve tring

b3eb04d

changed: improve tring

c4c200f

changed: improve tring

00dfdca

changed: improve tring

692b039

changed: improve tring

bc8d8a1

changed: improve trimng

d11980e

changed: improve tring

647372b

changed: improve IDMP

cc2674f

added: config command

17c9c28

added: tests

02ecffa

added: tests

afce4b3

added: tring clear

5095171

changed: string compare

04ada99

fixed: issue

eb6fcdd

fixed: issue

3c32f64

fixed: issue

fdd7f99

Merge branch 'unstable' into xadd-idmp-tring

cba1036

fixed: issue

79d2c75

changed: organization

aa6cc60

changed: maxsize logic

b7d9a19

sggeorgiev added 2 commits December 9, 2025 09:25

changed: comments

a3d35b7

changed: style

0346dff

sundb added the release-notes indication that this issue needs to be mentioned in the release notes label Dec 10, 2025

sundb added this to Redis 8.6 Dec 10, 2025

github-project-automation bot moved this to Todo in Redis 8.6 Dec 10, 2025

sggeorgiev added 6 commits December 10, 2025 12:02

added: idmp producer

86d1157

fixed: issue

8c09f65

fixed: issue

2f33110

changed: default values and limits

129cf3c

fixed: issue

8d947bb

fixed: issue

d3e8f57

minchopaskal requested review from oranagra, skaslev and sundb and removed request for oranagra and skaslev December 11, 2025 13:45

sundb reviewed Dec 11, 2025

View reviewed changes

src/commands/xidmp-cfgget.json Outdated Show resolved Hide resolved

src/defrag.c Outdated Show resolved Hide resolved

src/rdb.h Show resolved Hide resolved

justramesh2000 reviewed Dec 12, 2025

View reviewed changes

src/defrag.c Outdated Show resolved Hide resolved

fixed: issues from review

4f041ff

sundb reviewed Dec 12, 2025

View reviewed changes

src/defrag.c Outdated Show resolved Hide resolved

sggeorgiev added 4 commits December 12, 2025 16:38

fixed: issues from review

2c11492

added: the count of duplicate iids in xinfo stream

ac94bb0

fixed: poblem from daily run

16f3dc1

Merge branch 'unstable' into xadd-idmp-dict

2cdf6f8

sundb reviewed Dec 16, 2025

View reviewed changes

src/stream.h Show resolved Hide resolved

src/defrag.c Outdated Show resolved Hide resolved

sggeorgiev added 2 commits December 16, 2025 11:00

fixed: issue from review

2fbeedd

fixed: issue from review

fb1f435

Add idempotency support to XADD via IDMPAUTO and IDMP parameters #14615

Are you sure you want to change the base?

Add idempotency support to XADD via IDMPAUTO and IDMP parameters #14615

Conversation

sggeorgiev commented Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Problem Statement

Solution

Producer ID (pid)

Idempotency Modes

Deduplication Logic

IDMP Map: Per-Producer Time and Capacity-Based Expiration

Configuration Commands

Response Behavior

Stream Metadata

Key Benefits

Performance

Benchmark Methodology

Throughput Performance

Memory Overhead

Key Performance Insights

Technical Implementation

Per-Producer IDMP Architecture

IDMP Map Data Structure

Core Data Structure: idmpEntry

Dual-Index Architecture (Per Producer)

Natural Time Ordering

Expiration Operations

Memory Efficiency

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sundb Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

sggeorgiev Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

sundb Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

sggeorgiev Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

sundb Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

sundb Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sggeorgiev commented Dec 9, 2025 •

edited

Loading