Universal Attribute Deduplication #13785

### Components

batch processor and exporter

### Summary

Spans, metrics, logs, and profiles often carry the same attributes (common tags, labels, or metadata) on many items in a batch. Repeating these values on every item increases memory usage, enlarges export payloads, and slows batch processing.

This proposal introduces a universal deduplication layer that removes this redundancy before encoding and export, improving memory and network efficiency.

### Describe the solution you'd like

### Motivation

- Memory optimization: Large traces with repetitive attributes can spike memory usage.
- Network efficiency: Avoid sending the same attribute values repeatedly in exported payloads.
- Cross-signal consistency: Metrics, logs, profiles, and spans often share common attributes; deduplication can be applied uniformly.

### Goals

- Identify common attributes across spans, metrics, logs, and profiles within a batch.
- Store common attributes once as metadata.
- Adjust exporters to reference shared metadata instead of duplicating attributes.
- Ensure backward compatibility with existing exporters (optional flag to enable/disable).
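The backward-compatibility goal implies that an exporter with deduplication disabled should still receive fully expanded items. A minimal Go sketch, assuming flat string attributes and hypothetical names (`ExporterDedupConfig`, `DedupItem`, `Rehydrate`) that are not existing Collector APIs, expands each reference back into a complete attribute map:

```go
package main

import "fmt"

// ExporterDedupConfig is a hypothetical per-exporter option sketching the
// proposed enable/disable flag; it is not an existing Collector setting.
type ExporterDedupConfig struct {
	Enabled bool
}

// DedupItem is a simplified item that points at a shared attribute set by ID.
type DedupItem struct {
	Name    string
	AttrRef int
}

// Rehydrate expands attrRef references back into full per-item attribute
// maps, reproducing the duplicated form for exporters that do not
// understand references.
func Rehydrate(sets map[int]map[string]string, items []DedupItem) ([]map[string]string, error) {
	out := make([]map[string]string, 0, len(items))
	for _, it := range items {
		shared, ok := sets[it.AttrRef]
		if !ok {
			return nil, fmt.Errorf("item %q: unknown attrRef %d", it.Name, it.AttrRef)
		}
		// Copy the shared set so each expanded item owns its own map.
		full := make(map[string]string, len(shared)+1)
		for k, v := range shared {
			full[k] = v
		}
		full["name"] = it.Name
		out = append(out, full)
	}
	return out, nil
}

func main() {
	sets := map[int]map[string]string{1: {"service": "auth", "env": "prod"}}
	items := []DedupItem{{Name: "span1", AttrRef: 1}, {Name: "span2", AttrRef: 1}}

	cfg := ExporterDedupConfig{Enabled: false}
	if cfg.Enabled {
		// A dedup-aware exporter would serialize sets and items as they are.
		fmt.Println(sets, items)
		return
	}
	// A legacy exporter receives fully expanded items, as it does today.
	full, err := Rehydrate(sets, items)
	if err != nil {
		panic(err)
	}
	fmt.Println(full)
}
```

Because expansion happens at the exporter boundary in this sketch, existing exporters would see the same per-item attributes they receive now.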

### Non-Goals

- Dictionary or delta encoding compression (can be added later).
- Altering the semantics of existing attributes.

### Design Overview

1. **Batch-level metadata registry**

    - Each batch maintains a registry of unique attribute sets.
    - Each item references its attribute set by ID.

2. **Item processing**

    - During batch construction, the processor extracts common attributes.
    - If an identical set is already registered, the item references it by ID; otherwise, the processor registers a new set.

3. **Export adjustments**

    - Exporters serialize items with reference IDs instead of repeating the attributes.
    - Optionally, deduplication can be disabled per exporter.

4. **Scope**

    - Initial focus: spans and logs.
    - Extendable to metrics and profiles.
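Steps 1 and 2 can be sketched as a small batch-level registry. This is a minimal Go sketch under simplifying assumptions: attributes are flat string maps rather than typed values, and `AttrSet`, `Registry`, `Intern`, and `Lookup` are hypothetical names, not existing Collector APIs. The registry builds an order-independent key by sorting attribute keys, so identical sets always resolve to the same ID.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AttrSet is a flat set of string attributes (a simplification; real
// telemetry attributes carry typed values).
type AttrSet map[string]string

// Registry is a hypothetical batch-level registry of unique attribute sets.
type Registry struct {
	ids  map[string]int // canonical key -> set ID
	sets []AttrSet      // sets[id-1] holds the set registered under id
}

func NewRegistry() *Registry {
	return &Registry{ids: make(map[string]int)}
}

// canonicalKey builds an order-independent key by sorting attribute keys.
// %q quoting keeps keys or values containing '=' or ';' unambiguous.
func canonicalKey(attrs AttrSet) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, attrs[k])
	}
	return b.String()
}

// Intern returns the ID of an identical registered set, or registers attrs
// under a new ID.
func (r *Registry) Intern(attrs AttrSet) int {
	key := canonicalKey(attrs)
	if id, ok := r.ids[key]; ok {
		return id
	}
	// Copy so later changes to the caller's map cannot alter the registry.
	cp := make(AttrSet, len(attrs))
	for k, v := range attrs {
		cp[k] = v
	}
	r.sets = append(r.sets, cp)
	id := len(r.sets) // IDs start at 1
	r.ids[key] = id
	return id
}

// Lookup returns the set registered under id. Callers must not modify it.
func (r *Registry) Lookup(id int) (AttrSet, bool) {
	if id < 1 || id > len(r.sets) {
		return nil, false
	}
	return r.sets[id-1], true
}

func main() {
	reg := NewRegistry()
	a := reg.Intern(AttrSet{"service": "auth", "env": "prod"})
	b := reg.Intern(AttrSet{"env": "prod", "service": "auth"}) // same set
	c := reg.Intern(AttrSet{"service": "billing", "env": "prod"})
	fmt.Println(a, b, c) // 1 1 2
}
```

Keying by a canonical string costs a sort per item but gives hash-map lookups, which is the kind of CPU overhead noted under Risks / Considerations.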

### Example

**Before:**

```json
[
  {"name": "span1", "service": "auth", "env": "prod"},
  {"name": "span2", "service": "auth", "env": "prod"}
]
```

**After deduplication:**

```json
{
  "commonAttributes": {
    "1": {"service": "auth", "env": "prod"}
  },
  "spans": [
    {"name": "span1", "attrRef": 1},
    {"name": "span2", "attrRef": 1}
  ]
}
```

### Benefits

- Reduces memory footprint and payload size for large traces.
- Enables a consistent deduplication strategy across all signal types.
- Can be combined with compression later for even greater efficiency.

### Risks / Considerations

- Slight CPU overhead for computing and maintaining the metadata registry.
- Backward compatibility with exporters must be verified.
- Testing required for batching edge cases (e.g., partial deduplication across batches).
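One batching edge case worth testing is splitting a deduplicated batch (for example, to respect a size limit). If set IDs are batch-local, each resulting payload must carry every set its own items reference. A minimal Go sketch, assuming hypothetical `Batch` and `DedupSpan` types and a hypothetical `Split` helper that are not existing Collector APIs, prunes the registry per chunk:

```go
package main

import "fmt"

// DedupSpan is a simplified span that references an attribute set by ID.
type DedupSpan struct {
	Name    string
	AttrRef int
}

// Batch is a hypothetical deduplicated batch: a registry plus its items.
type Batch struct {
	Sets  map[int]map[string]string
	Spans []DedupSpan
}

// Split divides b into chunks of at most maxItems spans. Each chunk carries
// only the attribute sets its own spans reference, so no exported chunk
// depends on a registry that travels in a different payload.
func Split(b Batch, maxItems int) []Batch {
	if maxItems <= 0 {
		maxItems = len(b.Spans)
	}
	var out []Batch
	for start := 0; start < len(b.Spans); start += maxItems {
		end := start + maxItems
		if end > len(b.Spans) {
			end = len(b.Spans)
		}
		chunk := Batch{
			Sets:  make(map[int]map[string]string),
			Spans: b.Spans[start:end], // shares the backing array with b
		}
		for _, s := range chunk.Spans {
			if set, ok := b.Sets[s.AttrRef]; ok {
				chunk.Sets[s.AttrRef] = set
			}
		}
		out = append(out, chunk)
	}
	return out
}

func main() {
	b := Batch{
		Sets: map[int]map[string]string{
			1: {"service": "auth", "env": "prod"},
			2: {"service": "billing", "env": "prod"},
		},
		Spans: []DedupSpan{{"span1", 1}, {"span2", 1}, {"span3", 2}},
	}
	for i, c := range Split(b, 2) {
		fmt.Printf("chunk %d: %d spans, %d attribute sets\n", i, len(c.Spans), len(c.Sets))
	}
}
```

Keeping the original IDs and pruning unreferenced sets avoids renumbering; a real implementation might instead compact IDs per chunk.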

### Next Steps

- Review proposal with maintainers.
- Prototype deduplication for spans only.
- Extend to logs, metrics, and profiles once validated.
- Add optional dictionary encoding on top of deduplication if needed.

### Describe alternatives you've considered

_No response_

### Additional context

This would be especially useful for large-scale telemetry pipelines.

