Description
Components
batch processor and exporter
Summary
Currently, many spans, metrics, logs, and profiles carry redundant attributes (common tags, labels, or metadata) across multiple items. This leads to increased memory usage, larger payloads, and slower batch processing.
This proposal introduces a universal deduplication layer to reduce redundancy before encoding/export, improving memory and network efficiency.
Describe the solution you'd like
Motivation
- Memory optimization: Large traces with repetitive attributes can spike memory usage.
- Network efficiency: Avoid sending the same attributes repeatedly in every exported payload.
- Cross-signal consistency: Metrics, logs, profiles, and spans often share common attributes; deduplication can be applied uniformly.
Goals
- Identify common attributes across spans, metrics, logs, and profiles within a batch.
- Store common attributes once as metadata.
- Adjust exporters to reference shared metadata instead of duplicating attributes.
- Ensure backward compatibility with existing exporters by putting deduplication behind an optional flag that enables or disables it.
Non-Goals
- Dictionary or delta encoding compression (can be added later).
- Altering the semantics of existing attributes.
Design Overview
- Batch-level metadata registry
  - Each batch maintains a registry of unique attribute sets.
  - Each item references its attribute set by ID.
- Item processing
  - During batch construction, the processor extracts common attributes.
  - If the same set exists, reference it by ID; else, register a new set.
- Export adjustments
  - Exporters serialize items with reference IDs instead of repeating the attributes.
  - Optionally, deduplication can be disabled per exporter.
- Scope
  - Initial focus: spans and logs.
  - Extendable to metrics and profiles.
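The registry and item-processing steps in the Design Overview could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `AttrSet`, `Registry`, `Intern`, and `Lookup` are hypothetical names that do not exist in the Collector, attribute values are simplified to strings, and IDs are assumed to start at 1 and to be valid only within one batch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// AttrSet is one item's attributes. String values keep the sketch small;
// real telemetry attributes are typed. (Hypothetical type.)
type AttrSet map[string]string

// Registry is a hypothetical batch-level store of unique attribute sets.
type Registry struct {
	ids  map[string]int // canonical key -> ID
	sets []AttrSet      // sets[id-1] is the set registered under id
}

func NewRegistry() *Registry {
	return &Registry{ids: make(map[string]int)}
}

// canonicalKey builds an order-independent key by sorting attribute names.
// %q quoting keeps values that contain '=' or ';' from colliding.
func canonicalKey(a AttrSet) string {
	keys := make([]string, 0, len(a))
	for k := range a {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%q=%q;", k, a[k])
	}
	return b.String()
}

// Intern returns the ID of an equal set if one is already registered;
// otherwise it registers a copy of the set and returns the new ID.
func (r *Registry) Intern(a AttrSet) int {
	key := canonicalKey(a)
	if id, ok := r.ids[key]; ok {
		return id
	}
	cp := make(AttrSet, len(a))
	for k, v := range a {
		cp[k] = v
	}
	r.sets = append(r.sets, cp)
	id := len(r.sets) // IDs start at 1 (assumption of this sketch)
	r.ids[key] = id
	return id
}

// Lookup returns the set registered under id, or false if id is unknown.
func (r *Registry) Lookup(id int) (AttrSet, bool) {
	if id < 1 || id > len(r.sets) {
		return nil, false
	}
	return r.sets[id-1], true
}

// Len reports how many unique sets the batch holds.
func (r *Registry) Len() int { return len(r.sets) }

func main() {
	r := NewRegistry()
	a := r.Intern(AttrSet{"service": "auth", "env": "prod"})
	b := r.Intern(AttrSet{"env": "prod", "service": "auth"}) // same set, different order
	c := r.Intern(AttrSet{"service": "billing", "env": "prod"})
	fmt.Println(a, b, c, r.Len()) // prints: 1 1 2 2
}
```

Building a canonical key per item is where the CPU overhead of this approach comes from; a hash of the sorted pairs would be a possible alternative, at the cost of handling collisions.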
Example
Before:
[
  {"name": "span1", "service": "auth", "env": "prod"},
  {"name": "span2", "service": "auth", "env": "prod"}
]
After deduplication (shared attribute sets are keyed by the ID that attrRef points to):
{
  "commonAttributes": {"1": {"service": "auth", "env": "prod"}},
  "spans": [
    {"name": "span1", "attrRef": 1},
    {"name": "span2", "attrRef": 1}
  ]
}
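As a rough illustration of how the example above could be produced, the Go sketch below reads the "Before" JSON and emits a deduplicated structure. `Deduplicate`, `DedupedBatch`, and `DedupedSpan` are hypothetical names, and the sketch assumes that `name` is the only per-item field, that every other field is a shareable string attribute, and that `commonAttributes` is a map keyed by ID.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DedupedSpan keeps only the per-item field plus a reference ID. (Hypothetical.)
type DedupedSpan struct {
	Name    string `json:"name"`
	AttrRef int    `json:"attrRef"`
}

// DedupedBatch stores each unique attribute set once, keyed by ID. (Hypothetical.)
type DedupedBatch struct {
	CommonAttributes map[int]map[string]string `json:"commonAttributes"`
	Spans            []DedupedSpan             `json:"spans"`
}

// Deduplicate treats "name" as per-item data and everything else as a
// shareable attribute set. That split is an assumption of this sketch.
func Deduplicate(before []byte) (DedupedBatch, error) {
	var items []map[string]string
	if err := json.Unmarshal(before, &items); err != nil {
		return DedupedBatch{}, err
	}
	out := DedupedBatch{CommonAttributes: map[int]map[string]string{}}
	seen := map[string]int{} // canonical JSON of an attribute set -> ID
	for _, it := range items {
		name := it["name"]
		delete(it, "name")
		// encoding/json writes map keys in sorted order, so equal sets
		// produce identical bytes and can be used as a lookup key.
		key, err := json.Marshal(it)
		if err != nil {
			return DedupedBatch{}, err
		}
		id, ok := seen[string(key)]
		if !ok {
			id = len(seen) + 1
			seen[string(key)] = id
			out.CommonAttributes[id] = it
		}
		out.Spans = append(out.Spans, DedupedSpan{Name: name, AttrRef: id})
	}
	return out, nil
}

func main() {
	before := []byte(`[
	  {"name": "span1", "service": "auth", "env": "prod"},
	  {"name": "span2", "service": "auth", "env": "prod"}
	]`)
	batch, err := Deduplicate(before)
	if err != nil {
		panic(err)
	}
	after, err := json.MarshalIndent(batch, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(after))
}
```

With only two small spans, the saving is negligible; the benefit would be expected to grow with the number of items that share a set and with the size of that set.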
Benefits
- Reduces memory footprint and payload size for large traces.
- Enables a consistent deduplication strategy across all signal types.
- Can be combined with compression later for even greater efficiency.
Risks / Considerations
- Slight CPU overhead for computing and maintaining the metadata registry.
- Backward compatibility with exporters must be verified.
- Testing required for batching edge cases (e.g., partial deduplication across batches).
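One way to probe the compatibility and edge-case risks listed above is a function that expands references back into inline attributes, for exporters that have deduplication disabled, and that rejects references the batch's own registry cannot resolve. The Go sketch below assumes the same hypothetical shapes as the example (`Batch`, `Item`, `FlatItem`, and `Expand` are illustrative names, not existing Collector APIs).

```go
package main

import "fmt"

// Item is a deduplicated item: per-item data plus a reference ID. (Hypothetical.)
type Item struct {
	Name    string
	AttrRef int
}

// Batch carries the items together with the registry they refer to. (Hypothetical.)
type Batch struct {
	Common map[int]map[string]string
	Items  []Item
}

// FlatItem is the original, non-deduplicated shape. (Hypothetical.)
type FlatItem struct {
	Name  string
	Attrs map[string]string
}

// Expand restores inline attributes for every item. A reference that is not
// present in this batch's registry is reported as an error instead of being
// silently dropped, since IDs are assumed to be meaningful only per batch.
func Expand(b Batch) ([]FlatItem, error) {
	out := make([]FlatItem, 0, len(b.Items))
	for _, it := range b.Items {
		set, ok := b.Common[it.AttrRef]
		if !ok {
			return nil, fmt.Errorf("item %q: attrRef %d not found in this batch", it.Name, it.AttrRef)
		}
		// Copy the set so items do not share one mutable map.
		attrs := make(map[string]string, len(set))
		for k, v := range set {
			attrs[k] = v
		}
		out = append(out, FlatItem{Name: it.Name, Attrs: attrs})
	}
	return out, nil
}

func main() {
	b := Batch{
		Common: map[int]map[string]string{1: {"service": "auth", "env": "prod"}},
		Items:  []Item{{Name: "span1", AttrRef: 1}, {Name: "span2", AttrRef: 1}},
	}
	flat, err := Expand(b)
	fmt.Println(len(flat), err)

	// A reference carried over from another batch cannot be resolved here.
	b.Items = append(b.Items, Item{Name: "span3", AttrRef: 7})
	_, err = Expand(b)
	fmt.Println(err)
}
```

A round-trip test (deduplicate, expand, compare with the input) would be a natural first check before changing any exporter.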
Next Steps
- Review proposal with maintainers.
- Prototype deduplication for spans only.
- Extend to logs, metrics, and profiles once validated.
- Add optional dictionary encoding on top of deduplication if needed.
Describe alternatives you've considered
No response
Additional context
This would likely be most useful for large-scale telemetry pipelines.