
@Logiraptor (Contributor) commented Oct 7, 2025

While working on a fix for #41656, I noticed that many tests are strongly coupled to the tsp internals. This PR moves all of that coupling into a small number of test helpers instead of duplicating it across every test (a sketch of how such helpers might be wired up follows the list). Those helpers are:

  • withPolicies - allows setting the policy evaluators directly, which is useful for fine-grained decision-logic tests
  • withTickerLatency - allows speeding up the decision ticker, which keeps tests fast
  • withTestController - makes the decision tick block until explicitly triggered, which allows deterministic tests of otherwise non-deterministic behavior
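For illustration, here is a rough sketch of how helpers like these might be wired up with Go's functional-options pattern. The option names come from this PR, but the struct and its fields below are assumptions, not the actual implementation:

```go
package example

import "time"

// testOptions collects the knobs the helpers expose; the exact fields are
// illustrative assumptions, not the real processor configuration.
type testOptions struct {
	policyNames     []string // stand-in for the real policy evaluators
	tickerFrequency time.Duration
	blockingTicks   bool
}

// testOption is applied when constructing the tsp under test.
type testOption func(*testOptions)

// withPolicies sets the policy evaluators directly, for fine-grained
// decision-logic tests.
func withPolicies(policyNames []string) testOption {
	return func(o *testOptions) { o.policyNames = policyNames }
}

// withTickerLatency speeds up (or slows down) the decision ticker.
func withTickerLatency(d time.Duration) testOption {
	return func(o *testOptions) { o.tickerFrequency = d }
}

// withTestController makes each decision tick block until explicitly triggered.
func withTestController() testOption {
	return func(o *testOptions) { o.blockingTicks = true }
}
```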

A few tests were largely rewritten so that they test externally observable behavior (a minimal sketch of the resulting test shape follows the list). For example:

  • TestLateArrivingSpanUsesDecisionCache - used to inspect idToTrace state to ensure traces are cleared from memory. Now the tsp is destroyed and recreated with the same cache to ensure the cache is the only state available.
  • TestSequentialTraceMapSize et al - these used to inspect idToTrace state to ensure trace data is properly tracked. Now they trigger sampling decisions and assert that the output is complete.
  • TestSetSamplingPolicy et al - these used to inspect the internal policies state to ensure policies are reloaded. Now they observe only the sampled traces.
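To make that concrete, here is a minimal, self-contained sketch of the resulting test shape. The sampler below is a trivial stand-in, not the real tsp: the point is only that the assertion targets what the downstream sink received after a decision is forced, rather than any internal map.

```go
package example

import (
	"context"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/consumer/consumertest"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// fakeSampler buffers traces and forwards them to next when flush is called,
// standing in for the tsp's tick-driven sampling decision.
type fakeSampler struct {
	next    *consumertest.TracesSink
	pending []ptrace.Traces
}

func (f *fakeSampler) ConsumeTraces(_ context.Context, td ptrace.Traces) error {
	f.pending = append(f.pending, td)
	return nil
}

func (f *fakeSampler) flush(ctx context.Context) error {
	for _, td := range f.pending {
		if err := f.next.ConsumeTraces(ctx, td); err != nil {
			return err
		}
	}
	f.pending = nil
	return nil
}

func TestSampledOutputIsObservable(t *testing.T) {
	sink := new(consumertest.TracesSink)
	s := &fakeSampler{next: sink}

	td := ptrace.NewTraces()
	td.ResourceSpans().AppendEmpty().ScopeSpans().AppendEmpty().Spans().AppendEmpty()

	require.NoError(t, s.ConsumeTraces(context.Background(), td))
	require.NoError(t, s.flush(context.Background())) // analogous to triggering one decision tick

	assert.Equal(t, 1, sink.SpanCount()) // assert on emitted output, not on internal state
}
```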

Instead of invoking the TSP tick loop directly

Instead of calling the tsp tick loop directly, I've added a wrapper around the tick loop that blocks each tick until it's explicitly triggered. This still involves some coupling, but since it's encapsulated in one place, it will be easier to change when refactoring the tsp in #41656. I also took the opportunity to consolidate the sync batcher into this controller: every test that drives the tick loop manually also wants a synchronous batcher, for the same reason. Again, this makes it simpler to update later without modifying every test at once.
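A controller along these lines can be built with a simple channel handshake. This is a sketch of the general technique with assumed names, not the PR's actual controller:

```go
package example

// testController wraps a tick loop so that each tick runs only when a test
// asks for it, and the test blocks until that tick has finished.
type testController struct {
	tick chan chan struct{}
}

func newTestController() *testController {
	return &testController{tick: make(chan chan struct{})}
}

// run stands in for the timer-driven loop in tests: it executes doTick once
// per trigger and signals completion back to the waiting test.
func (c *testController) run(doTick func()) {
	for done := range c.tick {
		doTick()
		close(done)
	}
}

// waitForTick triggers a single decision tick and blocks until it completes,
// which makes otherwise time-dependent behavior deterministic.
func (c *testController) waitForTick() {
	done := make(chan struct{})
	c.tick <- done
	<-done
}
```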

Instead of accessing the idToTrace map

In all cases where we were directly inspecting the idToTrace map, we could instead just let the tsp sample those traces and inspect the output.

Instead of setting tickerFrequency directly

I've moved these tests to use withTickerFrequency, which is still a coupling, but since it's delegated to the option func, there's a central place to change how it works.

Other

I included a final commit: 7bfab91. This just removes an unused field.

I've decided to remove two tests:

  • TestConcurrentTraceMapSize - this one writes traces concurrently and then asserts on the size of the idToTrace map. We have overlapping tests that write concurrently and then assert on the complete sampled output, which IMO is a better test.
  • TestDecisionPolicyMetrics - this one calls the internal makeDecision method and then asserts on the returned metrics, which are only used in debug logging. This feels like over-testing to me and makes the tests overly coupled.

@Logiraptor Logiraptor requested a review from a team as a code owner October 7, 2025 18:51
@Logiraptor Logiraptor requested a review from axw October 7, 2025 18:51
@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Oct 7, 2025
@github-actions github-actions bot requested a review from portertech October 7, 2025 18:51
@axw (Contributor) commented Oct 8, 2025

@Logiraptor I like the sound of this, but it needs to wait until the minimum Go version is bumped to 1.25. That will happen when Go 1.26.0 comes out in February 2026.

@Logiraptor (Contributor, Author) commented

@axw Yeah, I just realized that 😓 I'll mark this as a draft while I figure out if there's an easy way around it.

@Logiraptor Logiraptor marked this pull request as draft October 8, 2025 01:40
@Logiraptor Logiraptor marked this pull request as ready for review October 8, 2025 17:26
@Logiraptor (Contributor, Author) commented

@axw Figured out an alternative approach. It's not quite as magical, but it accomplishes the goal of reducing how much needs to change when the tsp internals are refactored.

@Logiraptor Logiraptor changed the title [tailsamplingprocessor] [chore] Use synctest instead of tsp internals [tailsamplingprocessor] [chore] Remove test coupling on component internal fields Oct 8, 2025
Logiraptor added a commit to Logiraptor/opentelemetry-collector-contrib that referenced this pull request Oct 8, 2025
This PR builds on open-telemetry#43201 (pending) to parallelize the decision ticker.
It also fixes the open issue in open-telemetry#41656 by moving all state management to
a single goroutine (per shard).

In practice at Grafana Labs, we've seen that the single-threaded nature
of the decision ticker is the primary bottleneck when running the tail
sampling processor at high scale. For example, we have one large
cluster performing around 12M policy evaluations per second while
maintaining a p99 tick latency of around 500ms. To reach this
scale, we need to run ~200 replicas just to get enough parallel threads
of execution. By adding support for parallel decision ticks, we hope to
run fewer, larger replicas, which should reduce the fragmentation
caused by trace ID load balancing.

Another benefit here is to reduce contention on the idToTrace map and generally
simplify reasoning about concurrency by giving each trace a clear
"owner" goroutine. Previously we'd try to do some work inside the calling
goroutine of ConsumeTraces. Instead, new work is passed via channels to
its dedicated shard. By default we run one shard per available core.
Shared state is kept to the minimum required to implement TSP semantics:

1. The trace limiter is shared, which allows the overall memory
   consumption to be reliably capped even in case of uneven shard
   utilization.
2. Both decision caches are shared, which again allows overall memory
   consumption to be capped even with uneven shard utilization.

In practice, shard utilization should be very even, but it's still nice
to have hard guarantees on the total traces in memory, and this allows
us to avoid changing any tests in this PR, which greatly increases my
confidence in the code despite the large change. Ideally the change to
parallel execution is invisible to end users (aside from the much higher throughput).

Sharding makes use of the maphash package with a unique `Seed`, which
should provide even shard distribution even if traceIDs have previously
been sampled downstream. This is better than, e.g., a simple modulo on
the traceID, since traceIDs are not necessarily uniformly distributed
after head sampling.
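As a rough illustration of that scheme (assumed names and shapes, not the code in the referenced commit): a process-unique maphash.Seed maps each trace ID to a shard, and each shard is a single goroutine that owns its slice of state and receives new work over a channel.

```go
package example

import (
	"hash/maphash"
	"runtime"

	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// shard owns its portion of trace state; only its goroutine touches it.
type shard struct {
	work chan ptrace.Traces
}

func (sh *shard) run() {
	for td := range sh.work {
		_ = td // buffer spans, schedule decisions, etc. for traces owned by this shard
	}
}

type sharder struct {
	seed   maphash.Seed
	shards []*shard
}

func newSharder() *sharder {
	n := runtime.GOMAXPROCS(0) // one shard per available core, as described
	s := &sharder{seed: maphash.MakeSeed(), shards: make([]*shard, n)}
	for i := range s.shards {
		s.shards[i] = &shard{work: make(chan ptrace.Traces, 1)}
		go s.shards[i].run()
	}
	return s
}

// shardFor hashes the trace ID with the process-unique seed, keeping the
// distribution even regardless of how the IDs were generated or head-sampled.
func (s *sharder) shardFor(id pcommon.TraceID) *shard {
	return s.shards[maphash.Bytes(s.seed, id[:])%uint64(len(s.shards))]
}
```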
Logiraptor added a commit to Logiraptor/opentelemetry-collector-contrib that referenced this pull request Oct 9, 2025