Conversation

jpkrohling
Member

Description: Added a Trace ID aware load-balancing exporter.

Link to tracking Issue: open-telemetry/opentelemetry-collector#1724

Testing: unit tests, plus some manual tests (Kubernetes e2e tests for the DNS resolver are still missing)

Documentation: README file
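
As a rough illustration of the idea (not the exporter's actual code): a trace-ID-aware load balancer keeps every span of a trace on the same backend by deriving the backend from a hash of the trace ID. The helper and endpoint names below are hypothetical, and plain modulo hashing is used only for clarity.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickEndpoint is a hypothetical helper: it hashes the 16-byte trace ID and
// maps it onto one of the configured backends, so every span of a given trace
// is exported to the same collector. Modulo hashing is shown only for clarity;
// it reshuffles most traces whenever the backend list changes, which is why
// the hashing strategy is discussed further down in this conversation.
func pickEndpoint(traceID [16]byte, endpoints []string) string {
	h := fnv.New32a()
	h.Write(traceID[:])
	return endpoints[h.Sum32()%uint32(len(endpoints))]
}

func main() {
	backends := []string{"collector-0:4317", "collector-1:4317", "collector-2:4317"}
	traceID := [16]byte{0xde, 0xad, 0xbe, 0xef}
	// The same trace ID always resolves to the same backend.
	fmt.Println(pickEndpoint(traceID, backends))
}
```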

@jpkrohling
Member Author

Each commit in this PR is a step towards the final solution and can be reviewed individually.

@jpkrohling force-pushed the jpkrohling/1724-loadbalancing-processor branch 2 times, most recently from a319e1e to 156554f on October 9, 2020 14:05
Contributor

@joe-elliott left a comment

Added a few comments. Thank you for this work @jpkrohling. Very nice.

Contributor

> that either splits the incoming batches into multiple batches, one per trace

Does this exist? It might be better to just mention groupbytrace if it's the only existing option.

Member Author

It will, once #1235 is addressed :-) I think groupbytrace isn't suitable for all cases, although I would certainly recommend it for most cases that I can think of.

Contributor

Should the example show groupbytrace, since it's basically required?

Member Author

I thought about it, but I think I would prefer to have the future "splitbatch" there instead.

Contributor

this example is cracking me up. I love how flexible the collector is.

Contributor

Perhaps use a jumphash? I'm pretty sure it will work in this use case, and it's significantly less code.

https://github.com/dgryski/go-jump

> We present jump consistent hash, a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.

I'm no expert, but I think this will work if you just order the IPs you're load balancing over in increasing order.
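
For reference, the algorithm from the paper transcribes to Go in a handful of lines; the sketch below follows the paper's published pseudo-code, with package and function names that are illustrative rather than taken from go-jump.

```go
// Package jumphash holds a sketch transcription of the jump consistent hash
// algorithm described in the paper linked above.
package jumphash

// Hash maps a 64-bit key (for example, a trace ID folded down to a uint64) to
// a bucket index in [0, numBuckets). Direct transcription of the paper's code.
func Hash(key uint64, numBuckets int32) int32 {
	var b int64 = -1
	var j int64
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}
```

Adding a bucket only moves roughly 1/n of the keys into the new bucket, but, as the quoted abstract hints, buckets can only be removed from the end of the sequence.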

Member Author

I first saw it in @annanay25's branch and found it highly interesting, especially because it's implemented in 5 lines written by a PerlMonk. I decided to read the paper, and this statement in the abstract intrigued me:

> Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching

Reading further, I realized that jumphash might not be a good choice for our use case at all:

> This ability to add or remove buckets in any order can be valuable for cache servers where the servers are purely a performance improvement. But for data storage applications, where each bucket represents a different shard of the data, it is not acceptable for shards to simply disappear, because that shard is only place where the corresponding data is stored. Typically this is handled by either making the shards redundant (with several replicas), or being able to quickly recover a replacement, or accepting lower availability for some data. Server death thus does not cause reallocation of data; only capacity changes do. This means that shards can be assigned numerical ids in increasing order as they are added, so that the active bucket ids always fill the range [0, num_buckets)

Basically, we can scale up, but can't scale down.
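
For contrast, the usual way to tolerate removing an arbitrary backend is a ring-style consistent hash, where dropping one endpoint only remaps the keys that were assigned to it. A minimal sketch of that idea follows (no virtual nodes, locking, or error handling; all names are illustrative and not the PR's implementation):

```go
// Package hashring is a toy consistent-hash ring, sketched only to contrast
// with jump hash: removing one endpoint only remaps the keys it owned.
package hashring

import (
	"hash/fnv"
	"sort"
)

type Ring struct {
	points    []uint32          // sorted positions on the hash circle
	endpoints map[uint32]string // position -> endpoint
}

func hash32(b []byte) uint32 {
	h := fnv.New32a()
	h.Write(b)
	return h.Sum32()
}

// New places each endpoint at one position on the circle (real rings add many
// virtual nodes per endpoint to even out the distribution).
func New(endpoints []string) *Ring {
	r := &Ring{endpoints: map[uint32]string{}}
	for _, e := range endpoints {
		p := hash32([]byte(e))
		r.points = append(r.points, p)
		r.endpoints[p] = e
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Get returns the endpoint responsible for key (e.g. a trace ID): the first
// endpoint clockwise from the key's position on the circle.
func (r *Ring) Get(key []byte) string {
	k := hash32(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= k })
	if i == len(r.points) {
		i = 0 // wrap around the circle
	}
	return r.endpoints[r.points[i]]
}
```

With this scheme, removing collector-1 leaves the traces assigned to collector-0 and collector-2 untouched, which is exactly what the jump hash limitation above rules out.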

Contributor

Ah, thank you. I knew we used jumphash with memcached in Cortex, so I thought it might be useful here, but you are correct. We use a StatefulSet to force identity/order so that new shards are always appended to the end.

Member Author

Replace the double negative (never false) with a positive (always true)

jpkrohling commented