-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Improve speed of BytesQueueBuffer.get() by using memoryview
#3711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Does it make more sense to just always store the byte chunks as memoryviews? My instinct was that would solve this as well. We'll need to update the type hints as well as now the deque can have either bytes or memoryview probably |
|
@sigmavirus24 Thank you for your suggestion! I updated the code to store the data as memoryview. |
|
Is that as performant at least? |
|
Yes, at least on my machine. Test code: from src.urllib3.response import BytesQueueBuffer
import time
buffer = BytesQueueBuffer()
buffer.put(b"x" * 1024 * 1024 * 10) # 10 MiB
now = time.perf_counter()
while len(buffer) > 5000:
buffer.get(1024) # 1 KiB
delta = time.perf_counter() - now
print(delta)The result: The new one looks faster, but it is because the |
|
Worth checking for a performance hit on |
|
I tested in the original repro, and it looked fine. However, the from src.urllib3.response import BytesQueueBuffer
import time
buffer = BytesQueueBuffer()
now = time.perf_counter()
for _ in range(10 * 1024): # 10 MiB total
buffer.put(b"x" * 1024) # 1 KiB
delta = time.perf_counter() - now
print(f"Time to put data: {delta}s")
while len(buffer) > 0:
buffer.get(1024) # 1 KiB
delta = time.perf_counter() - now
print(f"Time to get data: {delta}s")Maybe because this code is not reveraging anything from bytes->memoryview conversion, as it is just reading and writing the same size? It might be better to use 495aab1. |
|
Now it's like this: |
illia-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phenylshima thanks for uncovering the issue!
as it does not seem to benefit something
illia-v
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extended the tests a bit while testing the change locally and fixed a minor issue in the changelog entry
|
|
||
| assert len(get_func(buffer)) == 10 * 2**20 | ||
| result = get_func(buffer) | ||
| assert type(result) is bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not isinstance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to ensure that it returns bytes and not something else including a subclass instance which isinstance wouln't differenciate
|
Am I misreading the benchmarks? I'm not seeing a speed difference between |
|
Thank you for looking at this PR. The current test does not test the execution time, rather the non-existance of extra memory allocation (that led to the overhead in the current main branch). |
|
I believe Quentin asked about the benchmarks from #3711 (comment) and #3711 (comment) which did not measure the effect of using The benchmarked case was related to a regression that was introduced by this PR initially and has been resolved since then in e898682: memoryview was used even when a chunk was not split. |
|
Ah I see, thank you for the followup! |
pquentin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM. Illia showed me a (private for now) benchmark where this PR makes a drastic difference.
… BytesQueueBuffer class taken from upstream urllib3#3711
2.15.900 (2025-12-16) ===================== - Improved pre-check for socket liveness probe before connection reuse from pool. - Backported "HTTPHeaderDict bytes key handling" from upstream urllib3#3653 - Backported "Expand environment variable of SSLKEYLOGFILE" from upstream urllib3#3705 - Backported "Fix redirect handling when an integer is passed to a pool manager" from upstream urllib3#3655 - Backported "Improved the performance of content decoding by optimizing ``BytesQueueBuffer`` class." from upstream urllib3#3711 - Backported "GHSA-gm62-xv2j-4w53" security patch for "attacker could compose an HTTP response with virtually unlimited links in the ``Content-Encoding`` header" from upstream urllib3@24d7b67
Fixes #3710
This pull request improves the speed of
BytesQueueBuffer.get()by using memoryview, solving the problem described in #3710 .