
Conversation

@phenylshima
Contributor

Fixes #3710

This pull request improves the speed of BytesQueueBuffer.get() by using memoryview, solving the problem described in #3710 .

@phenylshima phenylshima marked this pull request as ready for review November 13, 2025 11:55
@sigmavirus24
Contributor

Does it make more sense to just always store the byte chunks as memoryviews? My instinct was that would solve this as well.

We'll probably need to update the type hints as well, since the deque can now hold either bytes or memoryview.
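A minimal sketch of what always storing chunks as memoryviews could look like (hypothetical class and names, not urllib3's actual implementation):

```python
from collections import deque


class MemviewQueueBuffer:
    """Hypothetical sketch: every chunk is stored as a memoryview,
    so get() can split the head chunk without copying the tail."""

    def __init__(self) -> None:
        self.buffer: deque[memoryview] = deque()
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def put(self, data: bytes) -> None:
        # Wrap on the way in; memoryview creation is cheap but not free.
        self.buffer.append(memoryview(data))
        self._size += len(data)

    def get(self, n: int) -> bytes:
        out = bytearray()
        while self.buffer and len(out) < n:
            chunk = self.buffer.popleft()
            needed = n - len(out)
            if len(chunk) > needed:
                out += chunk[:needed]
                # Slicing a memoryview is O(1); the tail is not copied.
                self.buffer.appendleft(chunk[needed:])
            else:
                out += chunk
        self._size -= len(out)
        return bytes(out)
```

With this shape, a partial read re-queues the unread tail as a zero-copy slice instead of rebuilding a new bytes object for it.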

@phenylshima
Contributor Author

@sigmavirus24 Thank you for your suggestion! I updated the code to store the data as memoryview.

@sigmavirus24
Contributor

Is that as performant at least?

@phenylshima
Contributor Author

phenylshima commented Nov 14, 2025

Yes, at least on my machine.

Test code:

from src.urllib3.response import BytesQueueBuffer
import time

buffer = BytesQueueBuffer()
buffer.put(b"x" * 1024 * 1024 * 10)  # 10 MiB
now = time.perf_counter()
while len(buffer) > 5000:
    buffer.get(1024)  # 1 KiB
delta = time.perf_counter() - now

print(delta)

The result:
495aab1: 0.0052835239985142834
19820a4: 0.004722315999970306

The new one looks faster, but only because the memoryview(chunk) call moved into the put method; I think the overall performance is the same. One thing to note: when we write one big chunk and read it back only once (either by get or get_all), I expect it to be slightly slower than the current main branch, due to running memoryview(chunk) and then converting back to bytes (although with get_all the conversion back to bytes does not happen).
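To make the tradeoff concrete, here is a small standalone illustration (not from the PR) of the copy that memoryview slicing avoids:

```python
import sys

# Slicing bytes allocates a new ~10 MiB object for the tail, while
# slicing a memoryview only records an offset into the same buffer.
data = b"x" * (10 * 2**20)  # 10 MiB

tail_bytes = data[1024:]             # copies the remaining ~10 MiB
tail_view = memoryview(data)[1024:]  # no copy: O(1)

print(len(tail_bytes) == len(tail_view))  # True
print(tail_view.obj is data)              # True: still backed by `data`
print(sys.getsizeof(tail_view) < sys.getsizeof(tail_bytes))  # True
```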

@sigmavirus24
Contributor

Worth checking for a performance hit on .put then from before your changes to now then, although I don't expect much of one. May as well also do a full end-to-end performance test. Put a bunch of chunks, do a get_all, etc.

@phenylshima
Contributor Author

phenylshima commented Nov 14, 2025

I tested with the original repro, and it looked fine.

However, the put method got slower, which surprised me:

from src.urllib3.response import BytesQueueBuffer
import time

buffer = BytesQueueBuffer()
now = time.perf_counter()

for _ in range(10 * 1024):  # 10 MiB total
    buffer.put(b"x" * 1024)  # 1 KiB

delta = time.perf_counter() - now
print(f"Time to put data: {delta}s")

while len(buffer) > 0:
    buffer.get(1024)  # 1 KiB

delta = time.perf_counter() - now

print(f"Time to get data: {delta}s")
> ~/dev/urllib3> main !1 ?1> python t.py
Time to put data: 0.0007599639998261409s
Time to get data: 0.004324924999764335s
> ~/dev/urllib3> main !1 ?1> python t.py
Time to put data: 0.0008291090002785495s
Time to get data: 0.004370479000044725s
> ~/dev/urllib3> main !1 ?1> python t.py
Time to put data: 0.0007629809997524717s
Time to get data: 0.004293366999718273s

> ~/dev/urllib3> @495aab1b ?1> python t.py
Time to put data: 0.0007698970002820715s
Time to get data: 0.004365679000329692s
> ~/dev/urllib3> @495aab1b ?1> python t.py
Time to put data: 0.000990569999885338s
Time to get data: 0.004518313000062335s
> ~/dev/urllib3> @495aab1b ?1> python t.py
Time to put data: 0.0009582410002622055s
Time to get data: 0.004823216000204411s

> ~/dev/urllib3> byte-queue-perf ?1> python t.py
Time to put data: 0.003418835000047693s
Time to get data: 0.007445904999713093s
> ~/dev/urllib3> byte-queue-perf ?1> python t.py
Time to put data: 0.0033251579998250236s
Time to get data: 0.007321049999973184s
> ~/dev/urllib3> byte-queue-perf ?1> python t.py
Time to put data: 0.003460155999619019s
Time to get data: 0.00736400899995715s

Maybe that is because this benchmark does not leverage the bytes->memoryview conversion at all, since it reads exactly the same size it writes?

It might be better to use 495aab1.

@phenylshima
Contributor Author

Now it's like this:

>~/dev/urllib3 > byte-queue-perf ?1 > python t.py
Time to put data: 0.0007707550003033248s
Time to get data: 0.004282103000150528s
>~/dev/urllib3 > byte-queue-perf ?1 > python t.py
Time to put data: 0.0007912390001365566s
Time to get data: 0.004329170999881171s
>~/dev/urllib3 > byte-queue-perf ?1 > python t.py
Time to put data: 0.0007524289994762512s
Time to get data: 0.004279935999875306s

Member

@illia-v illia-v left a comment


@phenylshima thanks for uncovering the issue!

illia-v
illia-v previously approved these changes Nov 19, 2025
Member

@illia-v illia-v left a comment


I extended the tests a bit while testing the change locally and fixed a minor issue in the changelog entry

@illia-v illia-v requested a review from sigmavirus24 November 19, 2025 22:05
@illia-v illia-v requested a review from pquentin November 22, 2025 20:39

assert len(get_func(buffer)) == 10 * 2**20
result = get_func(buffer)
assert type(result) is bytes
Contributor


Why not isinstance?

Member


I just wanted to ensure that it returns bytes and not something else, including a subclass instance, which isinstance wouldn't differentiate.
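A small illustration of that distinction (the FakeBytes subclass is hypothetical, purely for demonstration):

```python
class FakeBytes(bytes):
    """Hypothetical bytes subclass, only for illustrating the check."""


value = FakeBytes(b"abc")

print(isinstance(value, bytes))  # True: a subclass passes isinstance
print(type(value) is bytes)      # False: the exact-type check rejects it
```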

@pquentin
Member

Am I misreading the benchmarks? I'm not seeing a speed difference between main and the last byte-queue-perf benchmark: 0.7ms for put and 4ms for get.

@phenylshima
Contributor Author

phenylshima commented Nov 28, 2025

Thank you for looking at this PR. The current test does not measure execution time; rather, it checks the absence of the extra memory allocation that caused the overhead on the current main branch.
Changing buffer.get(2**20) to buffer.get(2**10), or something like that, should make the performance difference more visible (I'm not sure whether I should commit that, though).

@illia-v
Member

illia-v commented Nov 28, 2025

I believe Quentin asked about the benchmarks from #3711 (comment) and #3711 (comment) which did not measure the effect of using memoryview for chunks that are split.

The benchmarked case was related to a regression that was introduced by this PR initially and has been resolved since then in e898682: memoryview was used even when a chunk was not split.
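A rough sketch of that fix (hypothetical helper, not the actual e898682 diff): wrap a chunk in a memoryview only when it actually has to be split, so whole-chunk reads never pay the wrap/unwrap cost:

```python
from collections import deque


def pop_up_to(queue: deque, n: int) -> bytes:
    """Pop at most n bytes from the head chunk of the queue."""
    chunk = queue.popleft()
    if len(chunk) <= n:
        # Common case: the whole chunk is consumed; no memoryview is
        # created, so no wrap/unwrap cost is paid.
        return bytes(chunk)
    # Split case: slice through a memoryview so the tail is not copied.
    view = memoryview(chunk)
    queue.appendleft(view[n:])
    return bytes(view[:n])
```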

@phenylshima
Contributor Author

Ah I see, thank you for the followup!

Member

@pquentin pquentin left a comment


Thanks! LGTM. Illia showed me a (private for now) benchmark where this PR makes a drastic difference.

@illia-v illia-v merged commit 18af0a1 into urllib3:main Dec 1, 2025
42 checks passed
@phenylshima phenylshima deleted the byte-queue-perf branch December 2, 2025 22:43
Ousret added a commit to jawah/urllib3.future that referenced this pull request Dec 16, 2025
2.15.900 (2025-12-16)
=====================

- Improved pre-check for socket liveness probe before connection reuse
from pool.
- Backported "HTTPHeaderDict bytes key handling" from upstream
urllib3#3653
- Backported "Expand environment variable of SSLKEYLOGFILE" from
upstream urllib3#3705
- Backported "Fix redirect handling when an integer is passed to a pool
manager" from upstream urllib3#3655
- Backported "Improved the performance of content decoding by optimizing
``BytesQueueBuffer`` class." from upstream
urllib3#3711
- Backported "GHSA-gm62-xv2j-4w53" security patch for "attacker could
compose an HTTP response with virtually unlimited links in the
``Content-Encoding`` header" from upstream
urllib3@24d7b67


Successfully merging this pull request may close these issues.

Performance issue in BytesQueueBuffer.get() byte slicing
