Skip to content

Conversation

danlaine
Copy link
Collaborator

@danlaine danlaine commented May 27, 2025

(Hopefully) fixes write-after-free panics we've seen in some CI tests. This PR passes the buffer being acted on by the io_uring to the iouring event loop to ensure the buffer lives past the operation's completion.

Resolves #954

@danlaine danlaine self-assigned this May 27, 2025
@danlaine danlaine requested a review from Copilot May 27, 2025 19:29
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a write‐after‐free bug by updating the StableBuf trait and propagating the correct buffer type through the io_uring storage and network layers. The key changes include:

  • Adding Sync to the StableBuf trait to ensure thread safety.
  • Updating mpsc channel payloads to pass an Arc‑wrapped StableBuf.
  • Revising buffer handling in read, write, send, and recv operations to avoid use-after-free issues.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
utils/src/stable_buf.rs Added Sync to the StableBuf trait for thread safety.
runtime/src/storage/iouring.rs Updated mpsc channel payloads and buffer handling.
runtime/src/network/iouring.rs Adjusted channel signatures and buffer conversions.
runtime/src/iouring/mod.rs Revised channel handling and updated tests accordingly.

@danlaine danlaine requested a review from Copilot May 27, 2025 20:39
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extends the io_uring integration to carry buffer references across async operations and ensures they are not freed prematurely, fixing a write-after-free bug.

  • Added Sync bound to StableBuf so buffers can safely cross thread boundaries
  • Updated all io_sender/submitter channels to carry an Option<Arc<dyn StableBuf>>
  • Wrapped buffers in Arc in read/write/send/recv implementations and stored them in-flight until completion

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
utils/src/stable_buf.rs Added Sync bound to StableBuf trait
runtime/src/storage/iouring.rs Extended channel tuples with buffer Arc and updated Blob read/write
runtime/src/network/iouring.rs Extended network/send/recv channels with buffer Arc and updated Sink/Stream
runtime/src/iouring/mod.rs Updated run loop to accept and drop buffer Arc from waiters
Comments suppressed due to low confidence (1)

utils/src/stable_buf.rs:10

  • Adding a Sync bound to StableBuf is a breaking change for existing implementors. Verify that all current StableBuf implementations are updated to satisfy Sync.
pub unsafe trait StableBuf: Unpin + Send + Sync + 'static {

Comment on lines 253 to 254
let buf_ref = unsafe { std::slice::from_raw_parts_mut(buf.stable_mut_ptr(), buf_len) };
let buf_arc = Arc::new(buf);
Copy link

Copilot AI May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pointer is taken from buf before it's moved into the Arc, which can break for inline-buffer types (e.g., arrays). You should wrap buf in an Arc first and then call stable_mut_ptr on the Arc to get a stable pointer.

Suggested change
let buf_ref = unsafe { std::slice::from_raw_parts_mut(buf.stable_mut_ptr(), buf_len) };
let buf_arc = Arc::new(buf);
let buf_arc = Arc::new(buf);
let buf_ref = unsafe { std::slice::from_raw_parts_mut(buf_arc.stable_mut_ptr(), buf_len) };

Copilot uses AI. Check for mistakes.

Copy link
Collaborator Author

@danlaine danlaine May 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is safe for the only types that actually implement StableBufMut -- namely Vec<u8> and BytesMut. When we move buf into buf_arc we are moving ownership, but the underlying bytes remain in the same place.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is arguably kind of brittle. Open to ideas for how to improve.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed so that we take the reference to the StableBuf / StableBufMut after we put it into an Arc

@danlaine danlaine requested a review from Copilot May 27, 2025 21:18
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request fixes a write‐after‐free bug by improving buffer lifetime management and ensuring thread safety. Key changes include updating the StableBuf trait to include Sync, wrapping buffers with Arc in io_uring operations for both storage and network, and replacing tuple-based channel messages with a unified Op struct.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
utils/src/stable_buf.rs Added Sync to the StableBuf trait to enforce thread safety.
runtime/src/storage/iouring.rs Updated read_at and write_at methods to wrap buffers in an Arc, ensuring they remain valid.
runtime/src/network/iouring.rs Replaced tuple channels with iouring::Op and wrapped buffers in an Arc for both sending and receiving.
runtime/src/iouring/mod.rs Introduced the Op struct and updated the run function to manage buffer lifetimes via Arc.
Comments suppressed due to low confidence (1)

utils/src/stable_buf.rs:10

  • The addition of Sync to StableBuf improves thread safety; please ensure that all current StableBuf implementors are compatible with this change.
pub unsafe trait StableBuf: Unpin + Send + Sync + 'static {

@danlaine danlaine marked this pull request as ready for review May 27, 2025 21:45
@danlaine danlaine requested a review from patrick-ogrady May 27, 2025 21:45
@danlaine danlaine marked this pull request as draft May 27, 2025 21:47
@patrick-ogrady
Copy link
Contributor

As part of merging this, I think we should re-enable the network tests?

patrick-ogrady
patrick-ogrady previously approved these changes May 30, 2025
@patrick-ogrady patrick-ogrady requested a review from Copilot May 31, 2025 00:13
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a write‐after‐free bug by ensuring that buffers passed to io_uring operations live past request submission, and it updates the buffer API across several modules to use the new StableBuf abstraction.

  • Replace Vec with StableBuf to manage buffer lifetimes correctly.
  • Update API signatures to use Into for improved flexibility and consistency.
  • Refactor io_uring operation submission to include buffer ownership, avoiding premature deallocation.

Reviewed Changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated no comments.

Show a summary per file
File Description
runtime/src/utils/buffer/read.rs Modified to use StableBuf instead of Vec in buffer management.
runtime/src/storage/* Updated buffer parameters and conversion API for storage operations.
runtime/src/network/* Refactored network buffer handling in send/recv and test assertions.
runtime/src/iouring/mod.rs Adjusted io_uring operation submission to include buffer lifetime management.
.github/workflows/*.yml Adjusted timeouts and re-enabled previously commented-out features.
runtime/src/deterministic.rs Updated buffer assertions to align with the new StableBuf API.
Comments suppressed due to low confidence (1)

runtime/src/iouring/mod.rs:237

  • The variable 'next_work_id' is not incremented after assignment, which may lead to work ID collisions when multiple operations are submitted. Consider adding an increment (e.g. next_work_id += 1) after assigning work_id.
let work_id = next_work_id;

@patrick-ogrady patrick-ogrady changed the title [runtime] fix write-after-free bug [runtime/iouring-*] fix write-after-free bug May 31, 2025
@patrick-ogrady patrick-ogrady merged commit 6452e32 into main May 31, 2025
17 checks passed
@patrick-ogrady patrick-ogrady deleted the danlaine/iouring-fix branch May 31, 2025 15:21
Copy link

codecov bot commented May 31, 2025

Codecov Report

Attention: Patch coverage is 91.58879% with 27 lines in your changes missing coverage. Please review.

Project coverage is 90.99%. Comparing base (259675f) to head (76dbe38).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
storage/src/journal/variable.rs 18.18% 18 Missing ⚠️
utils/src/stable_buf.rs 86.36% 6 Missing ⚠️
runtime/src/storage/mod.rs 83.33% 3 Missing ⚠️
@@            Coverage Diff             @@
##             main    #1003      +/-   ##
==========================================
+ Coverage   90.97%   90.99%   +0.01%     
==========================================
  Files         192      191       -1     
  Lines       54094    54232     +138     
==========================================
+ Hits        49212    49347     +135     
- Misses       4882     4885       +3     
Files with missing lines Coverage Δ
runtime/src/deterministic.rs 97.33% <100.00%> (ø)
runtime/src/lib.rs 96.13% <100.00%> (ø)
runtime/src/mocks.rs 98.63% <100.00%> (+0.01%) ⬆️
runtime/src/network/audited.rs 94.54% <100.00%> (+0.25%) ⬆️
runtime/src/network/deterministic.rs 98.71% <100.00%> (+0.05%) ⬆️
runtime/src/network/metered.rs 100.00% <100.00%> (ø)
runtime/src/network/mod.rs 100.00% <100.00%> (ø)
runtime/src/network/tokio.rs 82.44% <100.00%> (+1.60%) ⬆️
runtime/src/storage/audited.rs 100.00% <100.00%> (ø)
runtime/src/storage/memory.rs 100.00% <100.00%> (ø)
... and 12 more

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 259675f...76dbe38. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[runtime/iouring] Fix unaligned tcache chunk detected + Stall

2 participants