Skip to content

Conversation

@joamag
Copy link
Owner

@joamag joamag commented Aug 18, 2025

This pull request introduces several performance optimizations and refactorings across the audio, CPU, and memory subsystems of the emulator. The most significant changes include replacing the audio buffer implementation with an optimized circular buffer, streamlining interrupt handling and CPU clock execution, and improving memory access patterns in the MMU. These changes aim to reduce memory allocations, improve cache locality, and minimize function call overhead for better overall emulation speed.

Audio Buffer Optimization

  • Replaced the VecDeque<i16> audio buffer in Apu with a pre-allocated circular buffer (Vec<i16>) and added head/tail/size tracking for efficient sample management. Introduced methods for buffer iteration and optimized sample pushing. (src/apu.rs) [1] [2] [3] [4]
  • Updated all usages and interfaces to reflect the new buffer type, including compatibility methods for APIs expecting VecDeque<i16>. (src/gb.rs, src/apu.rs) [1] [2] [3]

CPU Performance Improvements

  • Refactored interrupt flag collection and dispatch in Cpu to use bit manipulation and priority-based handling for faster execution. (src/cpu.rs) [1] [2]
  • Added a fast-path batch clock execution method (clock_batch) and a simplified clock_fast function to minimize overhead during non-interrupt cycles. (src/cpu.rs)

Memory Access Optimizations

  • Rewrote Mmu read and write methods to use branch prediction hints and handle the most common RAM and ROM access patterns first, improving cache locality and speed. (src/mmu.rs) [1] [2] [3]

GameBoy Struct and Buffer Handling

  • Added an audio_buffer_cache field to GameBoy for temporary storage of audio samples, reducing allocations when fetching audio data. Updated related methods for eager buffer access. (src/gb.rs) [1] [2] [3]

General Refactoring and Minor Improvements

  • Improved frame buffer handling to avoid borrowing issues and clarified buffer access patterns. (src/gb.rs) [1] [2]

Let me know if you want a deeper explanation of any of these optimizations or how they affect emulator performance!

Summary by CodeRabbit

  • Refactor

    • Switched to a fixed-size circular audio buffer for smoother, more consistent playback and lower latency.
    • Added batched CPU stepping and a fast instruction path for faster emulation when interrupts aren’t active.
    • Optimized memory read/write fast-paths for common RAM/ROM access to improve responsiveness.
  • Bug Fixes

    • Unified interrupt handling and halt behavior for more predictable execution.
    • Audio buffer reset now reliably clears playback state and avoids borrow conflicts.
  • Documentation

    • Updated public audio access patterns and related API notes.

joamag added 4 commits August 18, 2025 18:02
- Replaced VecDeque with a circular buffer implementation in APU for improved performance and reduced allocations.
- Enhanced CPU interrupt handling by optimizing flag collection and using a priority-based dispatch mechanism.
- Introduced batch execution for CPU cycles to minimize function call overhead.
- Updated audio buffer API for better compatibility and performance.
@joamag joamag requested a review from Copilot August 18, 2025 17:43
@joamag joamag self-assigned this Aug 18, 2025
@joamag joamag added the enhancement New feature or request label Aug 18, 2025
@coderabbitai
Copy link

coderabbitai bot commented Aug 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/claude-performance

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

This comment was marked as outdated.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
src/apu.rs (1)

950-960: Compatibility audio_buffer() copies — acceptable, but note the cost.

Returning an owned VecDeque provides a safe API but incurs O(n) copy. Callers on hot paths should prefer audio_buffer_iter() (or the eager path in GameBoy) to avoid allocations.

src/gb.rs (2)

425-429: Owned audio buffer in trait simplifies lifetimes but increases copies.

The API change to return VecDeque<i16> avoids borrow issues but forces allocations. If you want the best of both worlds, consider adding a secondary method (non-breaking) to the trait that exposes an iterator or slice pair for zero-copy consumers while retaining this owned fallback.


824-833: Avoid the intermediate Vec in audio_buffer_eager to cut one copy.

You can extend the cache directly from the iterator, eliminating the temporary samples Vec.

Apply this diff:

-    // collect samples first to avoid borrowing issues
-    let samples: Vec<i16> = self.apu().audio_buffer_iter().collect();
-    self.audio_buffer_cache.clear();
-    self.audio_buffer_cache.extend(samples);
+    // extend cache directly from iterator to avoid an extra allocation/copy
+    self.audio_buffer_cache.clear();
+    self.audio_buffer_cache.extend(self.apu().audio_buffer_iter());
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6970c35 and 4d03d93.

📒 Files selected for processing (4)
  • src/apu.rs (4 hunks)
  • src/cpu.rs (3 hunks)
  • src/gb.rs (7 hunks)
  • src/mmu.rs (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/apu.rs (1)
src/gb.rs (5)
  • audio_buffer (427-427)
  • audio_buffer (1256-1258)
  • audio_buffer (1648-1650)
  • clear_audio_buffer (428-428)
  • clear_audio_buffer (1652-1654)
src/mmu.rs (2)
src/ppu.rs (4)
  • read (938-1005)
  • read (2229-2231)
  • write (1007-1152)
  • write (2233-2235)
src/rom.rs (4)
  • read (429-440)
  • read (936-938)
  • write (442-450)
  • write (940-942)
src/gb.rs (1)
src/apu.rs (1)
  • audio_buffer (952-960)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
  • GitHub Check: Build SDL (latest)
  • GitHub Check: Build Libretro Android (latest)
  • GitHub Check: Build Libretro Android (1.85.0)
  • GitHub Check: Build Libretro Android (1.82.0)
  • GitHub Check: Build SDL (1.85.0)
  • GitHub Check: Build Libretro Android (1.83.0)
  • GitHub Check: Build WASM (1.85.0, 18)
  • GitHub Check: Build Libretro Android (1.84.0)
  • GitHub Check: Build WASM (latest, 18)
  • GitHub Check: Build SDL (1.84.0)
  • GitHub Check: Build WASM (1.82.0, 18)
  • GitHub Check: Build WASM (1.84.0, 18)
  • GitHub Check: Build WASM (1.83.0, 18)
  • GitHub Check: Build SDL (1.83.0)
  • GitHub Check: Build SDL (1.82.0)
  • GitHub Check: Build (1.82.0)
  • GitHub Check: Build (latest)
  • GitHub Check: Build (1.85.0)
  • GitHub Check: Build (1.83.0)
  • GitHub Check: Build (1.84.0)
  • GitHub Check: Build Windows
  • GitHub Check: Build Mac
🔇 Additional comments (15)
src/mmu.rs (2)

387-403: Boot ROM mapping and fast ROM path look correct.

  • DMG boot: 0x0000–0x00FF gated by BOOT.
  • CGB boot: 0x0000–0x00FF and 0x0200–0x08FF when boot_active; 0x0100–0x01FF goes to cartridge (header overlay).
  • Early return on addr <= 0x7fff preserves cartridge/MBC behavior.

404-486: I/O decode maintained and consistent.

VRAM/OAM/HRAM and device-mapped registers are preserved post fast-paths. KEY1 (0xFF4D) and SVBK (0xFF70) semantics match expected behavior.

src/cpu.rs (2)

171-179: Flags aggregation via bit ops is clear and efficient.

Compact and branchless interrupt flag collection; semantics preserved.


189-231: Priority-based interrupt dispatch is correct and simpler.

  • trailing_zeros() selects the lowest set bit (VBlank highest priority).
  • Common prologue (DI, push PC, clear HALT) avoids duplication.
  • Acks routed to respective components.
src/apu.rs (6)

197-204: Circular buffer design is a good fit for audio; zero-alloc steady-state.

Replacing VecDeque with a ring buffer plus head/tail/size avoids reallocation and memmoves under sustained throughput.


293-304: Initialization pre-allocates to max capacity.

Buffer sized to sampling_rate * buffer_size * channels; indices initialized correctly.


433-441: Correctly pushes interleaved channel samples via ring buffer.

Left and right samples are filtered and enqueued with overwrite-on-full semantics.


962-967: Iterator API is lean and correct.

Produces samples in chronological order starting from tail for size elements.


969-973: Buffer clear resets indices only.

O(1) reset with stable capacity — ideal for real-time scenarios.


975-987: Ring push logic handles full/empty correctly.

  • On full: advance tail, keep size at max.
  • On not full: increment size.
  • Then write at head and advance.

No off-by-one; modulo arithmetic correct.

src/gb.rs (5)

525-527: Pre-allocated audio_buffer_cache is a sensible optimization.

Reduces transient allocations for eager audio extraction.


688-692: Copying frame buffer on VBlank avoids borrow conflicts.

This resolves aliasing during the clock loop; minimal overhead, only on frame change.


814-822: Direct vector return for frame buffers is clean.

No intermediate caching; matches the new borrowing strategy.


1256-1258: Owned buffer passthrough aligns GameBoy with new APU API.

Method now returns a VecDeque<i16> as per the updated trait.


1648-1650: AudioProvider impl delegates correctly to APU.

Consistent with the trait change and APU’s new audio_buffer().

Comment on lines +285 to +306
/// Optimized batch execution of multiple CPU cycles
/// Reduces function call overhead for better performance
#[inline]
pub fn clock_batch(&mut self, target_cycles: u16) -> u16 {
let mut total_cycles = 0u16;

while total_cycles < target_cycles {
// fast path for non-interrupt cases
if !self.ime || self.mmu.ie == 0 {
total_cycles += self.clock_fast() as u16;
} else {
total_cycles += self.clock() as u16;
}

if total_cycles.saturating_add(20) > target_cycles {
break;
}
}

total_cycles
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Batch clock fast-path can starve HALT release when IME is disabled.

clock_batch calls clock_fast whenever !self.ime || self.mmu.ie == 0. However, clock_fast does not implement the GB quirk “release HALT when an interrupt is pending with IME=0”. This causes the CPU to remain halted forever in that scenario.

Fix by adding the HALT release check to clock_fast before the HALT early return. See next comment for patch.

Additionally, the “overshoot guard” using a hardcoded 20-cycle margin is conservative. Optional: factor a constant or compute a tighter bound (e.g., 16 for most opcodes, 20 only if IME and pending) to improve batching efficiency.

🤖 Prompt for AI Agents
In src/cpu.rs around lines 285 to 306, clock_batch fast-path calls clock_fast
when !self.ime || self.mmu.ie == 0 but clock_fast lacks the GameBoy quirk that
releases HALT when an interrupt is pending while IME == 0, which can leave the
CPU halted forever; modify clock_fast to perform the HALT-release check (detect
a pending interrupt via mmu.ie & mmu.if and IME==0 and HALT state) before
returning early from HALT so the CPU exits HALT correctly, and keep clock_batch
unchanged except optionally replace the hardcoded 20-cycle overshoot guard with
a named constant or tighten it (e.g., 16 normally, 20 only when IME/pending) to
improve batching efficiency.

Comment on lines +307 to +328
/// Fast path clock without interrupt checking for better performance
#[inline(always)]
fn clock_fast(&mut self) -> u8 {
if self.halted {
return 4;
}

let pc = self.pc;

// simplified fetch for common cases
let (inst, new_pc, _, _) = self.fetch(pc);
self.ppc = self.pc;
self.pc = new_pc;

let (inst_fn, inst_time, _) = inst;

self.cycles = 0;
inst_fn(self);
self.cycles = self.cycles.wrapping_add(*inst_time);

self.cycles
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add HALT release on pending interrupts to clock_fast.

Implement the IME=0 pending-interrupt check so HALT is released like in clock().

Apply this diff:

     #[inline(always)]
     fn clock_fast(&mut self) -> u8 {
-        if self.halted {
+        // Release HALT if an interrupt is pending while IME is disabled (GB quirk)
+        if self.halted && !self.ime {
+            let mut flags = 0u8;
+            flags |= self.mmu.ppu().int_vblank() as u8;
+            flags |= (self.mmu.ppu().int_stat() as u8) << 1;
+            flags |= (self.mmu.timer().int_tima() as u8) << 2;
+            flags |= (self.mmu.serial().int_serial() as u8) << 3;
+            flags |= (self.mmu.pad().int_pad() as u8) << 4;
+            let pending = flags & self.mmu.ie;
+            if pending != 0 {
+                self.halted = false;
+            }
+        }
+        if self.halted {
             return 4;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Fast path clock without interrupt checking for better performance
#[inline(always)]
fn clock_fast(&mut self) -> u8 {
if self.halted {
return 4;
}
let pc = self.pc;
// simplified fetch for common cases
let (inst, new_pc, _, _) = self.fetch(pc);
self.ppc = self.pc;
self.pc = new_pc;
let (inst_fn, inst_time, _) = inst;
self.cycles = 0;
inst_fn(self);
self.cycles = self.cycles.wrapping_add(*inst_time);
self.cycles
}
/// Fast path clock without interrupt checking for better performance
#[inline(always)]
fn clock_fast(&mut self) -> u8 {
// Release HALT if an interrupt is pending while IME is disabled (GB quirk)
if self.halted && !self.ime {
let mut flags = 0u8;
flags |= self.mmu.ppu().int_vblank() as u8;
flags |= (self.mmu.ppu().int_stat() as u8) << 1;
flags |= (self.mmu.timer().int_tima() as u8) << 2;
flags |= (self.mmu.serial().int_serial() as u8) << 3;
flags |= (self.mmu.pad().int_pad() as u8) << 4;
let pending = flags & self.mmu.ie;
if pending != 0 {
self.halted = false;
}
}
if self.halted {
return 4;
}
let pc = self.pc;
// simplified fetch for common cases
let (inst, new_pc, _, _) = self.fetch(pc);
self.ppc = self.pc;
self.pc = new_pc;
let (inst_fn, inst_time, _) = inst;
self.cycles = 0;
inst_fn(self);
self.cycles = self.cycles.wrapping_add(*inst_time);
self.cycles
}

Comment on lines 378 to 385
if (0xc000..=0xfdff).contains(&addr) {
return match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize],
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],
_ => unreachable!(),
};
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Echo RAM (0xE000–0xFDFF) ignores SVBK bank — fix mapping for CGB correctness.

The fast-path maps Echo RAM with (addr & 0x1fff) which always targets WRAM bank 0 for the upper 4 KiB. On CGB, 0xF000–0xFDFF must echo the currently selected WRAM bank (SVBK). This breaks games relying on banked WRAM through the Echo region.

Refactor to mirror 0xC000–0xCFFF and 0xD000–0xDFFF behavior:

  • 0xE000–0xEFFF → bank 0 (like 0xC000–0xCFFF)
  • 0xF000–0xFDFF → selected bank (like 0xD000–0xDFFF), honoring ram_offset

Apply this diff:

         if (0xc000..=0xfdff).contains(&addr) {
             return match addr {
                 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize],
                 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
-                0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],
+                0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize],
+                0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
                 _ => unreachable!(),
             };
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (0xc000..=0xfdff).contains(&addr) {
return match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize],
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],
_ => unreachable!(),
};
}
if (0xc000..=0xfdff).contains(&addr) {
return match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize],
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize],
0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
_ => unreachable!(),
};
}
🤖 Prompt for AI Agents
In src/mmu.rs around lines 378 to 385, the Echo RAM fast-path incorrectly maps
0xE000–0xFDFF using (addr & 0x1fff) which always targets bank 0; change the
match to mirror 0xC000/0xD000 behavior: 0xC000..=0xCFFF -> ram[(addr & 0x0fff)
as usize], 0xD000..=0xDFFF -> ram[(self.ram_offset + (addr & 0x0fff)) as usize],
0xE000..=0xEFFF -> ram[(addr & 0x0fff) as usize], 0xF000..=0xFDFF ->
ram[(self.ram_offset + (addr & 0x0fff)) as usize], keeping the unreachable!()
default; this ensures 0xF000–0xFDFF honors the selected WRAM bank (SVBK) on CGB.

Comment on lines 493 to 501
// fast path for RAM writes (most common)
if (0xc000..=0xfdff).contains(&addr) {
match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value,
_ => unreachable!(),
}
return;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Write path has the same Echo RAM bank mapping bug.

Writes to 0xF000–0xFDFF should go to the currently selected WRAM bank, not bank 0. Mirror the read fix in the write fast-path.

Apply this diff:

         if (0xc000..=0xfdff).contains(&addr) {
             match addr {
-                0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
-                0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
-                0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value,
+                0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
+                0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
+                0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value,
+                0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
                 _ => unreachable!(),
             }
             return;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// fast path for RAM writes (most common)
if (0xc000..=0xfdff).contains(&addr) {
match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value,
_ => unreachable!(),
}
return;
// fast path for RAM writes (most common)
if (0xc000..=0xfdff).contains(&addr) {
match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value,
0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
_ => unreachable!(),
}
return;
}
🤖 Prompt for AI Agents
In src/mmu.rs around lines 493 to 501, the write fast-path incorrectly maps
0xE000..=0xFDFF to bank 0 the same way for the whole 0xE000-0xFDFF range; split
the 0xE000..=0xFDFF case into 0xE000..=0xEFFF (mirror of C000, use (addr &
0x0fff) -> bank 0) and 0xF000..=0xFDFF (mirror of D000, use (self.ram_offset +
(addr & 0x0fff)) -> selected WRAM bank) so writes to 0xF000..0xFDFF go to the
currently selected WRAM bank, mirroring the read-path fix.

… and optimized memory access

- Added comprehensive documentation for the `read` and `write` methods, explaining their functionality and arguments.
- Improved memory access optimization by implementing branch prediction hints and switch statements for handling different memory regions.
- Clarified memory region mappings in comments for better understanding of the address ranges.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/mmu.rs (2)

394-396: Echo RAM read path ignores SVBK; 0xF000–0xFDFF must mirror selected WRAM bank (CGB).

Currently, the entire 0xE000–0xFDFF range is indexed with (addr & 0x1fff), which always targets bank 0. On CGB, 0xF000–0xFDFF must mirror the currently selected WRAM bank (SVBK), like 0xD000–0xDFFF. This breaks titles that use banked WRAM through the echo window.

Fix by splitting the range to mirror 0xC000/0xD000 behavior:

  • 0xE000–0xEFFF → bank 0 (addr & 0x0fff)
  • 0xF000–0xFDFF → selected bank (self.ram_offset + (addr & 0x0fff))

Also correct the comment which states “(4 KB)”; the echo region is 0x1E00 bytes mirrored.

Apply this diff:

-                // 0xE000-0xFDFF - Echo RAM (4 KB)
-                0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],
+                // 0xE000-0xEFFF - Echo RAM (bank 0 mirror)
+                0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize],
+                // 0xF000-0xFDFF - Echo RAM (selected bank mirror)
+                0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],

530-532: Echo RAM write path has the same bank mapping bug; mirror selected bank for 0xF000–0xFDFF.

Writes to 0xF000–0xFDFF should target the currently selected WRAM bank (SVBK), not bank 0. Mirror the read-path fix for correctness.

Apply this diff:

-                // 0xE000-0xFDFF - Echo RAM (4 KB)
-                0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value,
+                // 0xE000-0xEFFF - Echo RAM (bank 0 mirror)
+                0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value,
+                // 0xF000-0xFDFF - Echo RAM (selected bank mirror)
+                0xf000..=0xfdff => {
+                    self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value
+                },
🧹 Nitpick comments (5)
src/mmu.rs (5)

372-377: Doc wording nit: “switch statement” → “match expression.”

Rust uses match expressions; consider updating the comment to avoid confusion.

-    /// prediction hints and a switch statement to handle
+    /// prediction hints and a match expression to handle
-    /// the different memory regions.
+    /// the different memory regions.

475-477: KEY1 (0xFF4D) read: consider returning 0xFF on DMG for accuracy.

On DMG, KEY1 is unmapped/unused; many implementations return 0xFF. Optional, but improves compatibility with DMG-only titles/tests.

-            0xff4d => (if self.switching { 0x01 } else { 0x00 }) | ((self.speed as u8) << 7) | 0x7e,
+            0xff4d => {
+                if self.mode == GameBoyMode::Cgb {
+                    (if self.switching { 0x01 } else { 0x00 }) | ((self.speed as u8) << 7) | 0x7e
+                } else {
+                    0xff
+                }
+            }

600-602: KEY1 (0xFF4D) write: optionally ignore writes on DMG.

Writing KEY1 is CGB-only; on DMG it should be ignored. Not critical, but improves behavioral fidelity.

-            0xff4d => self.switching = value & 0x01 == 0x01,
+            0xff4d => {
+                if self.mode == GameBoyMode::Cgb {
+                    self.switching = value & 0x01 == 0x01;
+                }
+            }

658-667: Minor: read_many could take &self (not &mut self).

read_many only calls self.read(&self), so it doesn’t need &mut. Keeping it &self can reduce borrow restrictions in callers. Purely optional.

-    pub fn read_many(&mut self, addr: u16, count: u16) -> Vec<u8> {
+    pub fn read_many(&self, addr: u16, count: u16) -> Vec<u8> {

372-399: Add tests to lock the Echo RAM mirroring semantics.

Once the Echo fixes are applied, consider adding regression tests that assert:

  • 0xE000–0xEFFF mirrors bank 0 (0xC000–0xCFFF)
  • 0xF000–0xFDFF mirrors selected bank (0xD000–0xDFFF), honoring 0xFF70 (SVBK)

Example to add under the existing #[cfg(test)] mod tests:

#[test]
fn test_echo_ram_mirrors_bank0_and_selected_bank() {
    let mut mmu = Mmu::default();
    mmu.allocate_cgb();
    mmu.set_mode(GameBoyMode::Cgb);
    mmu.ppu.set_gb_mode(GameBoyMode::Cgb);

    // Bank 0 mirror: C000 <-> E000
    mmu.write(0xc456, 0x12);
    assert_eq!(mmu.read(0xe456), 0x12);
    mmu.write(0xe789, 0x34);
    assert_eq!(mmu.read(0xc789), 0x34);

    // Select bank 3 and test D000 <-> F000 mirror
    mmu.write(0xff70, 0x03); // SVBK = 3
    mmu.write(0xd123, 0xab);
    assert_eq!(mmu.read(0xf123), 0xab);
    mmu.write(0xf123, 0xcd);
    assert_eq!(mmu.read(0xd123), 0xcd);

    // Switch bank and ensure the echo mirrors the new bank
    mmu.write(0xff70, 0x02); // SVBK = 2
    mmu.write(0xd123, 0x55);
    assert_eq!(mmu.read(0xf123), 0x55);
}

Also applies to: 510-536

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4d03d93 and fa98c77.

📒 Files selected for processing (1)
  • src/mmu.rs (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/mmu.rs (2)
src/ppu.rs (4)
  • read (938-1005)
  • read (2229-2231)
  • write (1007-1152)
  • write (2233-2235)
src/rom.rs (4)
  • read (429-440)
  • read (936-938)
  • write (442-450)
  • write (940-942)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: Build SDL (1.83.0)
  • GitHub Check: Build Documentation (rust:latest)
  • GitHub Check: Build Documentation (rust:1.84.0)
  • GitHub Check: Build Documentation (rust:1.82.0)
  • GitHub Check: Build Documentation (rustlang/rust:nightly)
  • GitHub Check: Build Libretro Android (1.85.0)
  • GitHub Check: Build WASM (1.84.0, 18)
  • GitHub Check: Build Documentation (rust:1.85.0)
  • GitHub Check: Build SDL (1.85.0)
  • GitHub Check: Build WASM (latest, 18)
  • GitHub Check: Build Documentation (rust:1.83.0)
  • GitHub Check: Build WASM (1.83.0, 18)
  • GitHub Check: Build WASM (1.85.0, 18)
  • GitHub Check: Build Libretro Android (1.83.0)
  • GitHub Check: Build Libretro Android (1.84.0)
  • GitHub Check: Build WASM (1.82.0, 18)
  • GitHub Check: Build SDL (latest)
  • GitHub Check: Build SDL (1.84.0)
  • GitHub Check: Build SDL (1.82.0)
  • GitHub Check: Build Libretro Android (latest)
  • GitHub Check: Build Libretro Android (1.82.0)
  • GitHub Check: Build (1.83.0)
  • GitHub Check: Build (1.82.0)
  • GitHub Check: Build (1.84.0)
  • GitHub Check: Build (1.85.0)
  • GitHub Check: Build (latest)
  • GitHub Check: Build Mac
  • GitHub Check: Build Windows
🔇 Additional comments (2)
src/mmu.rs (2)

401-421: ROM fast-path and boot ROM window handling look good.

The split boot ROM mapping for DMG (0x0000–0x00FF) and CGB (additional 0x0200–0x08FF while excluding 0x0100–0x01FF) matches hardware behavior and keeps the hot path tight.


383-385: Inlining hot paths is appropriate here.

#[inline(always)] on read/write is reasonable given these functions are tiny dispatchers and are called in the hot loop.

Also applies to: 519-521

@joamag joamag requested a review from Copilot August 19, 2025 02:45
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements comprehensive performance optimizations across the audio, CPU, and memory subsystems of the Game Boy emulator to improve emulation speed through reduced memory allocations, better cache locality, and minimized function call overhead.

  • Replaced VecDeque-based audio buffer with optimized circular buffer implementation
  • Added fast-path CPU execution methods and streamlined interrupt handling
  • Optimized MMU memory access patterns with branch prediction hints

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
src/apu.rs Replaces VecDeque audio buffer with circular buffer using Vec and head/tail tracking
src/cpu.rs Adds optimized interrupt handling and fast-path CPU execution methods
src/mmu.rs Implements fast-path memory access with branch prediction hints for common operations
src/gb.rs Updates audio buffer interfaces and adds pre-allocated cache for audio operations

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

// Buffer is full, advance tail (overwrite oldest)
self.audio_buffer_tail = (self.audio_buffer_tail + 1) % self.audio_buffer_max;
} else {
self.audio_buffer_size += 1;
Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The audio_buffer_size increment happens even when the buffer is full and we're overwriting data. This creates incorrect size tracking. The size should only increase when actually adding new samples, not when overwriting.

Copilot uses AI. Check for mistakes.
0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],

_ => unreachable!(),
};
Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unreachable!() macro will panic in release builds. Since the condition check (0xc000..=0xfdff).contains(&addr) doesn't guarantee all values will match the subsequent patterns due to potential range gaps, this could cause runtime panics.

Suggested change
};
match addr {
// 0xC000-0xCFFF - Work RAM (4 KB)
0xc000..=0xcfff => return self.ram[(addr & 0x0fff) as usize],
// 0xD000-0xDFFF - Work RAM (4 KB)
0xd000..=0xdfff => return self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
// 0xE000-0xFDFF - Echo RAM (4 KB)
0xe000..=0xfdff => return self.ram[(addr & 0x1fff) as usize],
_ => {}

Copilot uses AI. Check for mistakes.

// 0xC000-0xCFFF - Working RAM 0 (4 KB)
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
_ => unreachable!(),
Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the read function, this unreachable!() could panic if the range check doesn't perfectly align with the match patterns. This creates a potential runtime failure point.

Suggested change
_ => unreachable!(),
_ => {
warnln!("MMU::write: Unexpected RAM address: {:#06x}", addr);
// Optionally, do nothing or handle as needed
}

Copilot uses AI. Check for mistakes.
}

return 20;
_ => unreachable!("Invalid interrupt bit"),
Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing_zeros() function can return values greater than 4 if no bits are set or if higher bits are set. Since pending is masked to only include valid interrupt bits (0-4), this should be safe, but the unreachable could still trigger if the bitmask logic has bugs.

Suggested change
_ => unreachable!("Invalid interrupt bit"),
_ => {
debugln!("Warning: Invalid interrupt bit {} encountered in CPU interrupt handler", interrupt_bit);
return 0;
}

Copilot uses AI. Check for mistakes.
let mut i = self.audio_buffer_tail;
for _ in 0..self.audio_buffer_size {
deque.push_back(self.audio_buffer[i]);
i = (i + 1) % self.audio_buffer_max;
Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modulo operation in a tight loop can be expensive. Consider using bitwise AND with (buffer_size - 1) if the buffer size is guaranteed to be a power of 2, or use conditional logic to avoid modulo.

Suggested change
i = (i + 1) % self.audio_buffer_max;
// Use bitwise AND for power-of-two buffer size for performance
i = (i + 1) & (self.audio_buffer_max - 1);

Copilot uses AI. Check for mistakes.
self.audio_buffer[(self.audio_buffer_tail + offset) % self.audio_buffer_max]
})
}

Copy link

Copilot AI Aug 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar modulo operation in iterator that could be optimized. This will be called for every sample access and could impact performance.

Suggested change
let is_pow2 = Self::is_power_of_two(self.audio_buffer_max);
let mask = self.audio_buffer_max - 1;
(0..self.audio_buffer_size).map(move |offset| {
let idx = self.audio_buffer_tail + offset;
let wrapped_idx = if is_pow2 {
idx & mask
} else {
idx % self.audio_buffer_max
};
self.audio_buffer[wrapped_idx]
})
}
/// Returns true if n is a power of two
fn is_power_of_two(n: usize) -> bool {
n != 0 && (n & (n - 1)) == 0
}

Copilot uses AI. Check for mistakes.
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is being reviewed by Cursor Bugbot

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

self.clear_audio_buffer();
}
buffer
self.audio_buffer_cache.clone()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Audio Buffer Method Causes Unnecessary Memory Usage

The audio_buffer_eager method introduces multiple unnecessary allocations and copies. It collects samples into a temporary vector, copies them to audio_buffer_cache, and then clones the cache for the return value, which negates the performance benefits of the cache.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants