-
Notifications
You must be signed in to change notification settings - Fork 36
Claude performance improvements #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Replaced VecDeque with a circular buffer implementation in APU for improved performance and reduced allocations. - Enhanced CPU interrupt handling by optimizing flag collection and using a priority-based dispatch mechanism. - Introduced batch execution for CPU cycles to minimize function call overhead. - Updated audio buffer API for better compatibility and performance.
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/apu.rs (1)
950-960: Compatibilityaudio_buffer()copies — acceptable, but note the cost.Returning an owned VecDeque provides a safe API but incurs O(n) copy. Callers on hot paths should prefer
audio_buffer_iter()(or the eager path in GameBoy) to avoid allocations.src/gb.rs (2)
425-429: Owned audio buffer in trait simplifies lifetimes but increases copies.The API change to return
VecDeque<i16>avoids borrow issues but forces allocations. If you want the best of both worlds, consider adding a secondary method (non-breaking) to the trait that exposes an iterator or slice pair for zero-copy consumers while retaining this owned fallback.
824-833: Avoid the intermediate Vec inaudio_buffer_eagerto cut one copy.You can extend the cache directly from the iterator, eliminating the temporary
samplesVec.Apply this diff:
- // collect samples first to avoid borrowing issues - let samples: Vec<i16> = self.apu().audio_buffer_iter().collect(); - self.audio_buffer_cache.clear(); - self.audio_buffer_cache.extend(samples); + // extend cache directly from iterator to avoid an extra allocation/copy + self.audio_buffer_cache.clear(); + self.audio_buffer_cache.extend(self.apu().audio_buffer_iter());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
src/apu.rs(4 hunks)src/cpu.rs(3 hunks)src/gb.rs(7 hunks)src/mmu.rs(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/apu.rs (1)
src/gb.rs (5)
audio_buffer(427-427)audio_buffer(1256-1258)audio_buffer(1648-1650)clear_audio_buffer(428-428)clear_audio_buffer(1652-1654)
src/mmu.rs (2)
src/ppu.rs (4)
read(938-1005)read(2229-2231)write(1007-1152)write(2233-2235)src/rom.rs (4)
read(429-440)read(936-938)write(442-450)write(940-942)
src/gb.rs (1)
src/apu.rs (1)
audio_buffer(952-960)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (22)
- GitHub Check: Build SDL (latest)
- GitHub Check: Build Libretro Android (latest)
- GitHub Check: Build Libretro Android (1.85.0)
- GitHub Check: Build Libretro Android (1.82.0)
- GitHub Check: Build SDL (1.85.0)
- GitHub Check: Build Libretro Android (1.83.0)
- GitHub Check: Build WASM (1.85.0, 18)
- GitHub Check: Build Libretro Android (1.84.0)
- GitHub Check: Build WASM (latest, 18)
- GitHub Check: Build SDL (1.84.0)
- GitHub Check: Build WASM (1.82.0, 18)
- GitHub Check: Build WASM (1.84.0, 18)
- GitHub Check: Build WASM (1.83.0, 18)
- GitHub Check: Build SDL (1.83.0)
- GitHub Check: Build SDL (1.82.0)
- GitHub Check: Build (1.82.0)
- GitHub Check: Build (latest)
- GitHub Check: Build (1.85.0)
- GitHub Check: Build (1.83.0)
- GitHub Check: Build (1.84.0)
- GitHub Check: Build Windows
- GitHub Check: Build Mac
🔇 Additional comments (15)
src/mmu.rs (2)
387-403: Boot ROM mapping and fast ROM path look correct.
- DMG boot: 0x0000–0x00FF gated by BOOT.
- CGB boot: 0x0000–0x00FF and 0x0200–0x08FF when
boot_active; 0x0100–0x01FF goes to cartridge (header overlay).- Early return on
addr <= 0x7fffpreserves cartridge/MBC behavior.
404-486: I/O decode maintained and consistent.VRAM/OAM/HRAM and device-mapped registers are preserved post fast-paths.
KEY1(0xFF4D) andSVBK(0xFF70) semantics match expected behavior.src/cpu.rs (2)
171-179: Flags aggregation via bit ops is clear and efficient.Compact and branchless interrupt flag collection; semantics preserved.
189-231: Priority-based interrupt dispatch is correct and simpler.
trailing_zeros()selects the lowest set bit (VBlank highest priority).- Common prologue (DI, push PC, clear HALT) avoids duplication.
- Acks routed to respective components.
src/apu.rs (6)
197-204: Circular buffer design is a good fit for audio; zero-alloc steady-state.Replacing VecDeque with a ring buffer plus head/tail/size avoids reallocation and memmoves under sustained throughput.
293-304: Initialization pre-allocates to max capacity.Buffer sized to sampling_rate * buffer_size * channels; indices initialized correctly.
433-441: Correctly pushes interleaved channel samples via ring buffer.Left and right samples are filtered and enqueued with overwrite-on-full semantics.
962-967: Iterator API is lean and correct.Produces samples in chronological order starting from tail for
sizeelements.
969-973: Buffer clear resets indices only.O(1) reset with stable capacity — ideal for real-time scenarios.
975-987: Ring push logic handles full/empty correctly.
- On full: advance tail, keep size at max.
- On not full: increment size.
- Then write at head and advance.
No off-by-one; modulo arithmetic correct.
src/gb.rs (5)
525-527: Pre-allocatedaudio_buffer_cacheis a sensible optimization.Reduces transient allocations for eager audio extraction.
688-692: Copying frame buffer on VBlank avoids borrow conflicts.This resolves aliasing during the clock loop; minimal overhead, only on frame change.
814-822: Direct vector return for frame buffers is clean.No intermediate caching; matches the new borrowing strategy.
1256-1258: Owned buffer passthrough aligns GameBoy with new APU API.Method now returns a
VecDeque<i16>as per the updated trait.
1648-1650: AudioProvider impl delegates correctly to APU.Consistent with the trait change and APU’s new
audio_buffer().
| /// Optimized batch execution of multiple CPU cycles | ||
| /// Reduces function call overhead for better performance | ||
| #[inline] | ||
| pub fn clock_batch(&mut self, target_cycles: u16) -> u16 { | ||
| let mut total_cycles = 0u16; | ||
|
|
||
| while total_cycles < target_cycles { | ||
| // fast path for non-interrupt cases | ||
| if !self.ime || self.mmu.ie == 0 { | ||
| total_cycles += self.clock_fast() as u16; | ||
| } else { | ||
| total_cycles += self.clock() as u16; | ||
| } | ||
|
|
||
| if total_cycles.saturating_add(20) > target_cycles { | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| total_cycles | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch clock fast-path can starve HALT release when IME is disabled.
clock_batch calls clock_fast whenever !self.ime || self.mmu.ie == 0. However, clock_fast does not implement the GB quirk “release HALT when an interrupt is pending with IME=0”. This causes the CPU to remain halted forever in that scenario.
Fix by adding the HALT release check to clock_fast before the HALT early return. See next comment for patch.
Additionally, the “overshoot guard” using a hardcoded 20-cycle margin is conservative. Optional: factor a constant or compute a tighter bound (e.g., 16 for most opcodes, 20 only if IME and pending) to improve batching efficiency.
🤖 Prompt for AI Agents
In src/cpu.rs around lines 285 to 306, clock_batch fast-path calls clock_fast
when !self.ime || self.mmu.ie == 0 but clock_fast lacks the GameBoy quirk that
releases HALT when an interrupt is pending while IME == 0, which can leave the
CPU halted forever; modify clock_fast to perform the HALT-release check (detect
a pending interrupt via mmu.ie & mmu.if and IME==0 and HALT state) before
returning early from HALT so the CPU exits HALT correctly, and keep clock_batch
unchanged except optionally replace the hardcoded 20-cycle overshoot guard with
a named constant or tighten it (e.g., 16 normally, 20 only when IME/pending) to
improve batching efficiency.
| /// Fast path clock without interrupt checking for better performance | ||
| #[inline(always)] | ||
| fn clock_fast(&mut self) -> u8 { | ||
| if self.halted { | ||
| return 4; | ||
| } | ||
|
|
||
| let pc = self.pc; | ||
|
|
||
| // simplified fetch for common cases | ||
| let (inst, new_pc, _, _) = self.fetch(pc); | ||
| self.ppc = self.pc; | ||
| self.pc = new_pc; | ||
|
|
||
| let (inst_fn, inst_time, _) = inst; | ||
|
|
||
| self.cycles = 0; | ||
| inst_fn(self); | ||
| self.cycles = self.cycles.wrapping_add(*inst_time); | ||
|
|
||
| self.cycles | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add HALT release on pending interrupts to clock_fast.
Implement the IME=0 pending-interrupt check so HALT is released like in clock().
Apply this diff:
#[inline(always)]
fn clock_fast(&mut self) -> u8 {
- if self.halted {
+ // Release HALT if an interrupt is pending while IME is disabled (GB quirk)
+ if self.halted && !self.ime {
+ let mut flags = 0u8;
+ flags |= self.mmu.ppu().int_vblank() as u8;
+ flags |= (self.mmu.ppu().int_stat() as u8) << 1;
+ flags |= (self.mmu.timer().int_tima() as u8) << 2;
+ flags |= (self.mmu.serial().int_serial() as u8) << 3;
+ flags |= (self.mmu.pad().int_pad() as u8) << 4;
+ let pending = flags & self.mmu.ie;
+ if pending != 0 {
+ self.halted = false;
+ }
+ }
+ if self.halted {
return 4;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /// Fast path clock without interrupt checking for better performance | |
| #[inline(always)] | |
| fn clock_fast(&mut self) -> u8 { | |
| if self.halted { | |
| return 4; | |
| } | |
| let pc = self.pc; | |
| // simplified fetch for common cases | |
| let (inst, new_pc, _, _) = self.fetch(pc); | |
| self.ppc = self.pc; | |
| self.pc = new_pc; | |
| let (inst_fn, inst_time, _) = inst; | |
| self.cycles = 0; | |
| inst_fn(self); | |
| self.cycles = self.cycles.wrapping_add(*inst_time); | |
| self.cycles | |
| } | |
| /// Fast path clock without interrupt checking for better performance | |
| #[inline(always)] | |
| fn clock_fast(&mut self) -> u8 { | |
| // Release HALT if an interrupt is pending while IME is disabled (GB quirk) | |
| if self.halted && !self.ime { | |
| let mut flags = 0u8; | |
| flags |= self.mmu.ppu().int_vblank() as u8; | |
| flags |= (self.mmu.ppu().int_stat() as u8) << 1; | |
| flags |= (self.mmu.timer().int_tima() as u8) << 2; | |
| flags |= (self.mmu.serial().int_serial() as u8) << 3; | |
| flags |= (self.mmu.pad().int_pad() as u8) << 4; | |
| let pending = flags & self.mmu.ie; | |
| if pending != 0 { | |
| self.halted = false; | |
| } | |
| } | |
| if self.halted { | |
| return 4; | |
| } | |
| let pc = self.pc; | |
| // simplified fetch for common cases | |
| let (inst, new_pc, _, _) = self.fetch(pc); | |
| self.ppc = self.pc; | |
| self.pc = new_pc; | |
| let (inst_fn, inst_time, _) = inst; | |
| self.cycles = 0; | |
| inst_fn(self); | |
| self.cycles = self.cycles.wrapping_add(*inst_time); | |
| self.cycles | |
| } |
| if (0xc000..=0xfdff).contains(&addr) { | ||
| return match addr { | ||
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize], | ||
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize], | ||
| 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize], | ||
| _ => unreachable!(), | ||
| }; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Echo RAM (0xE000–0xFDFF) ignores SVBK bank — fix mapping for CGB correctness.
The fast-path maps Echo RAM with (addr & 0x1fff) which always targets WRAM bank 0 for the upper 4 KiB. On CGB, 0xF000–0xFDFF must echo the currently selected WRAM bank (SVBK). This breaks games relying on banked WRAM through the Echo region.
Refactor to mirror 0xC000–0xCFFF and 0xD000–0xDFFF behavior:
- 0xE000–0xEFFF → bank 0 (like 0xC000–0xCFFF)
- 0xF000–0xFDFF → selected bank (like 0xD000–0xDFFF), honoring
ram_offset
Apply this diff:
if (0xc000..=0xfdff).contains(&addr) {
return match addr {
0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize],
0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
- 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize],
+ 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize],
+ 0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
_ => unreachable!(),
};
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (0xc000..=0xfdff).contains(&addr) { | |
| return match addr { | |
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize], | |
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize], | |
| 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize], | |
| _ => unreachable!(), | |
| }; | |
| } | |
| if (0xc000..=0xfdff).contains(&addr) { | |
| return match addr { | |
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize], | |
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize], | |
| 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize], | |
| 0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize], | |
| _ => unreachable!(), | |
| }; | |
| } |
🤖 Prompt for AI Agents
In src/mmu.rs around lines 378 to 385, the Echo RAM fast-path incorrectly maps
0xE000–0xFDFF using (addr & 0x1fff) which always targets bank 0; change the
match to mirror 0xC000/0xD000 behavior: 0xC000..=0xCFFF -> ram[(addr & 0x0fff)
as usize], 0xD000..=0xDFFF -> ram[(self.ram_offset + (addr & 0x0fff)) as usize],
0xE000..=0xEFFF -> ram[(addr & 0x0fff) as usize], 0xF000..=0xFDFF ->
ram[(self.ram_offset + (addr & 0x0fff)) as usize], keeping the unreachable!()
default; this ensures 0xF000–0xFDFF honors the selected WRAM bank (SVBK) on CGB.
| // fast path for RAM writes (most common) | ||
| if (0xc000..=0xfdff).contains(&addr) { | ||
| match addr { | ||
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value, | ||
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value, | ||
| 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value, | ||
| _ => unreachable!(), | ||
| } | ||
| return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write path has the same Echo RAM bank mapping bug.
Writes to 0xF000–0xFDFF should go to the currently selected WRAM bank, not bank 0. Mirror the read fix in the write fast-path.
Apply this diff:
if (0xc000..=0xfdff).contains(&addr) {
match addr {
- 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
- 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
- 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value,
+ 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value,
+ 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
+ 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value,
+ 0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value,
_ => unreachable!(),
}
return;
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // fast path for RAM writes (most common) | |
| if (0xc000..=0xfdff).contains(&addr) { | |
| match addr { | |
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value, | |
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value, | |
| 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value, | |
| _ => unreachable!(), | |
| } | |
| return; | |
| // fast path for RAM writes (most common) | |
| if (0xc000..=0xfdff).contains(&addr) { | |
| match addr { | |
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value, | |
| 0xd000..=0xdfff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value, | |
| 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value, | |
| 0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value, | |
| _ => unreachable!(), | |
| } | |
| return; | |
| } |
🤖 Prompt for AI Agents
In src/mmu.rs around lines 493 to 501, the write fast-path incorrectly maps
0xE000..=0xFDFF to bank 0 the same way for the whole 0xE000-0xFDFF range; split
the 0xE000..=0xFDFF case into 0xE000..=0xEFFF (mirror of C000, use (addr &
0x0fff) -> bank 0) and 0xF000..=0xFDFF (mirror of D000, use (self.ram_offset +
(addr & 0x0fff)) -> selected WRAM bank) so writes to 0xF000..0xFDFF go to the
currently selected WRAM bank, mirroring the read-path fix.
… and optimized memory access - Added comprehensive documentation for the `read` and `write` methods, explaining their functionality and arguments. - Improved memory access optimization by implementing branch prediction hints and switch statements for handling different memory regions. - Clarified memory region mappings in comments for better understanding of the address ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/mmu.rs (2)
394-396: Echo RAM read path ignores SVBK; 0xF000–0xFDFF must mirror selected WRAM bank (CGB).Currently, the entire 0xE000–0xFDFF range is indexed with
(addr & 0x1fff), which always targets bank 0. On CGB, 0xF000–0xFDFF must mirror the currently selected WRAM bank (SVBK), like 0xD000–0xDFFF. This breaks titles that use banked WRAM through the echo window.Fix by splitting the range to mirror 0xC000/0xD000 behavior:
- 0xE000–0xEFFF → bank 0 (
addr & 0x0fff)- 0xF000–0xFDFF → selected bank (
self.ram_offset + (addr & 0x0fff))Also correct the comment which states “(4 KB)”; the echo region is 0x1E00 bytes mirrored.
Apply this diff:
- // 0xE000-0xFDFF - Echo RAM (4 KB) - 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize], + // 0xE000-0xEFFF - Echo RAM (bank 0 mirror) + 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize], + // 0xF000-0xFDFF - Echo RAM (selected bank mirror) + 0xf000..=0xfdff => self.ram[(self.ram_offset + (addr & 0x0fff)) as usize],
530-532: Echo RAM write path has the same bank mapping bug; mirror selected bank for 0xF000–0xFDFF.Writes to 0xF000–0xFDFF should target the currently selected WRAM bank (SVBK), not bank 0. Mirror the read-path fix for correctness.
Apply this diff:
- // 0xE000-0xFDFF - Echo RAM (4 KB) - 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize] = value, + // 0xE000-0xEFFF - Echo RAM (bank 0 mirror) + 0xe000..=0xefff => self.ram[(addr & 0x0fff) as usize] = value, + // 0xF000-0xFDFF - Echo RAM (selected bank mirror) + 0xf000..=0xfdff => { + self.ram[(self.ram_offset + (addr & 0x0fff)) as usize] = value + },
🧹 Nitpick comments (5)
src/mmu.rs (5)
372-377: Doc wording nit: “switch statement” → “match expression.”Rust uses match expressions; consider updating the comment to avoid confusion.
- /// prediction hints and a switch statement to handle + /// prediction hints and a match expression to handle - /// the different memory regions. + /// the different memory regions.
475-477: KEY1 (0xFF4D) read: consider returning 0xFF on DMG for accuracy.On DMG, KEY1 is unmapped/unused; many implementations return 0xFF. Optional, but improves compatibility with DMG-only titles/tests.
- 0xff4d => (if self.switching { 0x01 } else { 0x00 }) | ((self.speed as u8) << 7) | 0x7e, + 0xff4d => { + if self.mode == GameBoyMode::Cgb { + (if self.switching { 0x01 } else { 0x00 }) | ((self.speed as u8) << 7) | 0x7e + } else { + 0xff + } + }
600-602: KEY1 (0xFF4D) write: optionally ignore writes on DMG.Writing KEY1 is CGB-only; on DMG it should be ignored. Not critical, but improves behavioral fidelity.
- 0xff4d => self.switching = value & 0x01 == 0x01, + 0xff4d => { + if self.mode == GameBoyMode::Cgb { + self.switching = value & 0x01 == 0x01; + } + }
658-667: Minor: read_many could take &self (not &mut self).read_many only calls self.read(&self), so it doesn’t need &mut. Keeping it &self can reduce borrow restrictions in callers. Purely optional.
- pub fn read_many(&mut self, addr: u16, count: u16) -> Vec<u8> { + pub fn read_many(&self, addr: u16, count: u16) -> Vec<u8> {
372-399: Add tests to lock the Echo RAM mirroring semantics.Once the Echo fixes are applied, consider adding regression tests that assert:
- 0xE000–0xEFFF mirrors bank 0 (0xC000–0xCFFF)
- 0xF000–0xFDFF mirrors selected bank (0xD000–0xDFFF), honoring 0xFF70 (SVBK)
Example to add under the existing #[cfg(test)] mod tests:
#[test] fn test_echo_ram_mirrors_bank0_and_selected_bank() { let mut mmu = Mmu::default(); mmu.allocate_cgb(); mmu.set_mode(GameBoyMode::Cgb); mmu.ppu.set_gb_mode(GameBoyMode::Cgb); // Bank 0 mirror: C000 <-> E000 mmu.write(0xc456, 0x12); assert_eq!(mmu.read(0xe456), 0x12); mmu.write(0xe789, 0x34); assert_eq!(mmu.read(0xc789), 0x34); // Select bank 3 and test D000 <-> F000 mirror mmu.write(0xff70, 0x03); // SVBK = 3 mmu.write(0xd123, 0xab); assert_eq!(mmu.read(0xf123), 0xab); mmu.write(0xf123, 0xcd); assert_eq!(mmu.read(0xd123), 0xcd); // Switch bank and ensure the echo mirrors the new bank mmu.write(0xff70, 0x02); // SVBK = 2 mmu.write(0xd123, 0x55); assert_eq!(mmu.read(0xf123), 0x55); }Also applies to: 510-536
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/mmu.rs(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/mmu.rs (2)
src/ppu.rs (4)
read(938-1005)read(2229-2231)write(1007-1152)write(2233-2235)src/rom.rs (4)
read(429-440)read(936-938)write(442-450)write(940-942)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: Build SDL (1.83.0)
- GitHub Check: Build Documentation (rust:latest)
- GitHub Check: Build Documentation (rust:1.84.0)
- GitHub Check: Build Documentation (rust:1.82.0)
- GitHub Check: Build Documentation (rustlang/rust:nightly)
- GitHub Check: Build Libretro Android (1.85.0)
- GitHub Check: Build WASM (1.84.0, 18)
- GitHub Check: Build Documentation (rust:1.85.0)
- GitHub Check: Build SDL (1.85.0)
- GitHub Check: Build WASM (latest, 18)
- GitHub Check: Build Documentation (rust:1.83.0)
- GitHub Check: Build WASM (1.83.0, 18)
- GitHub Check: Build WASM (1.85.0, 18)
- GitHub Check: Build Libretro Android (1.83.0)
- GitHub Check: Build Libretro Android (1.84.0)
- GitHub Check: Build WASM (1.82.0, 18)
- GitHub Check: Build SDL (latest)
- GitHub Check: Build SDL (1.84.0)
- GitHub Check: Build SDL (1.82.0)
- GitHub Check: Build Libretro Android (latest)
- GitHub Check: Build Libretro Android (1.82.0)
- GitHub Check: Build (1.83.0)
- GitHub Check: Build (1.82.0)
- GitHub Check: Build (1.84.0)
- GitHub Check: Build (1.85.0)
- GitHub Check: Build (latest)
- GitHub Check: Build Mac
- GitHub Check: Build Windows
🔇 Additional comments (2)
src/mmu.rs (2)
401-421: ROM fast-path and boot ROM window handling look good.The split boot ROM mapping for DMG (0x0000–0x00FF) and CGB (additional 0x0200–0x08FF while excluding 0x0100–0x01FF) matches hardware behavior and keeps the hot path tight.
383-385: Inlining hot paths is appropriate here.#[inline(always)] on read/write is reasonable given these functions are tiny dispatchers and are called in the hot loop.
Also applies to: 519-521
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements comprehensive performance optimizations across the audio, CPU, and memory subsystems of the Game Boy emulator to improve emulation speed through reduced memory allocations, better cache locality, and minimized function call overhead.
- Replaced VecDeque-based audio buffer with optimized circular buffer implementation
- Added fast-path CPU execution methods and streamlined interrupt handling
- Optimized MMU memory access patterns with branch prediction hints
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/apu.rs | Replaces VecDeque audio buffer with circular buffer using Vec and head/tail tracking |
| src/cpu.rs | Adds optimized interrupt handling and fast-path CPU execution methods |
| src/mmu.rs | Implements fast-path memory access with branch prediction hints for common operations |
| src/gb.rs | Updates audio buffer interfaces and adds pre-allocated cache for audio operations |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| // Buffer is full, advance tail (overwrite oldest) | ||
| self.audio_buffer_tail = (self.audio_buffer_tail + 1) % self.audio_buffer_max; | ||
| } else { | ||
| self.audio_buffer_size += 1; |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The audio_buffer_size increment happens even when the buffer is full and we're overwriting data. This creates incorrect size tracking. The size should only increase when actually adding new samples, not when overwriting.
| 0xe000..=0xfdff => self.ram[(addr & 0x1fff) as usize], | ||
|
|
||
| _ => unreachable!(), | ||
| }; |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unreachable!() macro will panic in release builds. Since the condition check (0xc000..=0xfdff).contains(&addr) doesn't guarantee all values will match the subsequent patterns due to potential range gaps, this could cause runtime panics.
| }; | |
| match addr { | |
| // 0xC000-0xCFFF - Work RAM (4 KB) | |
| 0xc000..=0xcfff => return self.ram[(addr & 0x0fff) as usize], | |
| // 0xD000-0xDFFF - Work RAM (4 KB) | |
| 0xd000..=0xdfff => return self.ram[(self.ram_offset + (addr & 0x0fff)) as usize], | |
| // 0xE000-0xFDFF - Echo RAM (4 KB) | |
| 0xe000..=0xfdff => return self.ram[(addr & 0x1fff) as usize], | |
| _ => {} |
|
|
||
| // 0xC000-0xCFFF - Working RAM 0 (4 KB) | ||
| 0xc000..=0xcfff => self.ram[(addr & 0x0fff) as usize] = value, | ||
| _ => unreachable!(), |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the read function, this unreachable!() could panic if the range check doesn't perfectly align with the match patterns. This creates a potential runtime failure point.
| _ => unreachable!(), | |
| _ => { | |
| warnln!("MMU::write: Unexpected RAM address: {:#06x}", addr); | |
| // Optionally, do nothing or handle as needed | |
| } |
| } | ||
|
|
||
| return 20; | ||
| _ => unreachable!("Invalid interrupt bit"), |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trailing_zeros() function can return values greater than 4 if no bits are set or if higher bits are set. Since pending is masked to only include valid interrupt bits (0-4), this should be safe, but the unreachable could still trigger if the bitmask logic has bugs.
| _ => unreachable!("Invalid interrupt bit"), | |
| _ => { | |
| debugln!("Warning: Invalid interrupt bit {} encountered in CPU interrupt handler", interrupt_bit); | |
| return 0; | |
| } |
| let mut i = self.audio_buffer_tail; | ||
| for _ in 0..self.audio_buffer_size { | ||
| deque.push_back(self.audio_buffer[i]); | ||
| i = (i + 1) % self.audio_buffer_max; |
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modulo operation in a tight loop can be expensive. Consider using bitwise AND with (buffer_size - 1) if the buffer size is guaranteed to be a power of 2, or use conditional logic to avoid modulo.
| i = (i + 1) % self.audio_buffer_max; | |
| // Use bitwise AND for power-of-two buffer size for performance | |
| i = (i + 1) & (self.audio_buffer_max - 1); |
| self.audio_buffer[(self.audio_buffer_tail + offset) % self.audio_buffer_max] | ||
| }) | ||
| } | ||
|
|
Copilot
AI
Aug 19, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar modulo operation in iterator that could be optimized. This will be called for every sample access and could impact performance.
| let is_pow2 = Self::is_power_of_two(self.audio_buffer_max); | |
| let mask = self.audio_buffer_max - 1; | |
| (0..self.audio_buffer_size).map(move |offset| { | |
| let idx = self.audio_buffer_tail + offset; | |
| let wrapped_idx = if is_pow2 { | |
| idx & mask | |
| } else { | |
| idx % self.audio_buffer_max | |
| }; | |
| self.audio_buffer[wrapped_idx] | |
| }) | |
| } | |
| /// Returns true if n is a power of two | |
| fn is_power_of_two(n: usize) -> bool { | |
| n != 0 && (n & (n - 1)) == 0 | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| self.clear_audio_buffer(); | ||
| } | ||
| buffer | ||
| self.audio_buffer_cache.clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Audio Buffer Method Causes Unnecessary Memory Usage
The audio_buffer_eager method introduces multiple unnecessary allocations and copies. It collects samples into a temporary vector, copies them to audio_buffer_cache, and then clones the cache for the return value, which negates the performance benefits of the cache.
This pull request introduces several performance optimizations and refactorings across the audio, CPU, and memory subsystems of the emulator. The most significant changes include replacing the audio buffer implementation with an optimized circular buffer, streamlining interrupt handling and CPU clock execution, and improving memory access patterns in the MMU. These changes aim to reduce memory allocations, improve cache locality, and minimize function call overhead for better overall emulation speed.
Audio Buffer Optimization
VecDeque<i16>audio buffer inApuwith a pre-allocated circular buffer (Vec<i16>) and added head/tail/size tracking for efficient sample management. Introduced methods for buffer iteration and optimized sample pushing. (src/apu.rs) [1] [2] [3] [4]VecDeque<i16>. (src/gb.rs,src/apu.rs) [1] [2] [3]CPU Performance Improvements
Cputo use bit manipulation and priority-based handling for faster execution. (src/cpu.rs) [1] [2]clock_batch) and a simplifiedclock_fastfunction to minimize overhead during non-interrupt cycles. (src/cpu.rs)Memory Access Optimizations
Mmuread and write methods to use branch prediction hints and handle the most common RAM and ROM access patterns first, improving cache locality and speed. (src/mmu.rs) [1] [2] [3]GameBoy Struct and Buffer Handling
audio_buffer_cachefield toGameBoyfor temporary storage of audio samples, reducing allocations when fetching audio data. Updated related methods for eager buffer access. (src/gb.rs) [1] [2] [3]General Refactoring and Minor Improvements
src/gb.rs) [1] [2]Let me know if you want a deeper explanation of any of these optimizations or how they affect emulator performance!
Summary by CodeRabbit
Refactor
Bug Fixes
Documentation