
feat: Reduce ipsw diff memory usage #986

Open
PoomSmart wants to merge 3 commits into blacktop:master from PoomSmart:feat/diff-low-memory

Conversation

@PoomSmart

@PoomSmart PoomSmart commented Dec 14, 2025

Problem

ipsw diff consumed 48 GB+ of RAM, crashing on machines with 16 GB of memory or less.

Changes

This PR reduces memory use in the in-memory diff path and adds an optional disk-cached diff mode.

1. Streaming DMG Traversal (internal/search/search.go)

  • File paths are now streamed directly to handlers instead of collecting into []string slices
  • Eliminates gigabytes of path string accumulation for large DMGs
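
A minimal sketch of the streaming idea, not the actual code in internal/search/search.go: the walker passes each path to a callback instead of appending to a slice. The names streamPaths and makeTree are hypothetical, and a temporary directory stands in for a mounted DMG.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// streamPaths walks root and hands each non-directory path to handle as soon
// as it is found, so memory use does not grow with the number of files.
// (Hypothetical helper; the real traversal in ipsw may differ.)
func streamPaths(root string, handle func(path string) error) error {
	return filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // propagate walk errors instead of hiding them
		}
		if d.IsDir() {
			return nil
		}
		return handle(p)
	})
}

// makeTree builds a small temporary tree with three files for the demo.
func makeTree() (string, func(), error) {
	root, err := os.MkdirTemp("", "stream-demo-")
	if err != nil {
		return "", nil, err
	}
	cleanup := func() { os.RemoveAll(root) }
	for _, rel := range []string{"a.bin", "sub/b.bin", "sub/deep/c.bin"} {
		full := filepath.Join(root, rel)
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			cleanup()
			return "", nil, err
		}
		if err := os.WriteFile(full, []byte("x"), 0o644); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return root, cleanup, nil
}

func main() {
	root, cleanup, err := makeTree()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	count := 0
	err = streamPaths(root, func(path string) error {
		count++ // a real handler would open and diff the file here
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Println("files handled:", count)
}
```

The handler sees one path at a time, so peak memory depends on what the handler keeps, not on the size of the tree.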

2. Per-Image DSC Diff (internal/commands/dsc/diff.go)

  • Diff images one-at-a-time instead of building two huge map[string]*DiffInfo maps
  • Close MachO handles immediately after comparison

3. Optional Disk Caching (internal/commands/macho/diff.go)

  • Added --low-memory flag for low memory machines
  • Default: fast in-memory mode (original behavior)
  • Low-memory: caches old IPSW's DiffInfo to temp gob files, loads on-demand
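
The low-memory idea can be sketched roughly as follows, assuming one gob file per cached entry in a temporary directory. The info struct and the diskCache type are hypothetical; the real DiffInfo type and the cache layout in internal/commands/macho/diff.go may look different.

```go
package main

import (
	"crypto/sha256"
	"encoding/gob"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// info is a hypothetical stand-in for the per-file diff data.
type info struct {
	Version string
	Symbols []string
}

// diskCache keeps one gob file per key under a temporary directory.
type diskCache struct{ dir string }

func newDiskCache() (*diskCache, error) {
	dir, err := os.MkdirTemp("", "diff-cache-")
	if err != nil {
		return nil, err
	}
	return &diskCache{dir: dir}, nil
}

// fileFor hashes the key so that any path can be used as a cache key.
func (c *diskCache) fileFor(key string) string {
	sum := sha256.Sum256([]byte(key))
	return filepath.Join(c.dir, hex.EncodeToString(sum[:])+".gob")
}

// Put writes the value to disk so it no longer has to stay in memory.
func (c *diskCache) Put(key string, v *info) error {
	f, err := os.Create(c.fileFor(key))
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(v); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// Get loads a single entry on demand.
func (c *diskCache) Get(key string) (*info, error) {
	f, err := os.Open(c.fileFor(key))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var v info
	if err := gob.NewDecoder(f).Decode(&v); err != nil {
		return nil, err
	}
	return &v, nil
}

// Close removes the temporary directory and everything cached in it.
func (c *diskCache) Close() error { return os.RemoveAll(c.dir) }

func main() {
	c, err := newDiskCache()
	if err != nil {
		fmt.Println("cache setup failed:", err)
		return
	}
	defer c.Close()

	key := "/usr/lib/libfoo.dylib"
	if err := c.Put(key, &info{Version: "1.0", Symbols: []string{"_a", "_b"}}); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	got, err := c.Get(key)
	if err != nil {
		fmt.Println("get failed:", err)
		return
	}
	fmt.Println(got.Version, len(got.Symbols))
}
```

The trade-off is the one stated above: each lookup costs a file read and a decode, in exchange for keeping only one old-side entry in memory at a time.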

4. Stream gob.Encode (internal/diff/diff.go)

  • Save() now streams directly to file instead of buffering in bytes.Buffer
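
For illustration, the difference between the two Save() shapes might look like this. The report type and the function names are hypothetical and are not taken from internal/diff/diff.go.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// report is a hypothetical stand-in for the saved diff state.
type report struct {
	Title   string
	Entries []string
}

// saveBuffered encodes into a bytes.Buffer first, so the whole encoded
// output sits in memory before anything reaches the disk.
func saveBuffered(path string, r *report) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(r); err != nil {
		return err
	}
	return os.WriteFile(path, buf.Bytes(), 0o644)
}

// saveStreamed points the encoder at the file (through a small bufio
// writer), which avoids holding that extra full copy in a bytes.Buffer.
func saveStreamed(path string, r *report) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	w := bufio.NewWriter(f)
	if err := gob.NewEncoder(w).Encode(r); err != nil {
		f.Close()
		return err
	}
	if err := w.Flush(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// load reads a report back from disk.
func load(path string) (*report, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var r report
	if err := gob.NewDecoder(f).Decode(&r); err != nil {
		return nil, err
	}
	return &r, nil
}

// tempPath returns a file path inside a fresh temporary directory.
func tempPath(name string) (string, func(), error) {
	dir, err := os.MkdirTemp("", "gob-demo-")
	if err != nil {
		return "", nil, err
	}
	return filepath.Join(dir, name), func() { os.RemoveAll(dir) }, nil
}

func main() {
	path, cleanup, err := tempPath("report.gob")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	in := &report{Title: "diff", Entries: []string{"a", "b"}}
	if err := saveStreamed(path, in); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	out, err := load(path)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Println(out.Title, len(out.Entries))
}
```

Both functions should produce a file that decodes to the same value. The encoder may still buffer internally, so the saving here is the extra copy held by the caller, not all intermediate memory.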

Usage

# Default (fast, needs large amount of RAM)
ipsw diff old.ipsw new.ipsw -o out -m

# Low-memory mode (slower, works on low memory machine)
ipsw diff old.ipsw new.ipsw -o out -m --low-memory

Fixes #985

@PoomSmart PoomSmart marked this pull request as ready for review December 20, 2025 04:35
@blacktop
Owner

Have you done a speed comparison between low-memory and normal mode? Or are you unable to because of the limitations of your host?

@PoomSmart
Author

@blacktop Yeah, unfortunately I cannot run a normal-mode test. My machine just doesn't have enough RAM.

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@PoomSmart
Author

Hi, have you had a chance to look at this further? @blacktop

@blacktop
Owner

blacktop commented Feb 9, 2026

I reviewed this and found two issues we should fix before merge:

  1. Low-memory matching edge case can misclassify files as New

In --low-memory mode we use one map both for “exists in old” and “still unmatched old”.
Trigger: if the same relative path is encountered more than once during the new-side walk (for example via duplicate mount-relative paths/symlink traversal), the first match deletes the key, and later occurrences of that same path are treated as New even though they existed in old.
Impact: false positives in New/Removed classification.

(though I think this might be a pre-existing issue?)

  2. DSC diff fidelity regression due to dropped local/private symbol merge

The refactor no longer calls ParseLocalSymbols(false) and no longer appends local symbols to the Mach-O symtab before GenerateDiffInfo.
Trigger: any dyld shared cache image where useful symbol identity comes from local/private symbols (common when exported symbols are sparse/obfuscated).
Impact: weaker symbol/function naming in the diff output and potentially noisier/less actionable DSC diffs vs current master.

@PoomSmart
Author

Right, thanks for pointing that out.

Low-memory matching edge case can misclassify files as New

I changed the low-memory path to track match state instead of deleting keys (map switched from map[string]struct{} -> map[string]bool, set to true once matched). This prevents later duplicate occurrences of the same new-side path from being classified as New.
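
A small sketch of the difference, using hypothetical function names and plain string slices in place of the real walk. classifyWithDelete mirrors the delete-on-match behavior described in the review; classifyWithState mirrors the map[string]bool approach.

```go
package main

import (
	"fmt"
	"sort"
)

// classifyWithDelete uses a single set for both "exists in old" and "still
// unmatched", deleting a key on first match. A duplicate on the new side
// is then no longer found and gets reported as New.
func classifyWithDelete(oldPaths, newPaths []string) (added, removed []string) {
	unmatched := make(map[string]struct{}, len(oldPaths))
	for _, p := range oldPaths {
		unmatched[p] = struct{}{}
	}
	for _, p := range newPaths {
		if _, ok := unmatched[p]; ok {
			delete(unmatched, p)
			continue
		}
		added = append(added, p)
	}
	for p := range unmatched {
		removed = append(removed, p)
	}
	sort.Strings(removed)
	return added, removed
}

// classifyWithState keeps every old path in the map and records whether it
// has been matched, so repeated new-side occurrences still find the key.
func classifyWithState(oldPaths, newPaths []string) (added, removed []string) {
	matched := make(map[string]bool, len(oldPaths))
	for _, p := range oldPaths {
		matched[p] = false // present in old, not matched yet
	}
	for _, p := range newPaths {
		if _, ok := matched[p]; ok {
			matched[p] = true
			continue
		}
		added = append(added, p)
	}
	for p, ok := range matched {
		if !ok {
			removed = append(removed, p)
		}
	}
	sort.Strings(removed)
	return added, removed
}

func main() {
	oldPaths := []string{"a", "b", "c"}
	newPaths := []string{"a", "a", "b", "d"} // "a" is seen twice on the new side

	added, removed := classifyWithDelete(oldPaths, newPaths)
	fmt.Println("delete on match:", added, removed)

	added, removed = classifyWithState(oldPaths, newPaths)
	fmt.Println("match state:    ", added, removed)
}
```

With the duplicate "a" on the new side, the delete-based version reports "a" as New, while the state-based version reports only "d".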

DSC diff fidelity regression due to dropped local/private symbol merge

I added back the ParseLocalSymbols(false) call.


Development

Successfully merging this pull request may close these issues.

ipsw diff is taking too much memory

2 participants