Open
Description
The current combinator.c implementation has an O(n*m) file I/O bottleneck:
- File2 is completely re-read for every line in file1 (line 217: rewind(fd2))
- For a 10,000 × 1,000 line combination, file2 is read from start to finish 10,000 times instead of once, for about 10 million line reads in total
- Read cost grows with the product of the two file sizes, so run time increases in step with every line added to file1
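To make the access pattern concrete, here is a simplified sketch of the rewind-per-line loop described above. It is not the actual combinator.c code: the function name combine_with_rewind, the chomp helper, the 256-byte buffers, and the read counter are all assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Strip trailing '\n' and '\r' from a line in place. */
static void chomp(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
}

/*
 * Hypothetical illustration of the bottleneck: for each line of f1,
 * rewind f2 and read it again from the start.
 * Returns the total number of lines read from f2.
 */
unsigned long combine_with_rewind(FILE *f1, FILE *f2, FILE *out)
{
    char a[256], b[256];          /* assumed maximum line length */
    unsigned long reads = 0;

    while (fgets(a, sizeof a, f1) != NULL) {
        chomp(a);
        rewind(f2);               /* the costly step: restart file2 */
        while (fgets(b, sizeof b, f2) != NULL) {
            chomp(b);
            reads++;
            fprintf(out, "%s%s\n", a, b);
        }
    }
    return reads;
}
```

With n lines in file1 and m lines in file2, the counter ends at n*m, which is where the 10 million figure for 10,000 × 1,000 comes from.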
Performance Impact
Stress Test Results (10,000 × 1,000 lines = 10M combinations):
- Current implementation: Timeout (>60 seconds)
- Optimized implementation: <1 second
- Speedup: >60x
Memory usage remains similar (~33MB for both versions)
Proposed Solution
I've created combinator_optimized.c that implements:
- File2 Memory Caching: Load entire file2 once into memory, eliminating all rewind operations
- Enhanced Error Handling: Added malloc() null pointer checks to prevent segfaults
- Combined Line Processing: Single-pass carriage return stripping
- Improved Buffer Management: More efficient I/O batching
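The caching, malloc checking, and single-pass carriage return stripping listed above could look roughly like the sketch below. This is not the contents of combinator_optimized.c; line_cache_t, strip_eol, cache_load, cache_free, and the 4096-byte line buffer are hypothetical names and sizes chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the cached lines of file2. */
typedef struct {
    char  **lines;
    size_t  count;
} line_cache_t;

/* Strip trailing '\n' and '\r' in one backward pass; returns the new length. */
size_t strip_eol(char *s, size_t len)
{
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        len--;
    s[len] = '\0';
    return len;
}

/* Release every cached line and reset the cache. */
void cache_free(line_cache_t *c)
{
    for (size_t i = 0; i < c->count; i++)
        free(c->lines[i]);
    free(c->lines);
    c->lines = NULL;
    c->count = 0;
}

/*
 * Read all of fp into memory once.
 * Returns 0 on success, -1 if an allocation fails (cache is freed).
 * Simplification: lines longer than the buffer are split into several entries.
 */
int cache_load(FILE *fp, line_cache_t *c)
{
    char buf[4096];               /* assumed maximum line length */
    size_t cap = 0;

    c->lines = NULL;
    c->count = 0;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strip_eol(buf, strlen(buf));

        if (c->count == cap) {    /* grow the pointer array geometrically */
            size_t new_cap = cap ? cap * 2 : 1024;
            char **tmp = realloc(c->lines, new_cap * sizeof *tmp);
            if (tmp == NULL) { cache_free(c); return -1; }
            c->lines = tmp;
            cap = new_cap;
        }

        char *copy = malloc(len + 1);
        if (copy == NULL) { cache_free(c); return -1; }
        memcpy(copy, buf, len + 1);   /* includes the terminating '\0' */
        c->lines[c->count++] = copy;
    }
    return 0;
}
```

After cache_load succeeds, the outer loop over file1 can iterate over c.lines[0..count-1] directly, so file2 is read from disk exactly once and no rewind is needed.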