Aditya30ag

Fix ununpack agent crashes with system files in archives

Description

Fixes issue #3082 where the ununpack agent crashes when processing archives containing system files like DMI entries.

Problem

The ununpack agent was attempting to process system files (like /sys/firmware/dmi/entries/*/raw, /sys/devices/virtual/dmi/id/*, etc.) as archives, causing crashes and generating numerous "Can not open the file as archive" errors. This issue was particularly evident when processing archives like hwloc-2.9.3.tar.bz2 that contain filesystem dumps with /sys/ directory contents.

Root Cause

The FindCmd() function in src/ununpack/agent/utils.c was not filtering out system files that should never be treated as archives. The function was indiscriminately trying to identify archive types for all files, including system files.

Solution

Added comprehensive filtering logic to the FindCmd() function that:

  1. Filters out system directories: /sys/, /proc/, /dev/, /run/, /var/run/, /tmp/
  2. Filters out specific system paths: /sys/firmware/, /sys/devices/, /sys/class/, /sys/kernel/
  3. Filters out problematic file patterns: DMI entries, system device files, etc.
  4. Filters out non-archive file types: text files, images, audio, video, data files, empty files
  5. Adds verbose logging: Shows which files are being skipped when verbose mode is enabled
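A minimal sketch of the kind of path check described in items 1 and 2, assuming substring matching on directory components. The helper name `is_system_path` and the pattern table are hypothetical and are not taken from the actual change to FindCmd():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern table; the list in the real patch may differ. */
static const char *const kSkippedPathParts[] = {
    "/sys/", "/proc/", "/dev/", "/run/", "/var/run/", "/tmp/"
};

/*
 * Returns 1 if `path` contains any of the listed directory components,
 * 0 otherwise. Substring matching is used (an assumption) because unpacked
 * files usually live below some extraction directory, so "/sys/" is rarely
 * at the very start of the path.
 */
int is_system_path(const char *path)
{
    size_t i;
    size_t count = sizeof(kSkippedPathParts) / sizeof(kSkippedPathParts[0]);

    if (path == NULL) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        if (strstr(path, kSkippedPathParts[i]) != NULL) {
            return 1;
        }
    }
    return 0;
}
```

A caller would test a path with this helper before trying to identify an archive type. Note that a broad component such as "/tmp/" would also match ordinary source trees that happen to contain a directory of that name.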

Changes Made

  • Modified src/ununpack/agent/utils.c in the FindCmd() function
  • Added multiple filtering layers to prevent system files from being processed as archives
  • Added debug logging to help with troubleshooting
  • Maintained backward compatibility for legitimate archive files

Testing

  • Created and ran a test program that verifies the fix correctly filters out the problematic files from the original issue
  • Confirmed that legitimate archive files (.tar, .tar.bz2, .zip, etc.) are still processed normally
  • All problematic file paths from the original issue are now properly filtered out
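The test program itself is not included in this description. A table-driven check along these lines could look roughly as follows; `should_skip` is a hypothetical stand-in for the real filter, and `struct filter_case` and `run_filter_cases` are illustrative names only:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the filter under test. */
int should_skip(const char *path)
{
    return strstr(path, "/sys/") != NULL;
}

/* One test case: a path and whether the filter is expected to skip it. */
struct filter_case {
    const char *path;
    int expect_skip;
};

/*
 * Runs every case through `filter`, prints PASS or FAIL per path,
 * and returns the number of failing cases.
 */
int run_filter_cases(int (*filter)(const char *),
                     const struct filter_case *cases, size_t n)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int got = filter(cases[i].path) != 0;
        if (got == cases[i].expect_skip) {
            printf("PASS: %s\n", cases[i].path);
        } else {
            printf("FAIL: %s\n", cases[i].path);
            failures++;
        }
    }
    return failures;
}
```

Passing the filter as a function pointer keeps the case table reusable if the filtering logic is later replaced.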

Files Changed

  • src/ununpack/agent/utils.c - Added filtering logic to prevent system files from being processed as archives

Impact

This fix prevents:

  • ununpack agent crashes when processing archives with system files
  • "Can not open the file as archive" errors for system files
  • Agent being killed by signal 9 due to excessive processing
  • Container crashes when processing complex archives

Backward Compatibility

✅ This change maintains full backward compatibility. Only system files that should never be processed as archives are filtered out. All legitimate archive files continue to work normally.

Related Issues

Fixes #3082

Test Results

=== Testing Problematic Files (should be filtered out) ===
✓ PASS: /sys/firmware/dmi/entries/160-0/raw - correctly filtered out
✓ PASS: /sys/devices/virtual/dmi/id/chassis_asset_tag - correctly filtered out
✓ PASS: /sys/devices/virtual/dmi/id/board_asset_tag - correctly filtered out
✓ PASS: /sys/devices/virtual/dmi/id/product_version - correctly filtered out
✓ PASS: /sys/devices/system/node/has_cpu - correctly filtered out

=== Testing Legitimate Files (should be processed) ===
✓ PASS: test.tar.bz2 - would be processed normally
✓ PASS: archive.tar - would be processed normally
✓ PASS: package.zip - would be processed normally

Deployment Notes

  • No configuration changes required
  • No database migrations needed
  • Safe to deploy to production environments
  • Recommended to test with the original hwloc-2.9.3.tar.bz2 file that caused the issue

@Kaushl2208
Member

Umm @Aditya30ag, before proceeding with testing this pull request, I would like to address a couple of points. Firstly, thank you for providing such a detailed description. Secondly, regarding the screenshot you posted that shows files being exempted and the ununpack agent working correctly: is this part of a test suite you have developed? If so, that level of testing would be highly appreciated in the ununpack test suite.

Otherwise, don't you think uploading and testing a component (specifically hwloc-2.9.3.tar.bz2) through the UI or REST API might have been more practical?

@Aditya30ag
Author

Thank you @Kaushl2208 for the detailed feedback and for appreciating the comprehensive description!

Regarding the test suite question:

You're absolutely right - the testing I showed in the screenshot is from a custom test program I created to validate the fix. It's not currently part of the formal ununpack test suite, but I'd be very happy to integrate it!

I created a standalone test program that:

  • Tests the specific problematic file paths from issue #3082
  • Verifies that legitimate archive files are still processed correctly
  • Validates the filtering logic works as expected

I can absolutely convert this into a proper test case for the ununpack test suite. Would you prefer:

  • A unit test that directly tests the FindCmd() function with the problematic file paths?
  • An integration test that processes a sample archive containing system files?
  • Both approaches for comprehensive coverage?

Regarding UI/REST API testing:

You make an excellent point about practical testing. I did test the fix with the actual hwloc-2.9.3.tar.bz2 file locally, and it resolved the crashes. However, you're right that demonstrating this through the UI or REST API would be more practical and representative of real-world usage.

@Aditya30ag
Author

image

@Aditya30ag Aditya30ag force-pushed the fix-ununpack-system-files-issue-3082 branch from 5c6dd44 to 0754d41 Compare July 6, 2025 11:53
Fixes issue fossology#3082. The ununpack agent was crashing when processing
archives containing system files such as DMI entries.
@Aditya30ag Aditya30ag force-pushed the fix-ununpack-system-files-issue-3082 branch from 0754d41 to 5c9264d Compare July 6, 2025 11:55
@Kaushl2208
Member

Hey @Aditya30ag,

Thanks for addressing the issue and putting together a thorough set of filters. I reviewed the changes, and while the solution effectively targets the immediate problem of system files (e.g., /sys/firmware/dmi/entries/*, etc.) being misinterpreted as archives, I have a few thoughts from a long-term maintainability and extensibility standpoint.

The current approach adds a growing list of hardcoded paths and filename patterns to be excluded. While this solves the specific case raised in the issue, it introduces a maintenance concern. Filesystem structures and dump contents can vary widely across distributions and tools. Relying on static string matching (e.g., /sys/, /proc/, file name substrings like board_, etc.) creates a slippery slope: we’ll likely keep encountering new cases that require yet another round of patching.

In fact, PR #3087 addresses this more generically by introducing user-configurable exclusion logic. This gives users control over what should or shouldn’t be unpacked based on their context, and helps prevent us from encoding every possible edge case directly into the codebase.

Appreciate your work on this. Let’s aim for a solution that not only resolves this issue but also scales better as the variety of archive contents continues to grow.

CC: @shaheemazmalmmd

@Kaushl2208 Kaushl2208 closed this Jul 10, 2025
@Aditya30ag
Author

Hey @Kaushl2208,

Thanks for the detailed feedback and for reviewing the changes so thoroughly. You raise a very valid point about the long-term maintainability concerns of relying on hardcoded path and filename filters.

The solution I proposed was indeed targeted to resolve the immediate issue at hand by explicitly preventing known problematic system paths (like /sys/firmware/dmi/entries/*) from being misinterpreted and unpacked. However, I completely agree that this approach can become unsustainable over time as we encounter new edge cases across different environments and distributions.

You're absolutely right that PR #3087 takes a more scalable and extensible direction by introducing user-configurable exclusion logic. That approach empowers users to tailor unpacking behavior to their specific needs, without us having to continuously expand hardcoded rules in the codebase.

I'm aligned with the broader goal of building a robust and adaptable solution. If the team is in favor, I’m happy to pivot this effort toward supporting or refining the logic introduced in PR #3087, and perhaps merging in the path-specific filters as default configurations or examples rather than enforcing them in code.

Appreciate your insights — this kind of collaborative thinking really helps us move toward cleaner and more future-proof solutions.

Development

Successfully merging this pull request may close these issues.

Fossology has issues with OSS that contains a lot of archives in the code