Support file ownership when using file source #3345

@adammcclenaghan

What would you like to be added:
Today, some catalogers support the concept of 'file ownership', specifically catalogers whose package metadata implements the FileOwner interface.
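
For readers unfamiliar with the interface, here is a minimal sketch. It assumes FileOwner has roughly the shape of a single OwnedFiles() []string method. The dpkgEntry and dpkgFileRecord types are simplified, hypothetical stand-ins for syft's dpkg metadata types, not the real ones.

```go
package main

import "fmt"

// FileOwner approximates (as an assumption) syft's FileOwner interface:
// package metadata that can report which paths the package owns.
type FileOwner interface {
	OwnedFiles() []string
}

// dpkgFileRecord is a hypothetical, simplified stand-in for one entry in the
// "files" array of dpkg-db-entry metadata.
type dpkgFileRecord struct {
	Path string
}

// dpkgEntry is a hypothetical, simplified stand-in for dpkg package metadata.
type dpkgEntry struct {
	Package string
	Files   []dpkgFileRecord
}

// OwnedFiles returns the path of every file recorded for the package.
func (e dpkgEntry) OwnedFiles() []string {
	paths := make([]string, 0, len(e.Files))
	for _, f := range e.Files {
		paths = append(paths, f.Path)
	}
	return paths
}

func main() {
	var owner FileOwner = dpkgEntry{
		Package: "curl",
		Files: []dpkgFileRecord{
			{Path: "/usr/bin/curl"},
			{Path: "/usr/share/doc/curl/copyright"},
		},
	}
	// With a file source, Files would be empty and this would print [].
	fmt.Println(owner.OwnedFiles())
}
```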

For instance, if I scan my DPKG directory using a directory source, the artifact metadata lists the files owned by each package in my DPKG installation. Taking curl as an example:

syft -o syft-json dir:/var/lib/dpkg | jq '.artifacts[] | select(.name == "curl") | .metadata.files'

[
  {
    "path": "/usr/bin/curl",
    "digest": {
      "algorithm": "md5",
      "value": "fb9a88e8023f2fb2a0f475d1c85d8dcb"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/doc/curl/copyright",
    "digest": {
      "algorithm": "md5",
      "value": "39782ccc3532fee98360f19e317c6707"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/man/man1/curl.1.gz",
    "digest": {
      "algorithm": "md5",
      "value": "1326b53b4e64bf16ed6558a94496a0e8"
    },
    "isConfigFile": false
  },
  {
    "path": "/usr/share/zsh/vendor-completions/_curl",
    "digest": {
      "algorithm": "md5",
      "value": "1fe4ab18bfb8fe595c42534a37ab27a3"
    },
    "isConfigFile": false
  }
]

However, when scanning with a file source, no file metadata is associated with the packages in the DPKG installation (note the empty "files" array):

syft -o syft-json file:/var/lib/dpkg/status | jq '.artifacts[] | select(.name == "curl")'

{
  "id": "768c7f6773e9852e",
  "name": "curl",
  "version": "7.81.0-1ubuntu1.18",
  "type": "deb",
  "foundBy": "dpkg-db-cataloger",
  "locations": [
    {
      "path": "/status",
      "accessPath": "/status",
      "annotations": {
        "evidence": "primary"
      }
    }
  ],
  "licenses": [],
  "language": "",
  "cpes": [
    {
      "cpe": "cpe:2.3:a:curl:curl:7.81.0-1ubuntu1.18:*:*:*:*:*:*:*",
      "source": "syft-generated"
    }
  ],
  "purl": "",
  "metadataType": "dpkg-db-entry",
  "metadata": {
    "package": "curl",
    "source": "",
    "version": "7.81.0-1ubuntu1.18",
    "sourceVersion": "",
    "architecture": "amd64",
    "maintainer": "Ubuntu Developers <[email protected]>",
    "installedSize": 444,
    "depends": [
      "libc6 (>= 2.34)",
      "libcurl4 (= 7.81.0-1ubuntu1.18)",
      "zlib1g (>= 1:1.1.4)"
    ],
    "files": []
  }
}

This makes sense: a file source causes the file resolver to index only the target file and its containing directory. So when the DPKG cataloger tries to resolve the 'Infos' directory after parsing the DPKG DB, the index contains no entries for it, and the cataloger cannot resolve the file ownership metadata.
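
To make the failure mode concrete, the following is a deliberately simplified, hypothetical model of a file-source index (fileSourceIndex and hasPath are illustrative names, not syft's resolver API). It records only the scanned file and its parent directory, so a lookup for a dpkg info file such as /var/lib/dpkg/info/curl.list misses.

```go
package main

import (
	"fmt"
	"path"
)

// fileSourceIndex is a hypothetical model of what a file source indexes:
// only the scanned file and its containing directory.
type fileSourceIndex map[string]bool

// newFileSourceIndex indexes target and its parent directory, nothing else.
func newFileSourceIndex(target string) fileSourceIndex {
	return fileSourceIndex{
		target:           true,
		path.Dir(target): true,
	}
}

// hasPath reports whether the index can resolve the given path.
func (idx fileSourceIndex) hasPath(p string) bool {
	return idx[p]
}

func main() {
	idx := newFileSourceIndex("/var/lib/dpkg/status")
	fmt.Println(idx.hasPath("/var/lib/dpkg/status"))         // true
	fmt.Println(idx.hasPath("/var/lib/dpkg/info/curl.list")) // false: never indexed
}
```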

However, as a user, I have no way of knowing that metadata is missing here unless I read the cataloger implementation and understand that it needs more than the scanned file to populate its results correctly.

I would like to start a discussion about how feasible it would be to make catalogers 'aware' that they require more than one file to do all of their work.
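
One possible shape for that awareness, sketched under loud assumptions: AdditionalPathRequirer, RequiredPaths, dpkgRequirements and missingPaths are all hypothetical names that do not exist in syft. A cataloger declares the extra paths it needs, and the caller compares them against what the source indexed, which would at least let it tell the user that results may be incomplete.

```go
package main

import (
	"fmt"
	"path"
)

// AdditionalPathRequirer is a hypothetical interface, not part of syft, that a
// cataloger could implement to declare paths it needs beyond the scanned file.
type AdditionalPathRequirer interface {
	// RequiredPaths returns the extra paths needed to fully catalog target.
	RequiredPaths(target string) []string
}

// dpkgRequirements is a hypothetical dpkg implementation: ownership data is
// assumed to live in the info and status.d directories next to the status file.
type dpkgRequirements struct{}

func (dpkgRequirements) RequiredPaths(target string) []string {
	dir := path.Dir(target)
	return []string{path.Join(dir, "info"), path.Join(dir, "status.d")}
}

// missingPaths returns the required paths that are absent from the index, so
// the caller can warn the user that results may be incomplete.
func missingPaths(r AdditionalPathRequirer, target string, indexed map[string]bool) []string {
	var missing []string
	for _, p := range r.RequiredPaths(target) {
		if !indexed[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	target := "/var/lib/dpkg/status"
	// Model a file source: only the target and its parent directory are indexed.
	indexed := map[string]bool{target: true, path.Dir(target): true}
	fmt.Println(missingPaths(dpkgRequirements{}, target, indexed))
}
```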

Taking DPKG as an example: if the cataloger knows it is running against a file source, it could perform a 'second pass' that attempts to index the Infos or status.d directories used to determine file ownership. The resolver passed to findDpkgInfoFiles could then find owned files despite the scan using a file source.

Why is this needed:
When I scan with a file source, I'd like the catalogers to provide complete results, even when an applicable cataloger needs more than one file to do its work.

Additional context:

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: Status: Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests