feat: 4184 ai oci model support #4335

spiffcs · 2025-11-05T16:50:32Z

Description

This PR follows up on #4279 by adding support for a new Docker source, ocimodelsource (naming pending 😄).

With this change, users can run the following: syft -o json docker.io/ai/qwen3-vl | jq .

They'll get an SBOM with a single package describing the GGUF model, with details for the model pulled from https://hub.docker.com/u/ai

Example of metadata extracted

 "metadata": {
        "modelFormat": "gguf",
        "modelName": "Qwen3-Vl-8B-Instruct",
        "modelVersion": "unknown",
        "hash": "321c13d3e93151b5",
        "license": "apache-2.0",
        "ggufVersion": 3,
        "architecture": "qwen3vl",
        "quantization": "Q4_K_M",
        "parameters": 8190735360,
        "tensorCount": 399,
        "header": {
          "general.base_model.0.name": "Qwen3 VL 8B Instruct",
          "general.base_model.0.organization": "Qwen",
          "general.base_model.0.repo_url": "https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct",
          "general.base_model.count": 1,
          "general.basename": "Qwen3-Vl-8B-Instruct",
          "general.file_type": 15,
          "general.finetune": "Instruct",
          "general.quantization_version": 2,
          "general.quantized_by": "Unsloth",
          "general.repo_url": "https://huggingface.co/unsloth",
          "general.size_label": "8B",
          "general.tags": {
            "type": 8,
            "len": 2,
            "startOffset": 741,
            "size": 41
          },
          "general.type": "model",
          "quantize.imatrix.chunks_count": 694,
          "quantize.imatrix.dataset": "unsloth_calibration_Qwen3-VL-8B-Instruct.txt",
          "quantize.imatrix.entries_count": 252,
          "quantize.imatrix.file": "Qwen3-VL-8B-Instruct-GGUF/imatrix_unsloth.gguf",
          "qwen3vl.attention.head_count": 32,
          "qwen3vl.attention.head_count_kv": 8,
          "qwen3vl.attention.key_length": 128,
          "qwen3vl.attention.layer_norm_rms_epsilon": 0.000001,
          "qwen3vl.attention.value_length": 128,
          "qwen3vl.block_count": 36,
          "qwen3vl.context_length": 262144,
          "qwen3vl.embedding_length": 4096,
          "qwen3vl.feed_forward_length": 12288,
          "qwen3vl.n_deepstack_layers": 3,
          "qwen3vl.rope.dimension_sections": {
            "type": 5,
            "len": 4,
            "startOffset": 1268,
            "size": 16
          },
          "qwen3vl.rope.freq_base": 5000000,
          "tokenizer.ggml.add_bos_token": false,
          "tokenizer.ggml.bos_token_id": 151643,
          "tokenizer.ggml.eos_token_id": 151645,
          "tokenizer.ggml.merges": {
            "type": 8,
            "len": 151387,
            "startOffset": 3197544,
            "size": 2731548
          },
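
For consumers of this output, a minimal Go sketch of decoding the metadata object might look like the following. It assumes the object is shaped as in the sample above; the struct, its field names, and the helper functions are illustrative only and are not syft's own types. The header map is left loosely typed because its values are a mix of strings, numbers, and nested objects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ggufMetadata mirrors the top-level keys of the sample "metadata" object.
// The struct and field names are illustrative, not syft's own types.
type ggufMetadata struct {
	ModelFormat  string         `json:"modelFormat"`
	ModelName    string         `json:"modelName"`
	ModelVersion string         `json:"modelVersion"`
	Hash         string         `json:"hash"`
	License      string         `json:"license"`
	GGUFVersion  int            `json:"ggufVersion"`
	Architecture string         `json:"architecture"`
	Quantization string         `json:"quantization"`
	Parameters   int64          `json:"parameters"` // exceeds 32 bits in the sample
	TensorCount  int            `json:"tensorCount"`
	Header       map[string]any `json:"header"` // mixed value types, so kept loose
}

// parseGGUFMetadata decodes a single metadata object.
func parseGGUFMetadata(raw []byte) (ggufMetadata, error) {
	var m ggufMetadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return ggufMetadata{}, fmt.Errorf("decoding gguf metadata: %w", err)
	}
	return m, nil
}

// headerString returns a header value only when it is a plain string.
func headerString(m ggufMetadata, key string) (string, bool) {
	s, ok := m.Header[key].(string)
	return s, ok
}
```

Calling headerString(m, "general.size_label") on the sample would yield the size label, while numeric or nested entries such as general.file_type or general.tags report false rather than panicking.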

A larger Google Doc is being put together to go over the choices made in this PR and the changes we need to make so that pt1/pt2 work together as intended.

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have tested my code in common scenarios and confirmed there are no regressions
I have added comments to my code, particularly in hard-to-understand sections

Signed-off-by: Christopher Phillips <[email protected]>

* main: (76 commits)
  feat: snap can be queried by revision and ```track/risk/branch``` (#4439)
  fix: 4423 dotnet-deps cataloger skips project type by def signpost to docs site (#4483)
  chore(deps): bump github/codeql-action from 4.31.8 to 4.31.9 (#4481)
  chore(deps): bump github.com/goccy/go-yaml from 1.19.0 to 1.19.1 (#4482)
  Detect embedded deps.json in .NET binaries (#4375)
  chore(deps): bump actions/cache from 5.0.0 to 5.0.1 (#4476)
  chore(deps): bump actions/cache in /.github/actions/bootstrap (#4477)
  chore(deps): update tools to latest versions (#4473)
  unapply base path for resolver inbound requests (#4478)
  fix: golang PURL should include full module (#4395)
  fix:best effort to get the os info of an ELF binary (#4438)
  Improve PR template (#4472)
  feat: add support for Gemfile.next.lock (#4457)
  chore:cancel in-progress workflows for new commits on same PR (#4465)
  chore(deps): update tools to latest versions (#4466)
  chore(deps): bump github/codeql-action from 4.31.7 to 4.31.8 (#4468)
  chore(deps): bump actions/cache from 4.3.0 to 5.0.0 (#4469)
  chore(deps): bump github.com/anchore/stereoscope from 0.1.14 to 0.1.16 (#4470)
  chore(deps): bump actions/cache in /.github/actions/bootstrap (#4471)
  ...

Signed-off-by: Christopher Phillips <[email protected]>

syft/source/ocimodelsource/metadata.go

syft/source/ocimodelsource/oci_model_source.go

syft/source/ocimodelsource/registry_client.go

wagoodman · 2025-12-23T15:50:55Z

syft/source/ocimodelsource/oci_model_source.go

+		li, err := fetchSingleGGUFHeader(ctx, client, artifact.Reference, layer, tempDir)
 		if err != nil {
-			return nil, fmt.Errorf("failed to create temp file: %w", err)
+			os.RemoveAll(tempDir)


We should not be ignoring an error here (we should at least log it).

syft/source/ocimodelsource/oci_model_source.go

wagoodman · 2025-12-23T15:59:50Z

syft/source/ocimodelsource/oci_model_source.go

-	// Fetch GGUF layer headers via range-GET
-	tempFiles := make(map[string]string)
-	ggufLayers := make([]GGUFLayerInfo, 0, len(artifact.GGUFLayers))
+	id := deriveID(cfg.Reference, cfg.Alias, metadata.ManifestDigest)


We should have a test that ensures the ID generated for an artifact here is the same as the one generated for a stereoscope image.

Well, we should really try to move the deriveIDFromStereoscopeImage function in the stereoscopesource package to the source/internal package and share the invocation. We should be careful not to change existing behavior. It could be worth capturing an ID for each case we care about before refactoring, adding a test for that (if one does not already exist), then doing the refactor and ensuring the IDs don't change for stereoscope images.
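
The "capture IDs first, then refactor" approach could be sketched roughly as below. Everything here is an assumption for illustration: idInputs, deriveFn, diffIDs, and placeholderDerive are hypothetical names, and placeholderDerive is NOT the algorithm used by deriveID or deriveIDFromStereoscopeImage. The idea is only to run the pre-refactor and post-refactor derivations over the same named cases and report any case whose ID changed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// idInputs is a hypothetical stand-in for the values that feed ID derivation.
type idInputs struct {
	Reference      string
	AliasName      string
	AliasVersion   string
	ManifestDigest string
}

// deriveFn is any ID derivation under test (before or after a refactor).
type deriveFn func(idInputs) string

// diffIDs runs both derivations over the same cases and returns the sorted
// names of the cases whose IDs differ. An empty result means no drift.
func diffIDs(cases map[string]idInputs, before, after deriveFn) []string {
	var mismatched []string
	for name, in := range cases {
		if before(in) != after(in) {
			mismatched = append(mismatched, name)
		}
	}
	sort.Strings(mismatched)
	return mismatched
}

// placeholderDerive is a placeholder so the sketch compiles; it is not the
// real derivation. It deliberately ignores Reference.
func placeholderDerive(in idInputs) string {
	sum := sha256.Sum256([]byte(in.AliasName + "@" + in.AliasVersion + "|" + in.ManifestDigest))
	return fmt.Sprintf("%x", sum)
}
```

In a real test, before would be the existing stereoscope derivation (or a table of IDs recorded from it), after would be the shared function in its new location, and the test would fail if diffIDs returned anything.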

syft/source/ocimodelsource/registry_client.go

syft/internal/fileresolver/container_image_oci_model.go

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs added 14 commits November 12, 2025 23:56

feat: migrate gguf parser to separate PR from oci

6ceef5f

Signed-off-by: Christopher Phillips <[email protected]>

chore: lint-fix

f92b7d2

Signed-off-by: Christopher Phillips <[email protected]>

test: migrate gguf tests over

1ad4a27

Signed-off-by: Christopher Phillips <[email protected]>

chore: schema and test additions

bcd47d1

Signed-off-by: Christopher Phillips <[email protected]>

tests: account for epoch in dedupe test

b702952

Signed-off-by: Christopher Phillips <[email protected]>

test: fix local flake

08c0572

Signed-off-by: Christopher Phillips <[email protected]>

fix: first pass pr fixes

f664f9e

Signed-off-by: Christopher Phillips <[email protected]>

chore: refactor to use gguf-parser-go; 50mb limit

c689dcf

Signed-off-by: Christopher Phillips <[email protected]>

fix: update gguf data to be GGUFFileHeader

64dc451

Signed-off-by: Christopher Phillips <[email protected]>

chore: warn -> debug

38c0e6e

Signed-off-by: Christopher Phillips <[email protected]>

chore: pr feedback

9a2a45f

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip

9b31c04

Signed-off-by: Christopher Phillips <[email protected]>

fix: pr comments

6daea43

Signed-off-by: Christopher Phillips <[email protected]>

chore: regenerate json schema

b18f7bb

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs force-pushed the 4184-gguf-parser branch from 9c5279c to b18f7bb Compare November 13, 2025 05:03

spiffcs added 8 commits November 13, 2025 00:12

chore: ignore local agent files

cdb41b0

Signed-off-by: Christopher Phillips <[email protected]>

chore: pr comments

b80592f

Signed-off-by: Christopher Phillips <[email protected]>

fix: raise model version on package

56761ce

Signed-off-by: Christopher Phillips <[email protected]>

chore: remove test-binary

9609ce2

Signed-off-by: Christopher Phillips <[email protected]>

chore: schema and test additions

2976df5

Signed-off-by: Christopher Phillips <[email protected]>

chore: refactor to use gguf-parser-go; 50mb limit

7ed34c8

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip no lrg file oci client

efcfecb

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip

8031957

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs force-pushed the 4184-pt2-oci-model-support branch from 5853129 to 8031957 Compare November 13, 2025 06:19

spiffcs added 3 commits November 13, 2025 01:44

fix: use OCI title annotation for virtual path in GGUF layer extraction

ec978f0

Signed-off-by: Christopher Phillips <[email protected]>

fix: update after rebase

1a85625

Signed-off-by: Christopher Phillips <[email protected]>

fix: add green fixes before pr fixes

bfe63bb

Signed-off-by: Christopher Phillips <[email protected]>

Base automatically changed from 4184-gguf-parser to main November 13, 2025 22:43

spiffcs added 2 commits December 19, 2025 00:18

chore: remove incorrect bump of schema

f5fd311

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs added 10 commits December 22, 2025 21:24

chore: refactor resolver so cataloger can use FilesByMediaType

ea64192

Signed-off-by: Christopher Phillips <[email protected]>

chore: refactor source/provider so provider wraps source correctly

28dbf2f

Signed-off-by: Christopher Phillips <[email protected]>

chore: lint-fix

c1929fe

Signed-off-by: Christopher Phillips <[email protected]>

chore: update so ID is not affected by annotations

924c790

Signed-off-by: Christopher Phillips <[email protected]>

chore: do not export layerInfo

11e744d

Signed-off-by: Christopher Phillips <[email protected]>

chore: small refactor

2718e33

Signed-off-by: Christopher Phillips <[email protected]>

chore: update tests to have new method for file.Resolver

1bcd85c

Signed-off-by: Christopher Phillips <[email protected]>

chore: cut round trip requests in half

74fdc90

Signed-off-by: Christopher Phillips <[email protected]>

tests: add tests for provider/source layer

80ada3c

Signed-off-by: Christopher Phillips <[email protected]>

chore: gosec warnings

8e2ef24

Signed-off-by: Christopher Phillips <[email protected]>