feat: 4184 ai oci model support #4335

spiffcs · 2025-11-05T16:50:32Z

Description

This PR follows up on #4279 by adding support for a new docker source ocimodelsource (naming pending 😄)

With this change users can do the following: syft -o json docker.io/ai/qwen3-vl | jq .:

They'll get an SBOM with a single package showing the gguf model and details for the model pulled from https://hub.docker.com/u/ai

Example of metadata extracted

 "metadata": {
        "modelFormat": "gguf",
        "modelName": "Qwen3-Vl-8B-Instruct",
        "modelVersion": "unknown",
        "hash": "321c13d3e93151b5",
        "license": "apache-2.0",
        "ggufVersion": 3,
        "architecture": "qwen3vl",
        "quantization": "Q4_K_M",
        "parameters": 8190735360,
        "tensorCount": 399,
        "header": {
          "general.base_model.0.name": "Qwen3 VL 8B Instruct",
          "general.base_model.0.organization": "Qwen",
          "general.base_model.0.repo_url": "https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct",
          "general.base_model.count": 1,
          "general.basename": "Qwen3-Vl-8B-Instruct",
          "general.file_type": 15,
          "general.finetune": "Instruct",
          "general.quantization_version": 2,
          "general.quantized_by": "Unsloth",
          "general.repo_url": "https://huggingface.co/unsloth",
          "general.size_label": "8B",
          "general.tags": {
            "type": 8,
            "len": 2,
            "startOffset": 741,
            "size": 41
          },
          "general.type": "model",
          "quantize.imatrix.chunks_count": 694,
          "quantize.imatrix.dataset": "unsloth_calibration_Qwen3-VL-8B-Instruct.txt",
          "quantize.imatrix.entries_count": 252,
          "quantize.imatrix.file": "Qwen3-VL-8B-Instruct-GGUF/imatrix_unsloth.gguf",
          "qwen3vl.attention.head_count": 32,
          "qwen3vl.attention.head_count_kv": 8,
          "qwen3vl.attention.key_length": 128,
          "qwen3vl.attention.layer_norm_rms_epsilon": 0.000001,
          "qwen3vl.attention.value_length": 128,
          "qwen3vl.block_count": 36,
          "qwen3vl.context_length": 262144,
          "qwen3vl.embedding_length": 4096,
          "qwen3vl.feed_forward_length": 12288,
          "qwen3vl.n_deepstack_layers": 3,
          "qwen3vl.rope.dimension_sections": {
            "type": 5,
            "len": 4,
            "startOffset": 1268,
            "size": 16
          },
          "qwen3vl.rope.freq_base": 5000000,
         "tokenizer.ggml.add_bos_token": false,
          "tokenizer.ggml.bos_token_id": 151643,
          "tokenizer.ggml.eos_token_id": 151645,
          "tokenizer.ggml.merges": {
            "type": 8,
            "len": 151387,
            "startOffset": 3197544,
            "size": 2731548
          },

A larger google doc is being put together to go over the choices made in this PR and changes we need to make so that pt1/pt2 are working together as intended

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have tested my code in common scenarios and confirmed there are no regressions
I have added comments to my code, particularly in hard-to-understand sections

Signed-off-by: Christopher Phillips <[email protected]>

* main: (76 commits) feat: snap can be queried by revision and ```track/risk/branch``` (#4439) fix: 4423 dotnet-deps cataloger skips project type by def signpost to docs site (#4483) chore(deps): bump github/codeql-action from 4.31.8 to 4.31.9 (#4481) chore(deps): bump github.com/goccy/go-yaml from 1.19.0 to 1.19.1 (#4482) Detect embedded deps.json in .NET binaries (#4375) chore(deps): bump actions/cache from 5.0.0 to 5.0.1 (#4476) chore(deps): bump actions/cache in /.github/actions/bootstrap (#4477) chore(deps): update tools to latest versions (#4473) unapply base path for resolver inbound requests (#4478) fix: golang PURL should include full module (#4395) fix:best effort to get the os info of an ELF binary (#4438) Improve PR template (#4472) feat: add support for Gemfile.next.lock (#4457) chore:cancel in-progress workflows for new commits on same PR (#4465) chore(deps): update tools to latest versions (#4466) chore(deps): bump github/codeql-action from 4.31.7 to 4.31.8 (#4468) chore(deps): bump actions/cache from 4.3.0 to 5.0.0 (#4469) chore(deps): bump github.com/anchore/stereoscope from 0.1.14 to 0.1.16 (#4470) chore(deps): bump actions/cache in /.github/actions/bootstrap (#4471) ... Signed-off-by: Christopher Phillips <[email protected]>

Signed-off-by: Christopher Phillips <[email protected]>

wagoodman · 2025-12-19T16:10:32Z

syft/source/ocimodelsource/metadata.go

+type OCIModelMetadata struct {
+	// Core OCI artifact metadata (mirrors ImageMetadata)
+	UserInput      string                 `json:"userInput"`
+	ID             string                 `json:"artifactID"`
+	ManifestDigest string                 `json:"manifestDigest"`
+	MediaType      string                 `json:"mediaType"`
+	Tags           []string               `json:"tags"`
+	Size           int64                  `json:"artifactSize"`
+	Layers         []source.LayerMetadata `json:"layers"`
+	RawManifest    []byte                 `json:"manifest"`
+	RawConfig      []byte                 `json:"config"`
+	RepoDigests    []string               `json:"repoDigests"`
+	Architecture   string                 `json:"architecture"`
+	Variant        string                 `json:"architectureVariant,omitempty"`
+	OS             string                 `json:"os"`
+	Labels         map[string]string      `json:"labels,omitempty"`
+
+	// OCI-specific metadata
+	Annotations map[string]string `json:"annotations,omitempty"`
+
+	// Model-specific metadata
+	ModelFormat string          `json:"modelFormat,omitempty"` // e.g., "gguf"
+	GGUFLayers  []GGUFLayerInfo `json:"ggufLayers,omitempty"`
+}


Suggested change

type OCIModelMetadata struct {

// Core OCI artifact metadata (mirrors ImageMetadata)

UserInput string `json:"userInput"`

ID string `json:"artifactID"`

ManifestDigest string `json:"manifestDigest"`

MediaType string `json:"mediaType"`

Tags []string `json:"tags"`

Size int64 `json:"artifactSize"`

Layers []source.LayerMetadata `json:"layers"`

RawManifest []byte `json:"manifest"`

RawConfig []byte `json:"config"`

RepoDigests []string `json:"repoDigests"`

Architecture string `json:"architecture"`

Variant string `json:"architectureVariant,omitempty"`

OS string `json:"os"`

Labels map[string]string `json:"labels,omitempty"`

// OCI-specific metadata

Annotations map[string]string `json:"annotations,omitempty"`

// Model-specific metadata

ModelFormat string `json:"modelFormat,omitempty"` // e.g., "gguf"

GGUFLayers []GGUFLayerInfo `json:"ggufLayers,omitempty"`

}

type OCIModelMetadata{

ImageMetadata

Annotations map[string]string

}

or

Suggested change

type OCIModelMetadata struct {

// Core OCI artifact metadata (mirrors ImageMetadata)

UserInput string `json:"userInput"`

ID string `json:"artifactID"`

ManifestDigest string `json:"manifestDigest"`

MediaType string `json:"mediaType"`

Tags []string `json:"tags"`

Size int64 `json:"artifactSize"`

Layers []source.LayerMetadata `json:"layers"`

RawManifest []byte `json:"manifest"`

RawConfig []byte `json:"config"`

RepoDigests []string `json:"repoDigests"`

Architecture string `json:"architecture"`

Variant string `json:"architectureVariant,omitempty"`

OS string `json:"os"`

Labels map[string]string `json:"labels,omitempty"`

// OCI-specific metadata

Annotations map[string]string `json:"annotations,omitempty"`

// Model-specific metadata

ModelFormat string `json:"modelFormat,omitempty"` // e.g., "gguf"

GGUFLayers []GGUFLayerInfo `json:"ggufLayers,omitempty"`

}

type OCIModelMetadata ImageMetadata

I'm not saying this must be done, but given that the gguf/model format properties are redundant, then it is essentially the same. The annotations are new, however, still relevant to OCI images.

we should probably have id:- to ensure this doesn't affect the ID of existing sources. Probably worth a test to ensure this isn't changing

wagoodman

we should take a pass to unexport whatever doesn't need to be in the public API

wagoodman · 2025-12-19T16:29:59Z

syft/pkg/cataloger/ai/cataloger.go

+// catalogFromOCILayers discovers GGUF models by querying OCI layers by media type.
+func (c *ggufCataloger) catalogFromOCILayers(ctx context.Context, resolver ocimodelsource.OCIResolver) ([]pkg.Package, []artifact.Relationship, error) {
+	// Find all GGUF layers by media type
+	digests, err := resolver.LayerDigestsByMediaType(ocimodelsource.GGUFLayerMediaType)


we shouldn't import from a package meant for a specific implementation of a source

wagoodman · 2025-12-19T16:31:25Z

syft/source/ocimodelsource/registry_client.go

+
+	// Handle TLS settings
+	if registryOpts.InsecureSkipTLSVerify {
+		transport := remote.DefaultTransport.(*http.Transport).Clone()


beware of possible panic here

wagoodman · 2025-12-19T16:38:22Z

syft/source/ocimodelsource/registry_client.go

+
+	// Parse manifest
+	manifest := &v1.Manifest{}
+	if err := json.Unmarshal(desc.Manifest, manifest); err != nil {


this could be a v1.Index maybe? worth checking out https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/puller.go#L93 and if its connected or not to https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/index.go and https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/descriptor.go#L44-L45. We should probably check if this is a manifest or index and return a helpful error if its an index.

wagoodman · 2025-12-19T16:41:45Z

syft/source/ocimodelsource/registry_client.go

+	defer reader.Close()
+
+	// Read up to maxBytes
+	data := make([]byte, maxBytes)


a comment that we're capped at 10 MB, not reading a full GGUF file into memory

wagoodman · 2025-12-19T16:44:14Z

syft/source/ocimodelsource/oci_model_source_provider.go

+	isModel, err := client.IsModelArtifactReference(ctx, p.reference)
+	if err != nil {
+		// Log the error but don't fail - let other providers try
+		log.WithFields("reference", p.reference, "error", err).Debug("failed to check if reference is a model artifact")
+		return nil, fmt.Errorf("not an OCI model artifact: %w", err)
+	}
+
+	if !isModel {
+		log.WithFields("reference", p.reference).Debug("reference is not a model artifact")
+		return nil, fmt.Errorf("not an OCI model artifact")
+	}
+
+	log.WithFields("reference", p.reference).Info("detected OCI model artifact, fetching headers")
+
+	// Fetch the full model artifact with metadata
+	artifact, err := client.FetchModelArtifact(ctx, p.reference)
+	if err != nil {


client.IsModelArtifactReference shuold probably return the remote.Descriptor so we don't call remote.Get again when you are using client.FetchModelArtifact.

wagoodman · 2025-12-19T16:47:51Z

syft/source/sourceproviders/source_providers.go

 		// 3. try remote sources after everything else...

+		// --from oci-model (model artifacts with header-only fetching)
+		Join(tagProvider(ocimodelsource.NewSourceProvider(userInput, cfg.RegistryOptions, cfg.Alias), OCIModelTag)).


This is tech debt for later: we're starting to do some work across independent providers that is wasted work and could consider some caching or another solution to share work across providers (e.g. this calls remote.Get as well as the registry provider... and this is called before the registry/docker cases (our most common case).

We should look into options where we can put this line below the pull tag approach.

Say for instance, we start updating the providers down in stereoscope to ignore the image entirely when there is a gguf mediatype layer found ( https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/containerd/daemon_provider.go#L230 and https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/docker/daemon_provider.go#L272-L290 and https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/oci/registry_provider.go#L73-L78 ... etc)

wagoodman · 2025-12-19T17:09:10Z

syft/source/ocimodelsource/oci_model_source.go

+}
+
+// NewFromArtifact creates a new OCI model source from a fetched model artifact.
+func NewFromArtifact(artifact *ModelArtifact, client *RegistryClient, alias source.Alias) (source.Source, error) {


takes as input a type that is really an internal concern for this package. This should really pass in a description of what the user is asking (declarative) and let the constructor do the remote checks / work to figure out if it's valid. I think much of the work that is in the provider should really be in the constructor here, and the constructor should just be New(Config)

wagoodman · 2025-12-19T17:12:30Z

syft/source/ocimodelsource/resolver.go

+	locations []file.Location
+
+	// OCI layer-aware fields
+	client   *RegistryClient // registry client for fetching layers


client should not be here (that's what the temp location is for)

wagoodman · 2025-12-19T17:16:36Z

syft/source/ocimodelsource/resolver.go

there are some implementations for file.Resolver that should really simply return nil, nil (but they currently have other behavior)

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs · 2025-12-19T17:46:26Z

First big change to make here is how the Provider and Source interact. Provider should be thin and call Source, Source should do most of the work, provider should return the source.Source answer.

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs added 14 commits November 12, 2025 23:56

feat: migrate gguf parser to separate PR from oci

6ceef5f

Signed-off-by: Christopher Phillips <[email protected]>

chore: lint-fix

f92b7d2

Signed-off-by: Christopher Phillips <[email protected]>

test: migrate gguf tests over

1ad4a27

Signed-off-by: Christopher Phillips <[email protected]>

chore: schema and test additions

bcd47d1

Signed-off-by: Christopher Phillips <[email protected]>

tests: account for epoch in dedupe test

b702952

Signed-off-by: Christopher Phillips <[email protected]>

test: fix local flake

08c0572

Signed-off-by: Christopher Phillips <[email protected]>

fix: first pass pr fixes

f664f9e

Signed-off-by: Christopher Phillips <[email protected]>

chore: refactor to use gguf-parser-go; 50mb limit

c689dcf

Signed-off-by: Christopher Phillips <[email protected]>

fix: update gguf data to be GGUFFileHeader

64dc451

Signed-off-by: Christopher Phillips <[email protected]>

chore: warn -> debug

38c0e6e

Signed-off-by: Christopher Phillips <[email protected]>

chore: pr feedback

9a2a45f

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip

9b31c04

Signed-off-by: Christopher Phillips <[email protected]>

fix: pr comments

6daea43

Signed-off-by: Christopher Phillips <[email protected]>

chore: regenerate json schema

b18f7bb

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs force-pushed the 4184-gguf-parser branch from 9c5279c to b18f7bb Compare November 13, 2025 05:03

spiffcs added 8 commits November 13, 2025 00:12

chore: ignore local agent files

cdb41b0

Signed-off-by: Christopher Phillips <[email protected]>

chore: pr comments

b80592f

Signed-off-by: Christopher Phillips <[email protected]>

fix: raise model version on package

56761ce

Signed-off-by: Christopher Phillips <[email protected]>

chore: remove test-binary

9609ce2

Signed-off-by: Christopher Phillips <[email protected]>

chore: schema and test additions

2976df5

Signed-off-by: Christopher Phillips <[email protected]>

chore: refactor to use gguf-parser-go; 50mb limit

7ed34c8

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip no lrg file oci client

efcfecb

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip

8031957

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs force-pushed the 4184-pt2-oci-model-support branch from 5853129 to 8031957 Compare November 13, 2025 06:19

spiffcs added 3 commits November 13, 2025 01:44

fix: use OCI title annotation for virtual path in GGUF layer extraction

ec978f0

Signed-off-by: Christopher Phillips <[email protected]>

fix: update after rebase

1a85625

Signed-off-by: Christopher Phillips <[email protected]>

fix: add green fixes before pr fixes

bfe63bb

Signed-off-by: Christopher Phillips <[email protected]>

Base automatically changed from 4184-gguf-parser to main November 13, 2025 22:43

spiffcs added 2 commits December 19, 2025 00:18

chore: remove incorrect bump of schema

f5fd311

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs added 3 commits December 19, 2025 00:20

chore: remove old file from previous PR review

ea50011

Signed-off-by: Christopher Phillips <[email protected]>

chore: add oci model resolver back into correct PR

2da3718

Signed-off-by: Christopher Phillips <[email protected]>

cleanup: cleanup

dbe0716

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs changed the title ~~wip: 4184 pt2 oci model support~~ feat: 4184 ai oci model support Dec 19, 2025

wagoodman reviewed Dec 19, 2025

View reviewed changes

chore: remove stale implementation; root location

39a48d9

Signed-off-by: Christopher Phillips <[email protected]>

chore: small pr change with media type

a4ef861

Signed-off-by: Christopher Phillips <[email protected]>

spiffcs force-pushed the 4184-pt2-oci-model-support branch from 4fc836b to a4ef861 Compare December 19, 2025 18:06

spiffcs added 2 commits December 19, 2025 13:36

pr: remove stale debug statements

04365f3

Signed-off-by: Christopher Phillips <[email protected]>

wip: wip

9e83ddc

Signed-off-by: Christopher Phillips <[email protected]>

feat: 4184 ai oci model support #4335

Are you sure you want to change the base?

feat: 4184 ai oci model support #4335

Conversation

spiffcs commented Nov 5, 2025

Description

Example of metadata extracted

Type of change

Checklist:

Uh oh!

wagoodman Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wagoodman Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wagoodman left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wagoodman Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

spiffcs commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wagoodman Dec 19, 2025 •

edited

Loading

wagoodman Dec 19, 2025 •

edited

Loading

wagoodman Dec 19, 2025 •

edited

Loading