-
Notifications
You must be signed in to change notification settings - Fork 748
feat: 4184 ai oci model support #4335
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
9c5279c to
b18f7bb
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
5853129 to
8031957
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
* main: (76 commits) feat: snap can be queried by revision and ```track/risk/branch``` (#4439) fix: 4423 dotnet-deps cataloger skips project type by def signpost to docs site (#4483) chore(deps): bump github/codeql-action from 4.31.8 to 4.31.9 (#4481) chore(deps): bump github.com/goccy/go-yaml from 1.19.0 to 1.19.1 (#4482) Detect embedded deps.json in .NET binaries (#4375) chore(deps): bump actions/cache from 5.0.0 to 5.0.1 (#4476) chore(deps): bump actions/cache in /.github/actions/bootstrap (#4477) chore(deps): update tools to latest versions (#4473) unapply base path for resolver inbound requests (#4478) fix: golang PURL should include full module (#4395) fix:best effort to get the os info of an ELF binary (#4438) Improve PR template (#4472) feat: add support for Gemfile.next.lock (#4457) chore:cancel in-progress workflows for new commits on same PR (#4465) chore(deps): update tools to latest versions (#4466) chore(deps): bump github/codeql-action from 4.31.7 to 4.31.8 (#4468) chore(deps): bump actions/cache from 4.3.0 to 5.0.0 (#4469) chore(deps): bump github.com/anchore/stereoscope from 0.1.14 to 0.1.16 (#4470) chore(deps): bump actions/cache in /.github/actions/bootstrap (#4471) ... Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
| type OCIModelMetadata struct { | ||
| // Core OCI artifact metadata (mirrors ImageMetadata) | ||
| UserInput string `json:"userInput"` | ||
| ID string `json:"artifactID"` | ||
| ManifestDigest string `json:"manifestDigest"` | ||
| MediaType string `json:"mediaType"` | ||
| Tags []string `json:"tags"` | ||
| Size int64 `json:"artifactSize"` | ||
| Layers []source.LayerMetadata `json:"layers"` | ||
| RawManifest []byte `json:"manifest"` | ||
| RawConfig []byte `json:"config"` | ||
| RepoDigests []string `json:"repoDigests"` | ||
| Architecture string `json:"architecture"` | ||
| Variant string `json:"architectureVariant,omitempty"` | ||
| OS string `json:"os"` | ||
| Labels map[string]string `json:"labels,omitempty"` | ||
|
|
||
| // OCI-specific metadata | ||
| Annotations map[string]string `json:"annotations,omitempty"` | ||
|
|
||
| // Model-specific metadata | ||
| ModelFormat string `json:"modelFormat,omitempty"` // e.g., "gguf" | ||
| GGUFLayers []GGUFLayerInfo `json:"ggufLayers,omitempty"` | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| type OCIModelMetadata struct { | |
| // Core OCI artifact metadata (mirrors ImageMetadata) | |
| UserInput string `json:"userInput"` | |
| ID string `json:"artifactID"` | |
| ManifestDigest string `json:"manifestDigest"` | |
| MediaType string `json:"mediaType"` | |
| Tags []string `json:"tags"` | |
| Size int64 `json:"artifactSize"` | |
| Layers []source.LayerMetadata `json:"layers"` | |
| RawManifest []byte `json:"manifest"` | |
| RawConfig []byte `json:"config"` | |
| RepoDigests []string `json:"repoDigests"` | |
| Architecture string `json:"architecture"` | |
| Variant string `json:"architectureVariant,omitempty"` | |
| OS string `json:"os"` | |
| Labels map[string]string `json:"labels,omitempty"` | |
| // OCI-specific metadata | |
| Annotations map[string]string `json:"annotations,omitempty"` | |
| // Model-specific metadata | |
| ModelFormat string `json:"modelFormat,omitempty"` // e.g., "gguf" | |
| GGUFLayers []GGUFLayerInfo `json:"ggufLayers,omitempty"` | |
| } | |
| type OCIModelMetadata{ | |
| ImageMetadata | |
| Annotations map[string]string | |
| } |
or
| type OCIModelMetadata struct { | |
| // Core OCI artifact metadata (mirrors ImageMetadata) | |
| UserInput string `json:"userInput"` | |
| ID string `json:"artifactID"` | |
| ManifestDigest string `json:"manifestDigest"` | |
| MediaType string `json:"mediaType"` | |
| Tags []string `json:"tags"` | |
| Size int64 `json:"artifactSize"` | |
| Layers []source.LayerMetadata `json:"layers"` | |
| RawManifest []byte `json:"manifest"` | |
| RawConfig []byte `json:"config"` | |
| RepoDigests []string `json:"repoDigests"` | |
| Architecture string `json:"architecture"` | |
| Variant string `json:"architectureVariant,omitempty"` | |
| OS string `json:"os"` | |
| Labels map[string]string `json:"labels,omitempty"` | |
| // OCI-specific metadata | |
| Annotations map[string]string `json:"annotations,omitempty"` | |
| // Model-specific metadata | |
| ModelFormat string `json:"modelFormat,omitempty"` // e.g., "gguf" | |
| GGUFLayers []GGUFLayerInfo `json:"ggufLayers,omitempty"` | |
| } | |
| type OCIModelMetadata ImageMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying this must be done, but given that the gguf/model format properties are redundant, then it is essentially the same. The annotations are new, however, still relevant to OCI images.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably have id:- to ensure this doesn't affect the ID of existing sources. Probably worth a test to ensure this isn't changing
wagoodman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should take a pass to unexport whatever doesn't need to be in the public API
syft/pkg/cataloger/ai/cataloger.go
Outdated
| // catalogFromOCILayers discovers GGUF models by querying OCI layers by media type. | ||
| func (c *ggufCataloger) catalogFromOCILayers(ctx context.Context, resolver ocimodelsource.OCIResolver) ([]pkg.Package, []artifact.Relationship, error) { | ||
| // Find all GGUF layers by media type | ||
| digests, err := resolver.LayerDigestsByMediaType(ocimodelsource.GGUFLayerMediaType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we shouldn't import from a package meant for a specific implementation of a source
|
|
||
| // Handle TLS settings | ||
| if registryOpts.InsecureSkipTLSVerify { | ||
| transport := remote.DefaultTransport.(*http.Transport).Clone() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
beware of possible panic here
|
|
||
| // Parse manifest | ||
| manifest := &v1.Manifest{} | ||
| if err := json.Unmarshal(desc.Manifest, manifest); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be a v1.Index maybe? worth checking out https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/puller.go#L93 and if its connected or not to https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/index.go and https://github.com/google/go-containerregistry/blob/main/pkg/v1/remote/descriptor.go#L44-L45. We should probably check if this is a manifest or index and return a helpful error if its an index.
| defer reader.Close() | ||
|
|
||
| // Read up to maxBytes | ||
| data := make([]byte, maxBytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment that we're capped at 10 MB, not reading a full GGUF file into memory
| isModel, err := client.IsModelArtifactReference(ctx, p.reference) | ||
| if err != nil { | ||
| // Log the error but don't fail - let other providers try | ||
| log.WithFields("reference", p.reference, "error", err).Debug("failed to check if reference is a model artifact") | ||
| return nil, fmt.Errorf("not an OCI model artifact: %w", err) | ||
| } | ||
|
|
||
| if !isModel { | ||
| log.WithFields("reference", p.reference).Debug("reference is not a model artifact") | ||
| return nil, fmt.Errorf("not an OCI model artifact") | ||
| } | ||
|
|
||
| log.WithFields("reference", p.reference).Info("detected OCI model artifact, fetching headers") | ||
|
|
||
| // Fetch the full model artifact with metadata | ||
| artifact, err := client.FetchModelArtifact(ctx, p.reference) | ||
| if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client.IsModelArtifactReference shuold probably return the remote.Descriptor so we don't call remote.Get again when you are using client.FetchModelArtifact.
| // 3. try remote sources after everything else... | ||
|
|
||
| // --from oci-model (model artifacts with header-only fetching) | ||
| Join(tagProvider(ocimodelsource.NewSourceProvider(userInput, cfg.RegistryOptions, cfg.Alias), OCIModelTag)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tech debt for later: we're starting to do some work across independent providers that is wasted work and could consider some caching or another solution to share work across providers (e.g. this calls remote.Get as well as the registry provider... and this is called before the registry/docker cases (our most common case).
We should look into options where we can put this line below the pull tag approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say for instance, we start updating the providers down in stereoscope to ignore the image entirely when there is a gguf mediatype layer found ( https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/containerd/daemon_provider.go#L230 and https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/docker/daemon_provider.go#L272-L290 and https://github.com/anchore/stereoscope/blob/8c8ea254b2613dae35f475343354c15ca04e16b0/pkg/image/oci/registry_provider.go#L73-L78 ... etc)
| } | ||
|
|
||
| // NewFromArtifact creates a new OCI model source from a fetched model artifact. | ||
| func NewFromArtifact(artifact *ModelArtifact, client *RegistryClient, alias source.Alias) (source.Source, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
takes as input a type that is really an internal concern for this package. This should really pass in a description of what the user is asking (declarative) and let the constructor do the remote checks / work to figure out if it's valid. I think much of the work that is in the provider should really be in the constructor here, and the constructor should just be New(Config)
| locations []file.Location | ||
|
|
||
| // OCI layer-aware fields | ||
| client *RegistryClient // registry client for fetching layers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
client should not be here (that's what the temp location is for)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some implementations for file.Resolver that should really simply return nil, nil (but they currently have other behavior)
Signed-off-by: Christopher Phillips <[email protected]>
|
First big change to make here is how the Provider and Source interact. Provider should be thin and call Source, Source should do most of the work, provider should return the source.Source answer. |
Signed-off-by: Christopher Phillips <[email protected]>
4fc836b to
a4ef861
Compare
Signed-off-by: Christopher Phillips <[email protected]>
Signed-off-by: Christopher Phillips <[email protected]>
Description
This PR follows up on #4279 by adding support for a new docker source
ocimodelsource(naming pending 😄)With this change users can do the following:
syft -o json docker.io/ai/qwen3-vl | jq .:They'll get an SBOM with a single package showing the gguf model and details for the model pulled from
https://hub.docker.com/u/aiExample of metadata extracted
A larger google doc is being put together to go over the choices made in this PR and changes we need to make so that pt1/pt2 are working together as intended
Type of change
Checklist: