Work Unit 003: libs/refs Module with OCI Backend #125

@jmgilman

Status: Specification
Estimated Effort: 3-4 days
Dependencies: Work Unit 002 (CUE Schema for .sow-ref.yaml Manifest)


Behavioral Goal

As a ref system consumer (CLI commands, packaging, inspection, installation),
I need a libs/refs module providing OCI registry operations with a clean interface,
So that I can interact with OCI registries to list files, pull images, and push refs without understanding the underlying OCI protocol complexity, while having the flexibility to mock these operations in tests.

Success Criteria

  1. A libs/refs Go module exists with Client and Registry interfaces following the ports/adapters pattern
  2. OCI operations are abstracted behind interfaces, enabling unit testing with mocks
  3. The OCI client wrapper successfully authenticates via Docker credential chain (transparent to caller)
  4. URL detection correctly identifies OCI registry URLs (both explicit oci:// prefix and auto-detected known registries)
  5. Security constraints are enforced: max file size 100MB, max total size 1GB, 10k file limit
  6. Retry with exponential backoff is configured for transient failures
  7. Mock generation via go:generate produces usable test doubles
  8. Unit tests pass using mocked OCI client

Existing Code Context

Explanatory Context

The sow codebase has recently undergone significant architectural changes (PRs #119-#123) establishing a consistent ports and adapters pattern in libs/. This pattern separates interface definitions (ports) from implementations (adapters), enabling testability and flexibility.

The existing RefType interface in cli/internal/refs/types.go (lines 23-79) defines how ref types (git, file) are registered and used. The new OCI functionality will be implemented as an OCIType that implements this interface, but the core OCI client operations belong in a new libs/refs module to enable reuse and testing.

The libs/exec module demonstrates the interface pattern we'll follow: Executor interface defines operations, local.go provides the concrete implementation, and mocks/executor.go is auto-generated for testing. Similarly, libs/git shows the factory pattern (NewGitHubClient()) for environment-aware client creation.

The URL parsing logic in cli/internal/refs/url.go currently handles git+ and file:// schemes. OCI URLs need different detection:

  • Explicit: oci://ghcr.io/org/repo:tag (new scheme prefix)
  • Auto-detect: ghcr.io/org/repo:tag, docker.io/library/image, *.azurecr.io/path (known registry patterns)
  • Digest pinning: registry/path@sha256:abc123...
  • Version tags: registry/path:v1.0.0, registry/path:latest

The github.com/jmgilman/go/oci library provides OCI operations with estargz support. This library was selected per ADR-003 for its security features and selective extraction capabilities. The library provides the following key APIs:

// Core client creation
oci.New() (*Client, error)
oci.NewWithOptions(opts ...ClientOption) (*Client, error)

// Registry operations
client.Push(ctx, sourceDir, reference, opts...) error
client.Pull(ctx, reference, targetDir, opts...) error
client.PullWithCache(ctx, reference, targetDir, cacheDir, opts...) error
client.ListFiles(ctx, reference) (*ListFilesResult, error)
client.ListFilesWithFilter(ctx, reference, patterns...) (*ListFilesResult, error)

// Key functional options
oci.WithFilesToExtract(patterns...)  // Selective extraction via glob patterns
oci.WithAnnotations(map[string]string)  // OCI annotations for metadata
oci.WithFilesystem(fsys core.FS)  // Filesystem abstraction for testability

The library already uses github.com/jmgilman/go/fs/core for filesystem abstraction, enabling tests to use in-memory filesystems via github.com/jmgilman/go/fs/billy.

Key Files

File                          Lines   Purpose
libs/exec/executor.go         1-47    Interface definition pattern with //go:generate for mocks
libs/exec/local.go            full    Concrete implementation pattern
libs/exec/mocks/executor.go   full    Generated mock pattern
libs/git/client.go            1-45    Interface with method documentation pattern
libs/git/factory.go           1-25    Factory pattern for environment-aware instantiation
libs/git/errors.go            full    Dedicated error types pattern
cli/internal/refs/types.go    23-79   RefType interface that OCIType will implement
cli/internal/refs/git.go      1-226   Reference implementation of a RefType
cli/internal/refs/url.go      1-199   URL parsing and type inference (needs extension)
cli/internal/refs/registry.go full    Type registry pattern

Existing Documentation Context

ADR-003 (Decision Rationale)

ADR-003 (.sow/knowledge/adrs/003-oci-refs-distribution.md) documents why OCI was chosen over alternatives (git status quo, custom HTTP, npm-style). Key technical decisions:

  • Use github.com/jmgilman/go/oci library for estargz support
  • Docker credential chain for transparent authentication
  • estargz format requirement (NOT standard tar.gz) for selective extraction
  • Standard OCI registries (ghcr.io, Docker Hub, Harbor) without custom server modifications

The implementation notes (lines 143-159) outline the integration approach: OCI client wrapper with estargz support, security constraints (file/total size limits), and registry recommendations.

Design Document (Implementation Details)

The OCI Refs Design Document (.sow/knowledge/designs/oci-refs/oci-refs-design.md) provides detailed specifications:

Component Breakdown (lines 233-270): Defines OCI Client Wrapper responsibilities:

  • Initialize OCI client with Docker credential chain
  • Push estargz images to registries
  • Pull images (full or selective) from registries
  • List files via estargz TOC without full download
  • Query image metadata (annotations) without download

Security Configuration (line 251): Configure github.com/jmgilman/go/oci with:

  • Max file size: 100MB
  • Max total size: 1GB
  • Max file count: 10,000
  • Retry with exponential backoff (library built-in)

URL Detection (lines 36-42 in task description, lines 351-358 in design): Support patterns:

  • Explicit oci:// prefix (recommended)
  • Known registry auto-detection: ghcr.io/, docker.io/, *.azurecr.io/
  • Digest pinning: @sha256:...
  • Version tags: :v1.0.0, :latest

Discovery Analysis (Context)

Section 6 of discovery analysis (.sow/project/discovery/analysis.md) confirms:

  • github.com/jmgilman/go/oci library is NOT currently in the codebase (lines 201-215)
  • It must be added as a dependency in the CLI go.mod
  • Expected API: Push, Pull, ListFiles, ExtractSelective
  • Section 6.3 recommends either cli/internal/refs/ or new libs/oci/ module

Section 10.3 provides module structure decision:

  • Recommended: Start in libs/refs/ following recent patterns
  • Can extract to libs/oci/ later if broader reuse emerges

Detailed Requirements

Module Structure

Create libs/refs/ module with the following structure:

libs/refs/
├── doc.go              # Package documentation
├── client.go           # Client interface definition (port)
├── client_oci.go       # OCI implementation (adapter)
├── client_oci_test.go  # Unit tests with mocked OCI library
├── registry.go         # Registry interface for registry-specific operations
├── url.go              # OCI URL parsing and detection
├── url_test.go         # URL parsing tests
├── options.go          # Functional options for configuration
├── errors.go           # Dedicated error types
└── mocks/
    └── client.go       # Generated mocks

Interface Definitions

Client Interface (client.go):

//go:generate go run github.com/matryer/moq@latest -out mocks/client.go -pkg mocks . Client

// Client defines operations for interacting with OCI refs.
//
// This interface abstracts OCI registry operations, enabling:
//   - Unit testing with mocked implementations
//   - Future alternative backends if needed
//   - Consistent error handling across operations
type Client interface {
    // ListFiles returns file metadata from OCI image TOC without downloading content.
    // Uses estargz table-of-contents for efficient inspection.
    ListFiles(ctx context.Context, ref string) ([]FileEntry, error)

    // Pull downloads an OCI image and extracts to destination directory.
    // For full extraction, pass nil for globs.
    Pull(ctx context.Context, ref string, dest string, globs []string) error

    // Push packages a directory as estargz OCI image and pushes to registry.
    // The manifest is read from .sow-ref.yaml in srcDir.
    Push(ctx context.Context, srcDir string, ref string, opts ...PushOption) error

    // GetManifest retrieves .sow-ref.yaml content without full download.
    // Uses selective extraction to fetch only the manifest file.
    GetManifest(ctx context.Context, ref string) ([]byte, error)

    // GetDigest returns the digest of the image at ref.
    GetDigest(ctx context.Context, ref string) (string, error)
}

FileEntry Type:

// FileEntry represents a file in an OCI image.
type FileEntry struct {
    Path     string    // Relative path within image
    Size     int64     // File size in bytes
    Mode     os.FileMode
    ModTime  time.Time
    IsDir    bool
}
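
To illustrate how consumers might test against the interface, here is a minimal sketch of a hand-written test double. It is not the moq-generated mock; fileLister, fakeLister, totalSize, and errUnavailable are hypothetical names introduced only for this example, and FileEntry is trimmed to two fields.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// FileEntry is trimmed to the two fields this sketch needs.
type FileEntry struct {
	Path string
	Size int64
}

// fileLister is the slice of the Client interface that the consumer below
// depends on; a full Client implementation would satisfy it as well.
type fileLister interface {
	ListFiles(ctx context.Context, ref string) ([]FileEntry, error)
}

// fakeLister is a hand-written test double that returns canned results
// and records every ref it was asked about.
type fakeLister struct {
	entries []FileEntry
	err     error
	refs    []string
}

func (f *fakeLister) ListFiles(_ context.Context, ref string) ([]FileEntry, error) {
	f.refs = append(f.refs, ref)
	return f.entries, f.err
}

// errUnavailable is a hypothetical error used to script a failure.
var errUnavailable = errors.New("registry unavailable")

// totalSize is a hypothetical consumer: it sums the file sizes that the
// table of contents reports, without pulling any content.
func totalSize(ctx context.Context, c fileLister, ref string) (int64, error) {
	entries, err := c.ListFiles(ctx, ref)
	if err != nil {
		return 0, fmt.Errorf("list files for %s: %w", ref, err)
	}
	var total int64
	for _, e := range entries {
		total += e.Size
	}
	return total, nil
}

func main() {
	ctx := context.Background()

	ok := &fakeLister{entries: []FileEntry{{Path: "a.md", Size: 10}, {Path: "b.md", Size: 32}}}
	total, err := totalSize(ctx, ok, "ghcr.io/org/repo:v1.0.0")
	fmt.Println(total, err)

	failing := &fakeLister{err: errUnavailable}
	_, err = totalSize(ctx, failing, "ghcr.io/org/repo:v1.0.0")
	fmt.Println(errors.Is(err, errUnavailable))
}
```

Depending on a narrow interface such as fileLister keeps consumer tests small; the generated mock in mocks/client.go would serve the same role for the full Client.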

Registry Interface (optional, for registry-specific operations):

// Registry defines operations for querying OCI registries.
type Registry interface {
    // ListTags returns available tags for a repository.
    ListTags(ctx context.Context, repo string) ([]string, error)

    // ResolveTag resolves a tag to a digest.
    ResolveTag(ctx context.Context, ref string) (string, error)

    // CheckAuth verifies authentication is valid for registry.
    CheckAuth(ctx context.Context, registry string) error
}

URL Detection and Parsing

Create url.go with OCI-specific URL handling:

// IsOCIRef determines if a URL refers to an OCI registry.
//
// Returns true for:
//   - oci://ghcr.io/org/repo:tag (explicit prefix)
//   - ghcr.io/org/repo:tag (known registry)
//   - docker.io/library/image:latest (Docker Hub)
//   - myregistry.azurecr.io/path:v1 (Azure CR)
//   - registry/path@sha256:abc123... (digest)
func IsOCIRef(rawURL string) bool

// OCIRef holds the structured components of an OCI reference:
// registry, repository, and tag or digest.
type OCIRef struct {
    Registry   string  // e.g., "ghcr.io"
    Repository string  // e.g., "org/repo"
    Tag        string  // e.g., "v1.0.0" (empty if digest)
    Digest     string  // e.g., "sha256:abc123..." (empty if tag)
}

// ParseOCIRef parses an OCI reference string into an OCIRef.
func ParseOCIRef(rawURL string) (*OCIRef, error)

// NormalizeOCIRef normalizes an OCI reference to canonical form.
// Strips oci:// prefix, normalizes docker.io references.
func NormalizeOCIRef(rawURL string) (string, error)
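
One possible shape for the parsing logic is sketched below. It assumes that the first path segment is the registry host, that a colon after the last slash separates the tag, and that a tag is dropped when a digest is present. These rules are assumptions of this sketch, not decisions recorded in the design document, and docker.io normalization is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrNotOCIRef mirrors the sentinel error named in this work unit.
var ErrNotOCIRef = errors.New("not an OCI reference")

// OCIRef mirrors the struct from the specification.
type OCIRef struct {
	Registry   string
	Repository string
	Tag        string
	Digest     string
}

// ParseOCIRef splits a reference into registry, repository, and tag or digest.
func ParseOCIRef(rawURL string) (*OCIRef, error) {
	s := strings.TrimPrefix(rawURL, "oci://")
	ref := &OCIRef{}

	// A digest, when present, follows the "@" separator.
	if name, digest, ok := strings.Cut(s, "@"); ok {
		s, ref.Digest = name, digest
	}

	// Only a colon after the last slash is a tag separator; an earlier
	// colon belongs to a host:port pair such as localhost:5000.
	if i := strings.LastIndex(s, ":"); i > strings.LastIndex(s, "/") {
		tag := s[i+1:]
		s = s[:i]
		if ref.Digest == "" {
			ref.Tag = tag // sketch assumption: a digest takes precedence over a tag
		}
	}

	registry, repo, ok := strings.Cut(s, "/")
	if !ok || registry == "" || repo == "" {
		return nil, fmt.Errorf("%w: %q", ErrNotOCIRef, rawURL)
	}
	ref.Registry, ref.Repository = registry, repo
	return ref, nil
}

func main() {
	for _, raw := range []string{
		"oci://ghcr.io/org/repo:v1.0.0",
		"registry/repo@sha256:abc123",
		"localhost:5000/repo",
		"not-a-ref",
	} {
		ref, err := ParseOCIRef(raw)
		fmt.Printf("%s -> %+v, err=%v\n", raw, ref, err)
	}
}
```

A production parser would likely also validate the digest format and the characters allowed in repository names; this sketch only shows the splitting rules.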

Known Registry Detection:

var knownOCIRegistries = []string{
    "ghcr.io",
    "docker.io",
    "registry.hub.docker.com",
    "index.docker.io",
    "*.azurecr.io",
    "*.gcr.io",
    "*.amazonaws.com", // ECR
    "quay.io",
}
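
The detection rules could be combined with this list roughly as follows. This is a sketch under assumptions: wildcard entries are matched with the standard library's path.Match, any "@sha256:" digest counts as OCI regardless of host, and any other explicit scheme is rejected. The helper isKnownRegistry is a hypothetical name.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

var knownOCIRegistries = []string{
	"ghcr.io",
	"docker.io",
	"registry.hub.docker.com",
	"index.docker.io",
	"*.azurecr.io",
	"*.gcr.io",
	"*.amazonaws.com", // ECR
	"quay.io",
}

// isKnownRegistry reports whether host matches an entry in the list.
// path.Match treats "*" as any run of non-slash characters, so one
// wildcard can span several DNS labels.
func isKnownRegistry(host string) bool {
	host = strings.ToLower(host)
	if h, _, ok := strings.Cut(host, ":"); ok {
		host = h // ignore an explicit port
	}
	for _, pattern := range knownOCIRegistries {
		if ok, err := path.Match(pattern, host); err == nil && ok {
			return true
		}
	}
	return false
}

// IsOCIRef applies the detection rules in order: explicit prefix,
// other schemes, digest pinning, then known registry hosts.
func IsOCIRef(rawURL string) bool {
	if strings.HasPrefix(rawURL, "oci://") {
		return true
	}
	if strings.Contains(rawURL, "://") {
		return false // another explicit scheme, e.g. git+https:// or file://
	}
	host, rest, ok := strings.Cut(rawURL, "/")
	if !ok || host == "" || rest == "" {
		return false
	}
	if strings.Contains(rest, "@sha256:") {
		return true
	}
	return isKnownRegistry(host)
}

func main() {
	for _, raw := range []string{
		"oci://ghcr.io/org/repo:tag",
		"myregistry.azurecr.io/path:v1",
		"registry/path@sha256:abc123",
		"git+https://github.com/org/repo",
		"github.com/org/repo",
	} {
		fmt.Printf("%-35s %v\n", raw, IsOCIRef(raw))
	}
}
```

Note that under this matching a bare gcr.io host does not match the "*.gcr.io" entry, because the pattern requires a leading label; whether that host should be listed separately is left open here.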

OCI Client Implementation

Factory Function (client_oci.go):

// NewClient creates an OCI client with Docker credential chain.
//
// The client is configured with security limits:
//   - Max file size: 100MB
//   - Max total size: 1GB
//   - Max file count: 10,000
//   - Retry with exponential backoff
func NewClient(opts ...ClientOption) (Client, error)

Security Configuration:

const (
    DefaultMaxFileSize  = 100 * 1024 * 1024     // 100MB
    DefaultMaxTotalSize = 1024 * 1024 * 1024    // 1GB
    DefaultMaxFileCount = 10000
)

Functional Options

// ClientOption configures the OCI client.
type ClientOption func(*clientOptions)

// WithMaxFileSize sets the maximum size for a single file.
func WithMaxFileSize(size int64) ClientOption

// WithMaxTotalSize sets the maximum total extraction size.
func WithMaxTotalSize(size int64) ClientOption

// WithMaxFileCount sets the maximum number of files.
func WithMaxFileCount(count int) ClientOption

// WithInsecure allows insecure (HTTP) registry connections.
func WithInsecure(insecure bool) ClientOption

// PushOption configures a push operation.
type PushOption func(*pushOptions)

// WithExclusions sets glob patterns to exclude from packaging.
func WithExclusions(patterns []string) PushOption

// WithAnnotations sets additional OCI annotations.
func WithAnnotations(annotations map[string]string) PushOption
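
As a sketch of how the client options might be wired to the security defaults, the example below applies each option on top of a defaults struct. The field names in clientOptions and the helper resolveOptions are assumptions made for illustration; the specification only fixes the option signatures and default values.

```go
package main

import "fmt"

const (
	DefaultMaxFileSize  = 100 * 1024 * 1024  // 100MB
	DefaultMaxTotalSize = 1024 * 1024 * 1024 // 1GB
	DefaultMaxFileCount = 10000
)

// clientOptions holds resolved configuration (field names are assumed).
type clientOptions struct {
	maxFileSize  int64
	maxTotalSize int64
	maxFileCount int
	insecure     bool
}

// ClientOption configures the OCI client.
type ClientOption func(*clientOptions)

// WithMaxFileSize sets the maximum size for a single file.
func WithMaxFileSize(size int64) ClientOption {
	return func(o *clientOptions) { o.maxFileSize = size }
}

// WithMaxTotalSize sets the maximum total extraction size.
func WithMaxTotalSize(size int64) ClientOption {
	return func(o *clientOptions) { o.maxTotalSize = size }
}

// WithMaxFileCount sets the maximum number of files.
func WithMaxFileCount(count int) ClientOption {
	return func(o *clientOptions) { o.maxFileCount = count }
}

// WithInsecure allows insecure (HTTP) registry connections.
func WithInsecure(insecure bool) ClientOption {
	return func(o *clientOptions) { o.insecure = insecure }
}

// resolveOptions is a hypothetical helper: it starts from the security
// defaults and applies each option in order, so later options win.
func resolveOptions(opts ...ClientOption) clientOptions {
	o := clientOptions{
		maxFileSize:  DefaultMaxFileSize,
		maxTotalSize: DefaultMaxTotalSize,
		maxFileCount: DefaultMaxFileCount,
	}
	for _, opt := range opts {
		opt(&o)
	}
	return o
}

func main() {
	fmt.Printf("%+v\n", resolveOptions())
	fmt.Printf("%+v\n", resolveOptions(WithMaxFileSize(10<<20), WithInsecure(true)))
}
```

Starting from defaults means a caller that passes no options still gets the security limits, which matches the intent that NewClient is safe by default.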

Error Types

// ErrNotOCIRef indicates the URL is not an OCI reference.
var ErrNotOCIRef = errors.New("not an OCI reference")

// ErrManifestNotFound indicates .sow-ref.yaml is missing.
var ErrManifestNotFound = errors.New("manifest .sow-ref.yaml not found")

// ErrFileTooLarge indicates a file exceeds size limit.
type ErrFileTooLarge struct {
    Path     string
    Size     int64
    MaxSize  int64
}

// ErrTotalSizeExceeded indicates extraction exceeds total size limit.
type ErrTotalSizeExceeded struct {
    TotalSize int64
    MaxSize   int64
}

// ErrAuthFailed indicates registry authentication failed.
type ErrAuthFailed struct {
    Registry string
    Err      error
}

// ErrRegistryNotFound indicates the registry is unreachable.
type ErrRegistryNotFound struct {
    Registry string
    Err      error
}
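
The struct types above only become usable as errors once they implement the error interface. The sketch below shows one way to do that for two of them, with Unwrap on the wrapping type so that errors.Is and errors.As can see the cause. The message wording, the checkSize helper, and errDenied are hypothetical and not part of the specification.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFileTooLarge indicates a file exceeds the size limit.
type ErrFileTooLarge struct {
	Path    string
	Size    int64
	MaxSize int64
}

func (e *ErrFileTooLarge) Error() string {
	return fmt.Sprintf("file %q is %d bytes, exceeding the %d-byte limit", e.Path, e.Size, e.MaxSize)
}

// ErrAuthFailed indicates registry authentication failed.
type ErrAuthFailed struct {
	Registry string
	Err      error
}

func (e *ErrAuthFailed) Error() string {
	return fmt.Sprintf("authentication to %s failed: %v", e.Registry, e.Err)
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *ErrAuthFailed) Unwrap() error { return e.Err }

// errDenied is a hypothetical underlying cause used for the demonstration.
var errDenied = errors.New("denied")

// checkSize is a hypothetical helper: it returns a wrapped *ErrFileTooLarge
// when size exceeds maxSize, and nil otherwise.
func checkSize(path string, size, maxSize int64) error {
	if size > maxSize {
		return fmt.Errorf("extract: %w", &ErrFileTooLarge{Path: path, Size: size, MaxSize: maxSize})
	}
	return nil
}

func main() {
	err := checkSize("big.bin", 200, 100)

	// errors.As recovers the typed error even through the wrapping layer.
	var tooLarge *ErrFileTooLarge
	if errors.As(err, &tooLarge) {
		fmt.Println("too large:", tooLarge.Path, tooLarge.Size, tooLarge.MaxSize)
	}

	authErr := &ErrAuthFailed{Registry: "ghcr.io", Err: errDenied}
	fmt.Println(authErr)
	fmt.Println(errors.Is(authErr, errDenied))
}
```

Pointer receivers are used here, so callers would match with a pointer target in errors.As, as shown in main.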

go.mod Update

Add to cli/go.mod:

github.com/jmgilman/go/oci v0.x.x  // Use latest stable version

Testing Requirements

Unit Tests

  1. URL Detection Tests (url_test.go):

    • oci://ghcr.io/org/repo:tag → OCI type detected
    • ghcr.io/org/repo:tag → OCI type detected (known registry)
    • docker.io/library/nginx:latest → OCI type detected
    • myregistry.azurecr.io/path:v1 → OCI type detected (wildcard match)
    • registry/path@sha256:abc... → OCI type detected (digest)
    • git+https://github.com/org/repo → NOT OCI type
    • file:///path/to/dir → NOT OCI type
    • github.com/org/repo (no tag) → NOT OCI type (ambiguous)
  2. URL Parsing Tests (url_test.go):

    • Parse ghcr.io/org/repo:v1.0.0 → registry="ghcr.io", repo="org/repo", tag="v1.0.0"
    • Parse docker.io/library/nginx:latest → normalize to canonical form
    • Parse registry/repo@sha256:abc123... → extract digest correctly
    • Parse oci://ghcr.io/org/repo:tag → strip oci:// prefix
  3. Client Mock Tests (client_oci_test.go):

    • ListFiles returns expected entries
    • Pull extracts to correct destination
    • Pull with globs extracts only matching files
    • Push packages directory correctly
    • GetManifest retrieves only manifest file
    • GetDigest returns correct digest format
  4. Error Handling Tests:

    • ErrFileTooLarge triggered at 100MB limit
    • ErrTotalSizeExceeded triggered at 1GB limit
    • ErrManifestNotFound when .sow-ref.yaml missing
    • ErrAuthFailed with clear message

Integration Tests (Future, in consuming work units)

The actual OCI registry integration will be tested in Work Units 004-006 using a test registry (Docker local registry or mock registry).


Implementation Notes

Dependency on Work Unit 002

This work unit depends on Work Unit 002 (CUE Schema) for:

  • libs/schemas/ref_manifest.cue schema definition
  • Generated Go types (RefManifest, etc.)
  • Validation function for .sow-ref.yaml

The Push operation needs schema validation before packaging. However, the interface definition and URL parsing can proceed independently.

Integration with RefType System

After libs/refs is complete, Work Unit 007 (CLI Integration) will:

  1. Implement OCIType in cli/internal/refs/oci.go
  2. Register with refs.Register(&OCIType{})
  3. Delegate OCI operations from OCIType to libs/refs.Client

This separation enables:

  • Clean testing of OCI operations independent of CLI
  • Potential reuse in other tools (marketplace, etc.)
  • Consistent architecture with other libs/ modules

Docker Credential Chain

The github.com/jmgilman/go/oci library handles the Docker credential chain automatically. Users who have logged in via docker login, or who have credentials in ~/.docker/config.json, authenticate transparently. No explicit credential handling is needed in our code.

estargz Format

The OCI library produces estargz-format images automatically. The format is NOT optional (standard tar.gz won't work) because it is critical for:

  • The ListFiles operation (reads the TOC without downloading content)
  • Selective extraction via glob patterns

Out of Scope

  • Packaging logic: Handled in Work Unit 004 (uses this module's Push)
  • Inspection commands: Handled in Work Unit 005 (uses this module's ListFiles, GetManifest)
  • Installation logic: Handled in Work Unit 006 (uses this module's Pull)
  • CLI commands: Handled in Work Unit 007 (wires everything together)
  • RefType implementation: Work Unit 007 implements OCIType using this module
  • Index schema updates: Work Unit 007 updates index to include OCI-specific fields

Implementation Standards

All code produced in this work unit MUST adhere to the following standards:

Code Quality Standards

  • STYLE.md Compliance: All Go code must follow the conventions documented in .standards/STYLE.md
  • TESTING.md Compliance: All tests must follow the patterns documented in .standards/TESTING.md
  • golangci-lint: Code must pass golangci-lint run with zero errors before completion

Required Dependencies

  • OCI Operations: Use github.com/jmgilman/go/oci for all OCI registry operations
    • Client creation: oci.New() or oci.NewWithOptions()
    • Push: client.Push(ctx, sourceDir, reference, opts...)
    • Pull: client.Pull(ctx, reference, targetDir, opts...)
    • List files: client.ListFiles(ctx, reference) (TOC-only, bandwidth efficient)
    • Selective extraction: oci.WithFilesToExtract(patterns...)
  • Filesystem Abstractions: Use github.com/jmgilman/go/fs/core and github.com/jmgilman/go/fs/billy for all file system operations
    • Pass oci.WithFilesystem(fsys) to enable testability
    • Use billy.NewMemoryFS() in unit tests
    • Use billy.NewLocalFS() for production

Verification Checklist

Before marking this work unit complete, verify:

  • golangci-lint run ./libs/refs/... passes with zero errors
  • All code follows STYLE.md conventions (functional options, error wrapping, etc.)
  • All tests follow TESTING.md patterns (table-driven tests, test helpers, etc.)
  • Unit tests use memory filesystem via billy.NewMemoryFS() where applicable

Acceptance Criteria

  • libs/refs/ Go module exists with go.mod
  • Client interface is defined with ListFiles, Pull, Push, GetManifest, GetDigest
  • //go:generate directive produces mocks/client.go
  • IsOCIRef() correctly identifies OCI URLs (explicit and auto-detect)
  • ParseOCIRef() extracts registry, repository, tag/digest components
  • NewClient() factory creates client with Docker credential chain
  • Security limits are configurable: max file size, max total size, max file count
  • Functional options pattern used for configuration
  • Dedicated error types provide actionable error messages
  • Unit tests pass for URL detection (OCI vs non-OCI)
  • Unit tests pass for URL parsing (various formats)
  • Unit tests pass with mocked OCI client
  • github.com/jmgilman/go/oci added to cli/go.mod
