Work Unit 006: Ref Installation in libs/refs #128

@jmgilman

Status: Specification
Estimated Effort: 3-4 days
Dependencies: Work Unit 002 (CUE Schema), Work Unit 003 (libs/refs OCI Backend)


Behavioral Goal

As a ref consumer,
I need a reliable installation mechanism that downloads OCI refs (either fully or selectively) and manages a local cache,
So that I can install refs quickly, avoid redundant downloads through digest-based caching, and selectively extract only the files I need to save bandwidth and storage while always having access to the ref manifest.

Success Criteria

  1. Full OCI ref installation downloads entire image, extracts to cache, and returns path for symlinking
  2. Selective installation with glob patterns downloads only matching files plus .sow-ref.yaml
  3. Multiple --path glob patterns work with OR logic (files matching any pattern are extracted)
  4. Digest-based caching prevents redundant downloads (same digest = skip download, return existing path)
  5. Cache paths follow ~/.cache/sow/refs/oci/{id}-{short-digest}/ structure (short digest = 7 chars)
  6. Installation is atomic: extract to temp directory first, move to final location only on success
  7. XDG_CACHE_HOME is respected on Linux systems
  8. Performance: full install of 10MB ref < 15 seconds, selective install (10% of files) < 3 seconds
  9. Security constraints enforced: path traversal protection, size limits, permission sanitization
  10. Integration tests pass for both full and selective installation scenarios

Existing Code Context

Explanatory Context

The current refs system in cli/internal/refs/ follows a type-based architecture where each ref type (git, file) implements the RefType interface. The CacheManager in manager.go orchestrates caching by delegating to the appropriate RefType implementation, then creates workspace symlinks from .sow/refs/{link} to the cache directory.

The existing GitType implementation (git.go:77-105) demonstrates the caching pattern: it uses an external library (github.com/jmgilman/go/git/cache) for the actual cache operations while implementing the RefType interface. The new OCI installation will follow this pattern: libs/refs provides the core installation logic (this work unit), and a future OCIType adapter (Work Unit 007) will wrap it to implement RefType.

The cache structure for OCI refs differs from git refs. Git uses {cacheDir}/git/checkouts/{id}/ while OCI will use {cacheDir}/oci/{id}-{short-digest}/. The short digest (the first 7 hex characters of the SHA256 digest) enables:

  • Human-readable cache paths
  • Multiple versions coexisting (different digests)
  • Quick visual identification during debugging
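
As a rough illustration of how two versions of one ref could coexist under this naming scheme, the sketch below derives cache entry names from full digests. The helper names shortDigest and cacheEntry and the example digests are hypothetical and are not part of the existing code.

```go
package main

import (
    "fmt"
    "path/filepath"
    "strings"
)

// shortDigest returns at most the first 7 characters of a digest,
// ignoring an optional "sha256:" prefix. (Hypothetical helper name.)
func shortDigest(digest string) string {
    hex := strings.TrimPrefix(digest, "sha256:")
    if len(hex) > 7 {
        hex = hex[:7]
    }
    return hex
}

// cacheEntry builds {cacheDir}/oci/{id}-{short-digest}.
func cacheEntry(cacheDir, id, digest string) string {
    return filepath.Join(cacheDir, "oci", id+"-"+shortDigest(digest))
}

func main() {
    base := filepath.Join("home", "me", ".cache", "sow", "refs")
    // Two digests of the same ref ID map to two distinct directories.
    fmt.Println(cacheEntry(base, "go-standards", "sha256:abc1234deadbeef"))
    fmt.Println(cacheEntry(base, "go-standards", "sha256:def5678cafef00d"))
}
```

Because only the digest suffix differs, both versions can stay on disk and a workspace symlink can be repointed from one to the other.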

Work Unit 003 establishes the libs/refs module with the Client interface providing Pull() for extraction and GetDigest() for digest resolution. This work unit adds the Installer interface that orchestrates these operations with caching logic.

Key Files

| File | Lines | Purpose |
|---|---|---|
| cli/internal/refs/types.go | 23-79 | RefType interface (install target) |
| cli/internal/refs/manager.go | 51-91 | CacheManager.Install() flow pattern |
| cli/internal/refs/git.go | 77-105 | GitType.Cache() implementation pattern |
| cli/internal/refs/git.go | 59-75 | Lazy cache initialization pattern |
| libs/schemas/refs_cache.cue | 1-82 | Cache index schema (for future expansion) |
| libs/exec/executor.go | full | Interface pattern to follow |
| libs/git/client.go | full | Interface design pattern |

Existing Documentation Context

Design Document (Primary Reference)

The OCI Refs Design Document (.sow/knowledge/designs/oci-refs/oci-refs-design.md, lines 308-344) specifies the installation and cache management:

Ref Installer (lines 308-326):

  • Pull OCI image (full or selective based on --path flags)
  • Extract to cache directory: ~/.cache/sow/refs/{id}-{short-digest}/
  • Create symlink: .sow/refs/{link} -> cache directory
  • Trigger search indexing (deferred to CLI layer)

Key behaviors documented:

  • Atomic extraction: use temp directory, move to final location on success
  • Short digest: first 7 characters of SHA256 for human-readable paths
  • Selective extraction: pass multiple glob patterns, files matching any pattern extracted
  • Always extract .sow-ref.yaml: even with selective extraction (needed for metadata)

Cache Manager (lines 328-344):

  • Allocate cache directories with naming: {id}-{short-digest}
  • Check if ref already cached (digest-based)
  • Prune unused cache entries (deferred to CLI layer)
  • Calculate cache statistics (deferred to CLI layer)

Performance requirements (documented in arc42 section 4):

  • Full install of 10MB ref: < 15 seconds
  • Selective install (10% of files): < 3 seconds
  • Must use estargz for selective extraction efficiency

Arc42 Concepts (Cache Structure)

Section 4 of .sow/knowledge/designs/oci-refs/arc42-08-concepts-oci-refs.md documents digest-based caching:

~/.cache/sow/refs/
├── go-standards-abc1234/  # <id>-<short-digest>
└── api-patterns-def5678/

.sow/refs/
├── team-go -> ~/.cache/sow/refs/go-standards-abc1234
└── apis -> ~/.cache/sow/refs/api-patterns-def5678

Benefits: deduplication (same digest stored once), fast rollback, integrity verification.

Discovery Analysis (Implementation Context)

Section 3 of discovery analysis (.sow/project/discovery/analysis.md) confirms the cache structure is compatible with existing patterns. Each RefType manages its own subdirectory under ~/.cache/sow/refs/.

Section 8.3 highlights that CacheManager.Install() expects:

  1. Type inference from URL
  2. refType.Cache() returns cache path
  3. Symlink creation to workspace

For OCI, this flow works but needs to handle:

  • Selective extraction (multiple globs)
  • Always extract .sow-ref.yaml even with selective
  • Atomic extraction (temp dir -> move)

Detailed Requirements

Module Extension

Extend libs/refs/ module (created in Work Unit 003) with installation functionality:

libs/refs/
├── ... (existing from Work Unit 003)
├── installer.go         # Installer interface definition
├── installer_impl.go    # Implementation
├── installer_test.go    # Integration tests
├── cache.go             # Cache management utilities
└── mocks/
    └── installer.go     # Generated mock

Interface Definitions

Installer Interface (installer.go):

//go:generate go run github.com/matryer/moq@latest -out mocks/installer.go -pkg mocks . Installer

// Installer defines operations for installing OCI refs to local cache.
//
// This interface handles:
//   - Full and selective extraction from OCI registries
//   - Digest-based cache deduplication
//   - Atomic extraction via temp directories
//   - Always extracting .sow-ref.yaml for metadata access
type Installer interface {
    // Install downloads and extracts a full OCI ref to cache.
    //
    // Flow:
    //   1. Resolve ref to digest
    //   2. Check if already cached by digest
    //   3. If cached, return existing path
    //   4. If not cached, pull full image to temp dir
    //   5. Move temp dir to final cache path
    //   6. Parse and return manifest
    //
    // Returns InstallResult with cache path and metadata.
    Install(ctx context.Context, ref string, opts ...InstallOption) (*InstallResult, error)

    // InstallSelective downloads only files matching glob patterns.
    //
    // Flow:
    //   1. Resolve ref to digest
    //   2. Check if already cached (by digest AND globs hash)
    //   3. Always include ".sow-ref.yaml" pattern
    //   4. Extract only matching files to temp dir
    //   5. Move to final cache path
    //   6. Return list of extracted files
    //
    // Globs use OR logic: files matching ANY pattern are extracted.
    InstallSelective(ctx context.Context, ref string, globs []string, opts ...InstallOption) (*InstallResult, error)

    // IsCached checks if a ref is already in cache (by digest).
    // Returns cache path if found, empty string if not cached.
    IsCached(ctx context.Context, ref string) (string, error)

    // GetCacheInfo returns cache statistics.
    GetCacheInfo(ctx context.Context) (*CacheInfo, error)
}
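
The Install flow in the comments above (resolve, check cache, pull to temp, move) can be sketched end to end against a fake client. This is a minimal sketch under stated assumptions, not the planned implementation: the GetDigest signature, fakeClient, install, and demo are all hypothetical, and manifest parsing and options are left out.

```go
package main

import (
    "context"
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// Client mirrors the two operations the installer relies on.
// The GetDigest signature is an assumption for this sketch.
type Client interface {
    GetDigest(ctx context.Context, ref string) (string, error)
    Pull(ctx context.Context, ref, dest string, globs []string) error
}

// fakeClient writes a tiny manifest and counts how often Pull runs.
type fakeClient struct{ pulls int }

func (f *fakeClient) GetDigest(context.Context, string) (string, error) {
    return "sha256:abc1234deadbeef", nil
}

func (f *fakeClient) Pull(_ context.Context, _, dest string, _ []string) error {
    f.pulls++
    return os.WriteFile(filepath.Join(dest, ".sow-ref.yaml"), []byte("id: demo\n"), 0o644)
}

// install returns the cache path and whether it was served from cache.
func install(ctx context.Context, c Client, cacheDir, id, ref string) (string, bool, error) {
    digest, err := c.GetDigest(ctx, ref) // step 1: resolve ref to digest
    if err != nil {
        return "", false, fmt.Errorf("failed to resolve digest: %w", err)
    }
    short := strings.TrimPrefix(digest, "sha256:")
    if len(short) > 7 {
        short = short[:7]
    }
    dest := filepath.Join(cacheDir, id+"-"+short)

    // Steps 2-3: an existing directory for this digest is a cache hit.
    if info, err := os.Stat(dest); err == nil && info.IsDir() {
        return dest, true, nil
    }

    // Step 4: pull into a temp dir next to the final location.
    if err := os.MkdirAll(cacheDir, 0o755); err != nil {
        return "", false, err
    }
    tmp, err := os.MkdirTemp(cacheDir, ".sow-extract-*")
    if err != nil {
        return "", false, err
    }
    if err := c.Pull(ctx, ref, tmp, nil); err != nil {
        _ = os.RemoveAll(tmp)
        return "", false, fmt.Errorf("failed to pull ref: %w", err)
    }

    // Step 5: move the finished extraction into place.
    if err := os.Rename(tmp, dest); err != nil {
        _ = os.RemoveAll(tmp)
        return "", false, err
    }
    return dest, false, nil
}

// demo installs the same ref twice and reports the number of pulls
// and whether the second call was a cache hit.
func demo() (int, bool, error) {
    cacheDir, err := os.MkdirTemp("", "sow-cache-*")
    if err != nil {
        return 0, false, err
    }
    defer os.RemoveAll(cacheDir)

    c := &fakeClient{}
    ctx := context.Background()
    if _, _, err := install(ctx, c, cacheDir, "demo", "registry.example/demo:v1"); err != nil {
        return 0, false, err
    }
    _, fromCache, err := install(ctx, c, cacheDir, "demo", "registry.example/demo:v1")
    return c.pulls, fromCache, err
}

func main() {
    fmt.Println(demo())
}
```

The second call never reaches Pull, which is the behavior the "same digest = skip download" success criterion asks for.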

InstallResult Type:

// InstallResult contains the result of an installation operation.
type InstallResult struct {
    // CachePath is the absolute path to extracted content.
    // Format: ~/.cache/sow/refs/oci/{id}-{short-digest}/
    CachePath string

    // Digest is the full SHA256 digest of the installed image.
    // Format: sha256:abc123...
    Digest string

    // ShortDigest is the first 7 characters of the digest.
    // Used in cache path naming.
    ShortDigest string

    // Selective indicates whether selective extraction was used.
    Selective bool

    // Globs contains the glob patterns used for selective extraction.
    // Empty for full installations.
    Globs []string

    // ExtractedFiles lists all extracted file paths (relative to CachePath).
    // For full installation, this is all files in the image.
    ExtractedFiles []string

    // Manifest is the parsed .sow-ref.yaml content.
    // Always present (extracted even with selective installation).
    Manifest *schemas.RefManifest

    // FromCache indicates if result was served from cache (no download).
    FromCache bool
}

CacheInfo Type:

// CacheInfo contains cache statistics.
type CacheInfo struct {
    // Path is the base cache directory.
    Path string

    // TotalSize is the total size of all cached refs in bytes.
    TotalSize int64

    // RefCount is the number of cached refs.
    RefCount int

    // Refs contains info about each cached ref.
    Refs []CachedRefInfo
}

// CachedRefInfo contains info about a single cached ref.
type CachedRefInfo struct {
    // ID is the ref identifier (from manifest).
    ID string

    // Digest is the full SHA256 digest.
    Digest string

    // Path is the absolute cache path.
    Path string

    // Size is the total size in bytes.
    Size int64

    // InstalledAt is when the ref was cached.
    InstalledAt time.Time

    // Selective indicates if this was a selective installation.
    Selective bool
}
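
For the TotalSize and Size fields, one plausible approach is to walk each cache entry and sum regular file sizes. The sketch below uses only the standard library; dirSize and demo are hypothetical names, and the spec elsewhere asks for the fs abstractions rather than direct os calls, so treat this as an illustration of the calculation only.

```go
package main

import (
    "fmt"
    "io/fs"
    "os"
    "path/filepath"
)

// dirSize sums the sizes of all regular files under root.
// Directories and symlinks do not contribute to the total.
func dirSize(root string) (int64, error) {
    var total int64
    err := filepath.WalkDir(root, func(_ string, d fs.DirEntry, err error) error {
        if err != nil {
            return err
        }
        if !d.Type().IsRegular() {
            return nil
        }
        info, err := d.Info()
        if err != nil {
            return err
        }
        total += info.Size()
        return nil
    })
    return total, err
}

// demo builds a throwaway cache entry with two files and measures it.
func demo() (int64, error) {
    dir, err := os.MkdirTemp("", "sow-cache-*")
    if err != nil {
        return 0, err
    }
    defer os.RemoveAll(dir)

    entry := filepath.Join(dir, "demo-abc1234")
    if err := os.MkdirAll(filepath.Join(entry, "docs"), 0o755); err != nil {
        return 0, err
    }
    // 9 bytes of manifest plus 8 bytes of docs.
    if err := os.WriteFile(filepath.Join(entry, ".sow-ref.yaml"), []byte("id: demo\n"), 0o644); err != nil {
        return 0, err
    }
    if err := os.WriteFile(filepath.Join(entry, "docs", "guide.md"), []byte("# Guide\n"), 0o644); err != nil {
        return 0, err
    }
    return dirSize(dir)
}

func main() {
    fmt.Println(demo())
}
```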

Functional Options

// InstallOption configures an install operation.
type InstallOption func(*installOptions)

// WithID sets a custom ID for the cache path.
// If not specified, ID is derived from manifest ref.link.
func WithID(id string) InstallOption

// WithCacheDir overrides the default cache directory.
// Default: ~/.cache/sow/refs/oci/
func WithCacheDir(dir string) InstallOption

// WithForce forces re-download even if cached.
func WithForce(force bool) InstallOption

// WithProgressCallback sets a callback for progress updates.
func WithProgressCallback(cb func(ProgressEvent)) InstallOption
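
The option declarations above omit their bodies. A minimal sketch of how they might be implemented follows; the installOptions field names and the applyOptions helper are assumptions, and ProgressEvent is reduced to a single field to keep the example self-contained.

```go
package main

import "fmt"

// ProgressEvent is trimmed down for this sketch.
type ProgressEvent struct {
    Phase string
}

// installOptions collects the settings applied by InstallOption values.
// The field names are assumptions; the spec only names the type.
type installOptions struct {
    id       string
    cacheDir string
    force    bool
    progress func(ProgressEvent)
}

// InstallOption configures an install operation.
type InstallOption func(*installOptions)

func WithID(id string) InstallOption {
    return func(o *installOptions) { o.id = id }
}

func WithCacheDir(dir string) InstallOption {
    return func(o *installOptions) { o.cacheDir = dir }
}

func WithForce(force bool) InstallOption {
    return func(o *installOptions) { o.force = force }
}

func WithProgressCallback(cb func(ProgressEvent)) InstallOption {
    return func(o *installOptions) { o.progress = cb }
}

// applyOptions starts from defaults and applies each option in order,
// so later options override earlier ones. (Hypothetical helper.)
func applyOptions(defaultCacheDir string, opts ...InstallOption) installOptions {
    o := installOptions{cacheDir: defaultCacheDir}
    for _, opt := range opts {
        if opt != nil {
            opt(&o)
        }
    }
    return o
}

func main() {
    o := applyOptions("/tmp/cache", WithID("go-standards"), WithForce(true))
    fmt.Println(o.id, o.cacheDir, o.force)
}
```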

Progress Event:

// ProgressEvent describes installation progress.
type ProgressEvent struct {
    Phase       ProgressPhase // resolving, downloading, extracting, moving
    BytesTotal  int64
    BytesDone   int64
    FilesTotal  int
    FilesDone   int
    CurrentFile string
}

type ProgressPhase string

const (
    PhaseResolving   ProgressPhase = "resolving"
    PhaseDownloading ProgressPhase = "downloading"
    PhaseExtracting  ProgressPhase = "extracting"
    PhaseMoving      ProgressPhase = "moving"
)
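
To show how a caller might consume these events, here is a small sketch of a callback that renders one status line per event. The formatProgress helper and its output format are hypothetical; only the types are taken from the definitions above.

```go
package main

import "fmt"

type ProgressPhase string

const (
    PhaseResolving   ProgressPhase = "resolving"
    PhaseDownloading ProgressPhase = "downloading"
    PhaseExtracting  ProgressPhase = "extracting"
    PhaseMoving      ProgressPhase = "moving"
)

type ProgressEvent struct {
    Phase       ProgressPhase
    BytesTotal  int64
    BytesDone   int64
    FilesTotal  int
    FilesDone   int
    CurrentFile string
}

// formatProgress prefers byte counts when a total is known, falls back
// to file counts, and otherwise prints just the phase name.
func formatProgress(ev ProgressEvent) string {
    switch {
    case ev.BytesTotal > 0:
        pct := ev.BytesDone * 100 / ev.BytesTotal
        return fmt.Sprintf("%s: %d%% (%d/%d bytes)", ev.Phase, pct, ev.BytesDone, ev.BytesTotal)
    case ev.FilesTotal > 0:
        return fmt.Sprintf("%s: %d/%d files (%s)", ev.Phase, ev.FilesDone, ev.FilesTotal, ev.CurrentFile)
    default:
        return string(ev.Phase)
    }
}

func main() {
    // A callback of this shape could be passed to WithProgressCallback.
    cb := func(ev ProgressEvent) { fmt.Println(formatProgress(ev)) }
    cb(ProgressEvent{Phase: PhaseResolving})
    cb(ProgressEvent{Phase: PhaseDownloading, BytesTotal: 200, BytesDone: 50})
    cb(ProgressEvent{Phase: PhaseExtracting, FilesTotal: 4, FilesDone: 1, CurrentFile: "docs/a.md"})
}
```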

Cache Structure Implementation

Cache Path Resolution:

// DefaultCacheDir returns the default OCI refs cache directory.
// Respects XDG_CACHE_HOME on Linux, falls back to ~/.cache/sow/refs/oci/
func DefaultCacheDir() (string, error) {
    // Check XDG_CACHE_HOME first (Linux standard)
    if xdg := os.Getenv("XDG_CACHE_HOME"); xdg != "" {
        return filepath.Join(xdg, "sow", "refs", "oci"), nil
    }

    // Fall back to ~/.cache/sow/refs/oci/
    home, err := os.UserHomeDir()
    if err != nil {
        return "", fmt.Errorf("failed to get home directory: %w", err)
    }
    return filepath.Join(home, ".cache", "sow", "refs", "oci"), nil
}

// CachePath generates the cache path for a ref with given ID and digest.
// Format: {cacheDir}/{id}-{shortDigest}/
func CachePath(cacheDir, id, digest string) string {
    // Drop the optional "sha256:" prefix, then keep at most 7 characters.
    // Trimming before slicing avoids an out-of-range panic on inputs
    // shorter than 14 bytes, such as "sha256:abc".
    shortDigest := strings.TrimPrefix(digest, "sha256:")
    if len(shortDigest) > 7 {
        shortDigest = shortDigest[:7]
    }
    return filepath.Join(cacheDir, fmt.Sprintf("%s-%s", id, shortDigest))
}

Atomic Extraction Pattern

// atomicExtract extracts to a temp directory then moves to final location.
// This ensures partial extractions don't leave corrupt cache entries.
func atomicExtract(ctx context.Context, client Client, ref, dest string, globs []string) error {
    // Ensure the parent exists; os.MkdirTemp fails on a fresh cache otherwise.
    parent := filepath.Dir(dest)
    if err := os.MkdirAll(parent, 0o755); err != nil {
        return fmt.Errorf("failed to create cache directory: %w", err)
    }

    // Create temp directory in same filesystem for atomic move
    tempDir, err := os.MkdirTemp(parent, ".sow-extract-*")
    if err != nil {
        return fmt.Errorf("failed to create temp directory: %w", err)
    }

    // Clean up temp dir on failure
    success := false
    defer func() {
        if !success {
            _ = os.RemoveAll(tempDir) // best-effort cleanup
        }
    }()

    // Extract to temp directory
    if err := client.Pull(ctx, ref, tempDir, globs); err != nil {
        return fmt.Errorf("failed to pull ref: %w", err)
    }

    // Remove existing destination if present. os.RemoveAll returns nil when
    // the path does not exist, so no IsNotExist check is needed.
    if err := os.RemoveAll(dest); err != nil {
        return fmt.Errorf("failed to remove existing cache: %w", err)
    }

    // Move into place. The rename is atomic on the same filesystem; the
    // remove-then-rename pair is not, so a crash between the two steps
    // leaves no entry rather than a partial one.
    if err := os.Rename(tempDir, dest); err != nil {
        return fmt.Errorf("failed to move to cache: %w", err)
    }

    success = true
    return nil
}

Selective Extraction with Mandatory Manifest

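// ensureManifestGlob ensures .sow-ref.yaml is always extracted.
// When the pattern has to be added, a new slice is returned so the
// caller's slice is never modified through a shared backing array.
func ensureManifestGlob(globs []string) []string {
    const manifestGlob = ".sow-ref.yaml"

    for _, g := range globs {
        if g == manifestGlob {
            return globs // Already present
        }
    }

    out := make([]string, 0, len(globs)+1)
    out = append(out, globs...)
    return append(out, manifestGlob)
}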
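
The OR logic across patterns is expected to be applied inside Client.Pull(), and the exact glob dialect is not specified here. The sketch below shows one way the "match any pattern" rule could work, assuming a trailing "/**" means "everything under this directory" and all other patterns follow path.Match semantics (where "*" does not cross "/"). matchOne and matchesAny are hypothetical names.

```go
package main

import (
    "fmt"
    "path"
    "strings"
)

// matchOne reports whether name matches a single pattern.
// Assumption: "dir/**" selects every file below dir; anything else is
// delegated to path.Match. Malformed patterns simply do not match.
func matchOne(pattern, name string) bool {
    if strings.HasSuffix(pattern, "/**") {
        prefix := strings.TrimSuffix(pattern, "/**")
        return strings.HasPrefix(name, prefix+"/")
    }
    ok, err := path.Match(pattern, name)
    return err == nil && ok
}

// matchesAny implements the OR rule: one matching pattern is enough.
func matchesAny(patterns []string, name string) bool {
    for _, p := range patterns {
        if matchOne(p, name) {
            return true
        }
    }
    return false
}

func main() {
    globs := []string{"docs/**", "*.md", ".sow-ref.yaml"}
    files := []string{"docs/api/rest.md", "README.md", "src/main.go", ".sow-ref.yaml"}
    for _, f := range files {
        fmt.Printf("%-18s %v\n", f, matchesAny(globs, f))
    }
}
```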

Cache Deduplication Logic

// findCachedByDigest searches cache for an existing entry with matching digest.
// Returns empty string if not found.
func (i *installer) findCachedByDigest(digest string) (string, error) {
    entries, err := os.ReadDir(i.cacheDir)
    if err != nil {
        if os.IsNotExist(err) {
            return "", nil
        }
        return "", fmt.Errorf("failed to read cache dir: %w", err)
    }

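    // Same short-digest rule as CachePath: trim the prefix first so short
    // inputs such as "sha256:abc" cannot cause an out-of-range slice.
    // Only this 7-character suffix is compared below, so a match does not
    // by itself prove that the full digests are equal.
    shortDigest := strings.TrimPrefix(digest, "sha256:")
    if len(shortDigest) > 7 {
        shortDigest = shortDigest[:7]
    }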

    for _, entry := range entries {
        if entry.IsDir() && strings.HasSuffix(entry.Name(), "-"+shortDigest) {
            return filepath.Join(i.cacheDir, entry.Name()), nil
        }
    }

    return "", nil
}

Security Requirements

Per design document and discovery analysis, enforce:

  1. Path Traversal Protection: Reject paths containing ../ or absolute paths
  2. Size Limits: 100MB per file, 1GB total (enforced by libs/refs.Client)
  3. Permission Sanitization: No setuid/setgid (handled by OCI library)

These are primarily enforced at the Client.Pull() layer (Work Unit 003), but the installer should verify the manifest doesn't reference dangerous paths.
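
Which manifest fields carry paths is not stated here, so the following is only a sketch of the check itself: rejecting absolute paths and paths that climb out of the extraction root. validateEntryPath is a hypothetical helper and assumes slash-separated paths, as used inside archives.

```go
package main

import (
    "fmt"
    "path"
    "strings"
)

// validateEntryPath rejects empty names, absolute paths, and paths that
// would resolve outside the extraction root after cleaning.
func validateEntryPath(name string) error {
    if name == "" {
        return fmt.Errorf("empty path")
    }
    if strings.HasPrefix(name, "/") {
        return fmt.Errorf("absolute path %q is not allowed", name)
    }
    // path.Clean collapses "a/../b" style segments; whatever still begins
    // with ".." afterwards would land outside the root.
    clean := path.Clean(name)
    if clean == ".." || strings.HasPrefix(clean, "../") {
        return fmt.Errorf("path %q escapes the extraction root", name)
    }
    return nil
}

func main() {
    for _, p := range []string{"docs/guide.md", "docs/../README.md", "../etc/passwd", "docs/../../x", "/etc/passwd"} {
        fmt.Printf("%-20s %v\n", p, validateEntryPath(p))
    }
}
```

Cleaning before checking means "docs/../README.md" is accepted (it stays inside the root) while "docs/../../x" is rejected, which a plain substring test for "../" would not distinguish.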

Error Types

Add to libs/refs/errors.go:

// ErrRefAlreadyCached indicates the ref is already in cache.
// This is informational, not an error in most cases.
var ErrRefAlreadyCached = errors.New("ref already cached")

// ErrManifestMissing indicates .sow-ref.yaml was not found after extraction.
type ErrManifestMissing struct {
    CachePath string
}

func (e ErrManifestMissing) Error() string {
    return fmt.Sprintf("manifest .sow-ref.yaml not found in extracted content at %s", e.CachePath)
}

// ErrCacheCorrupt indicates cache entry exists but is invalid.
type ErrCacheCorrupt struct {
    Path   string
    Reason string
}

func (e ErrCacheCorrupt) Error() string {
    return fmt.Sprintf("corrupt cache entry at %s: %s", e.Path, e.Reason)
}
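
Because ErrManifestMissing and ErrCacheCorrupt are struct types rather than sentinel values, callers would detect them with errors.As rather than errors.Is. A brief sketch, in which install and missingManifestPath are hypothetical stand-ins:

```go
package main

import (
    "errors"
    "fmt"
)

// ErrManifestMissing indicates .sow-ref.yaml was not found after extraction.
type ErrManifestMissing struct {
    CachePath string
}

func (e ErrManifestMissing) Error() string {
    return fmt.Sprintf("manifest .sow-ref.yaml not found in extracted content at %s", e.CachePath)
}

// install is a stand-in that always fails with a wrapped ErrManifestMissing.
func install() error {
    return fmt.Errorf("install failed: %w", ErrManifestMissing{CachePath: "/cache/demo-abc1234"})
}

// missingManifestPath unwraps err and, if an ErrManifestMissing is found
// anywhere in the chain, returns the cache path it carries.
func missingManifestPath(err error) (string, bool) {
    var mm ErrManifestMissing
    if errors.As(err, &mm) {
        return mm.CachePath, true
    }
    return "", false
}

func main() {
    err := install()
    if p, ok := missingManifestPath(err); ok {
        fmt.Println("manifest missing at", p)
    }
}
```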

Testing Requirements

Unit Tests

  1. Cache Path Tests (cache_test.go):

    • CachePath(cacheDir, "id", "sha256:abcdefg...") → {cacheDir}/id-abcdefg/
    • CachePath(cacheDir, "my-ref", "1234567890") → {cacheDir}/my-ref-1234567/
    • DefaultCacheDir() respects XDG_CACHE_HOME
    • DefaultCacheDir() falls back to ~/.cache/sow/refs/oci/
  2. Glob Pattern Tests (installer_test.go):

    • ensureManifestGlob(["docs/**"]) → ["docs/**", ".sow-ref.yaml"]
    • ensureManifestGlob([".sow-ref.yaml"]) → [".sow-ref.yaml"] (no duplicate)
    • ensureManifestGlob([]) → [".sow-ref.yaml"]
  3. Mock-Based Install Tests (installer_test.go):

    • Install returns correct InstallResult structure
    • Install creates correct cache path format
    • Install extracts manifest and parses it
    • InstallSelective adds .sow-ref.yaml to globs
    • InstallSelective uses OR logic (files matching any pattern)
    • IsCached returns existing path for same digest
    • IsCached returns empty for unknown digest
    • WithForce bypasses cache check

Integration Tests

  1. Full Installation Flow (installer_integration_test.go):

    • Install from test registry, verify all files extracted
    • Install same ref twice, verify second uses cache (no download)
    • Install with WithForce, verify re-downloads
    • Verify cache path format: {id}-{shortDigest}/
    • Verify manifest parsed correctly from extracted .sow-ref.yaml
  2. Selective Installation Flow (installer_integration_test.go):

    • Install with --path "docs/**", verify only docs files extracted
    • Install with multiple globs, verify OR logic
    • Verify .sow-ref.yaml always extracted regardless of globs
    • Verify ExtractedFiles list is accurate
  3. Atomic Extraction Tests (installer_integration_test.go):

    • Simulate extraction failure, verify no partial cache entry
    • Verify temp directory cleaned up on failure
    • Verify existing cache replaced atomically
  4. Error Handling Tests:

    • Missing manifest after extraction → ErrManifestMissing
    • Invalid ref URL → appropriate client error
    • Network failure → client error propagates
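
One way the atomic extraction checks listed above might be exercised is to inject a pull function that writes a partial file and then fails, and afterwards assert that the cache directory is empty. In this sketch pullFunc stands in for Client.Pull(), and atomicExtract is a simplified local copy rather than the real function; leftoversAfterFailure is a hypothetical helper.

```go
package main

import (
    "errors"
    "fmt"
    "os"
    "path/filepath"
)

// pullFunc stands in for Client.Pull in this sketch.
type pullFunc func(dest string) error

// atomicExtract is a simplified copy of the temp-dir-then-rename pattern.
func atomicExtract(dest string, pull pullFunc) error {
    parent := filepath.Dir(dest)
    if err := os.MkdirAll(parent, 0o755); err != nil {
        return err
    }
    tmp, err := os.MkdirTemp(parent, ".sow-extract-*")
    if err != nil {
        return err
    }
    if err := pull(tmp); err != nil {
        _ = os.RemoveAll(tmp) // discard the partial extraction
        return fmt.Errorf("failed to pull ref: %w", err)
    }
    if err := os.RemoveAll(dest); err != nil {
        _ = os.RemoveAll(tmp)
        return err
    }
    if err := os.Rename(tmp, dest); err != nil {
        _ = os.RemoveAll(tmp)
        return err
    }
    return nil
}

// leftoversAfterFailure runs a failing extraction in a fresh cache
// directory and returns how many entries remain afterwards.
func leftoversAfterFailure() (int, error) {
    cacheDir, err := os.MkdirTemp("", "sow-cache-*")
    if err != nil {
        return 0, err
    }
    defer os.RemoveAll(cacheDir)

    failing := func(dest string) error {
        // Write one file, then fail, to mimic an interrupted download.
        _ = os.WriteFile(filepath.Join(dest, "partial.txt"), []byte("x"), 0o644)
        return errors.New("simulated network failure")
    }
    if err := atomicExtract(filepath.Join(cacheDir, "demo-abc1234"), failing); err == nil {
        return 0, errors.New("expected extraction to fail")
    }

    entries, err := os.ReadDir(cacheDir)
    if err != nil {
        return 0, err
    }
    return len(entries), nil
}

func main() {
    fmt.Println(leftoversAfterFailure())
}
```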

Performance Tests (Optional)

  1. Benchmark Tests (installer_benchmark_test.go):
    • Full install of 10MB ref: measure against < 15s target
    • Selective install (10% of files): measure against < 3s target
    • Cache hit: verify near-instant return (< 100ms)

Implementation Notes

Dependency Chain

This work unit depends on:

  • Work Unit 002: Schema for parsing .sow-ref.yaml after extraction
  • Work Unit 003: Client interface for Pull() and GetDigest() operations

Work Unit 003's Client.Pull() handles:

  • OCI image download
  • estargz selective extraction
  • Security limits (file size, total size)
  • Path traversal protection

This work unit adds:

  • Cache management layer on top
  • Digest-based deduplication
  • Atomic extraction guarantees
  • Manifest parsing and validation

Integration with RefType System

After this work unit, Work Unit 007 (CLI Integration) will:

  1. Create OCIType implementing RefType interface
  2. OCIType.Cache() delegates to Installer.Install() or Installer.InstallSelective()
  3. OCIType.CachePath() uses libs/refs.CachePath()
  4. OCIType.IsStale() compares cached digest with remote
  5. OCIType.Cleanup() removes cache directory

XDG Compliance

On Linux systems, respect the XDG Base Directory Specification:

  • XDG_CACHE_HOME (default: ~/.cache) for cache data
  • This is consistent with other Unix tools and container environments

On macOS and Windows, where XDG_CACHE_HOME is normally unset, fall back to ~/.cache/sow/refs/oci/ so that cache paths stay consistent across platforms.
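
The XDG Base Directory Specification also says that a relative path in XDG_CACHE_HOME should be treated as invalid and ignored. A sketch of a resolver that does this, with the environment lookup and home directory injected so it can be tested without touching the real environment, might look like the following. resolveCacheDir is a hypothetical name and is not the DefaultCacheDir() defined earlier.

```go
package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// resolveCacheDir picks the OCI refs cache directory.
// getenv is injected (os.Getenv in production) so tests can supply values.
// A relative XDG_CACHE_HOME is ignored, as the XDG spec recommends.
func resolveCacheDir(getenv func(string) string, home string) string {
    if xdg := getenv("XDG_CACHE_HOME"); xdg != "" && filepath.IsAbs(xdg) {
        return filepath.Join(xdg, "sow", "refs", "oci")
    }
    return filepath.Join(home, ".cache", "sow", "refs", "oci")
}

func main() {
    home, err := os.UserHomeDir()
    if err != nil {
        fmt.Println("failed to get home directory:", err)
        return
    }
    fmt.Println(resolveCacheDir(os.Getenv, home))
}
```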

estargz Performance

The estargz format enables selective extraction efficiency. When using Client.Pull() with globs:

  • Only file chunks needed are downloaded
  • TOC is read first to identify matching files
  • Non-matching files are skipped entirely

This is why a selective install of 10% of the files can meet the < 3 second target even for large refs.


Out of Scope

  • CLI command implementation: Handled in Work Unit 007 (sow refs add with --path flags)
  • Workspace symlinking: Work Unit 007 handles .sow/refs/ symlink creation
  • Index management: Work Unit 007 handles index.json updates
  • Cache pruning commands: Work Unit 007 implements sow refs prune
  • RefType implementation: Work Unit 007 creates OCIType adapter
  • Search indexing: CLI layer triggers after installation

Implementation Standards

All code produced in this work unit MUST adhere to the following standards:

Code Quality Standards

  • STYLE.md Compliance: All Go code must follow the conventions documented in .standards/STYLE.md
  • TESTING.md Compliance: All tests must follow the patterns documented in .standards/TESTING.md
  • golangci-lint: Code must pass golangci-lint run with zero errors before completion

Required Dependencies

  • OCI Operations: Use github.com/jmgilman/go/oci for all OCI registry operations
    • Pull operations: client.Pull(ctx, reference, targetDir, opts...)
    • Selective extraction: oci.WithFilesToExtract(patterns...)
    • Cached pull: client.PullWithCache(ctx, reference, targetDir, cacheDir, opts...)
  • Filesystem Abstractions: Use github.com/jmgilman/go/fs/core and github.com/jmgilman/go/fs/billy for all file system operations
    • Use core.FS interface for filesystem operations requiring abstraction
    • Pass oci.WithFilesystem(fsys) for testability
    • Use billy.NewMemoryFS() in unit tests
    • Use billy.NewLocalFS() for production

Verification Checklist

Before marking this work unit complete, verify:

  • golangci-lint run ./libs/refs/... passes with zero errors
  • All code follows STYLE.md conventions (functional options, error wrapping, etc.)
  • All tests follow TESTING.md patterns (table-driven tests, test helpers, etc.)
  • Unit tests use memory filesystem via billy.NewMemoryFS() where applicable

Acceptance Criteria

  • Installer interface defined with Install, InstallSelective, IsCached, GetCacheInfo
  • InstallResult struct contains all documented fields
  • //go:generate directive produces mocks/installer.go
  • DefaultCacheDir() respects XDG_CACHE_HOME environment variable
  • CachePath() generates correct format: {id}-{shortDigest}/
  • Full installation extracts all files and parses manifest
  • Selective installation adds .sow-ref.yaml to globs automatically
  • Selective installation uses OR logic for multiple globs
  • Cache deduplication works (same digest = cached result)
  • Atomic extraction via temp directory pattern
  • WithForce option bypasses cache check
  • Progress callback option available
  • Unit tests pass for cache path generation
  • Unit tests pass with mocked client
  • Integration tests pass for full installation flow
  • Integration tests pass for selective installation flow
  • Integration tests verify atomic extraction guarantees
  • Performance: full 10MB install < 15 seconds
  • Performance: selective 10% install < 3 seconds
