Work Unit 006: Ref Installation in libs/refs
Status: Specification
Estimated Effort: 3-4 days
Dependencies: Work Unit 002 (CUE Schema), Work Unit 003 (libs/refs OCI Backend)
Behavioral Goal
As a ref consumer,
I need a reliable installation mechanism that downloads OCI refs (either fully or selectively) and manages a local cache,
So that I can install refs quickly, avoid redundant downloads through digest-based caching, and selectively extract only the files I need to save bandwidth and storage while always having access to the ref manifest.
Success Criteria
- Full OCI ref installation downloads entire image, extracts to cache, and returns path for symlinking
- Selective installation with glob patterns downloads only matching files plus .sow-ref.yaml
- Multiple --path glob patterns work with OR logic (files matching any pattern are extracted)
- Digest-based caching prevents redundant downloads (same digest = skip download, return existing path)
- Cache paths follow the ~/.cache/sow/refs/oci/{id}-{short-digest}/ structure (short digest = 7 chars)
- Installation is atomic: extract to a temp directory first, move to the final location only on success
- XDG_CACHE_HOME is respected on Linux systems
- Performance: full install of 10MB ref < 15 seconds, selective install (10% of files) < 3 seconds
- Security constraints enforced: path traversal protection, size limits, permission sanitization
- Integration tests pass for both full and selective installation scenarios
Existing Code Context
Explanatory Context
The current refs system in cli/internal/refs/ follows a type-based architecture where each ref type (git, file) implements the RefType interface. The CacheManager in manager.go orchestrates caching by delegating to the appropriate RefType implementation, then creates workspace symlinks from .sow/refs/{link} to the cache directory.
The existing GitType implementation (git.go:77-105) demonstrates the caching pattern: it uses an external library (github.com/jmgilman/go/git/cache) for the actual cache operations while implementing the RefType interface. The new OCI installation will follow this pattern: libs/refs provides the core installation logic (this work unit), and a future OCIType adapter (Work Unit 007) will wrap it to implement RefType.
The cache structure for OCI refs differs from git refs. Git uses {cacheDir}/git/checkouts/{id}/ while OCI will use {cacheDir}/oci/{id}-{short-digest}/. The short digest (first 7 characters of SHA256) enables:
- Human-readable cache paths
- Multiple versions coexisting (different digests)
- Quick visual identification during debugging
Work Unit 003 establishes the libs/refs module with the Client interface providing Pull() for extraction and GetDigest() for digest resolution. This work unit adds the Installer interface that orchestrates these operations with caching logic.
Key Files
| File | Lines | Purpose |
|---|---|---|
| cli/internal/refs/types.go | 23-79 | RefType interface (install target) |
| cli/internal/refs/manager.go | 51-91 | CacheManager.Install() flow pattern |
| cli/internal/refs/git.go | 77-105 | GitType.Cache() implementation pattern |
| cli/internal/refs/git.go | 59-75 | Lazy cache initialization pattern |
| libs/schemas/refs_cache.cue | 1-82 | Cache index schema (for future expansion) |
| libs/exec/executor.go | full | Interface pattern to follow |
| libs/git/client.go | full | Interface design pattern |
Existing Documentation Context
Design Document (Primary Reference)
The OCI Refs Design Document (.sow/knowledge/designs/oci-refs/oci-refs-design.md, lines 308-344) specifies the installation and cache management:
Ref Installer (lines 308-326):
- Pull OCI image (full or selective based on --path flags)
- Extract to cache directory: ~/.cache/sow/refs/{id}-{short-digest}/
- Create symlink: .sow/refs/{link} -> cache directory
- Trigger search indexing (deferred to CLI layer)
Key behaviors documented:
- Atomic extraction: use temp directory, move to final location on success
- Short digest: first 7 characters of SHA256 for human-readable paths
- Selective extraction: pass multiple glob patterns, files matching any pattern extracted
- Always extract .sow-ref.yaml, even with selective extraction (needed for metadata)
Cache Manager (lines 328-344):
- Allocate cache directories with naming: {id}-{short-digest}
- Check if ref already cached (digest-based)
- Prune unused cache entries (deferred to CLI layer)
- Calculate cache statistics (deferred to CLI layer)
Performance requirements (documented in arc42 section 4):
- Full install of 10MB ref: < 15 seconds
- Selective install (10% of files): < 3 seconds
- Must use estargz for selective extraction efficiency
Arc42 Concepts (Cache Structure)
Section 4 of .sow/knowledge/designs/oci-refs/arc42-08-concepts-oci-refs.md documents digest-based caching:
~/.cache/sow/refs/
├── go-standards-abc1234/ # <id>-<short-digest>
└── api-patterns-def5678/
.sow/refs/
├── team-go -> ~/.cache/sow/refs/go-standards-abc1234
└── apis -> ~/.cache/sow/refs/api-patterns-def5678
Benefits: deduplication (same digest stored once), fast rollback, integrity verification.
Discovery Analysis (Implementation Context)
Section 3 of discovery analysis (.sow/project/discovery/analysis.md) confirms the cache structure is compatible with existing patterns. Each RefType manages its own subdirectory under ~/.cache/sow/refs/.
Section 8.3 highlights that CacheManager.Install() expects:
- Type inference from URL
- refType.Cache() returns cache path
- Symlink creation to workspace
For OCI, this flow works but needs to handle:
- Selective extraction (multiple globs)
- Always extract .sow-ref.yaml even with selective extraction
- Atomic extraction (temp dir -> move)
Detailed Requirements
Module Extension
Extend libs/refs/ module (created in Work Unit 003) with installation functionality:
libs/refs/
├── ... (existing from Work Unit 003)
├── installer.go # Installer interface definition
├── installer_impl.go # Implementation
├── installer_test.go # Integration tests
├── cache.go # Cache management utilities
└── mocks/
└── installer.go # Generated mock
Interface Definitions
Installer Interface (installer.go):
//go:generate go run github.com/matryer/moq@latest -out mocks/installer.go -pkg mocks . Installer
// Installer defines operations for installing OCI refs to local cache.
//
// This interface handles:
// - Full and selective extraction from OCI registries
// - Digest-based cache deduplication
// - Atomic extraction via temp directories
// - Always extracting .sow-ref.yaml for metadata access
type Installer interface {
// Install downloads and extracts a full OCI ref to cache.
//
// Flow:
// 1. Resolve ref to digest
// 2. Check if already cached by digest
// 3. If cached, return existing path
// 4. If not cached, pull full image to temp dir
// 5. Move temp dir to final cache path
// 6. Parse and return manifest
//
// Returns InstallResult with cache path and metadata.
Install(ctx context.Context, ref string, opts ...InstallOption) (*InstallResult, error)
// InstallSelective downloads only files matching glob patterns.
//
// Flow:
// 1. Resolve ref to digest
// 2. Check if already cached (by digest AND globs hash)
// 3. Always include ".sow-ref.yaml" pattern
// 4. Extract only matching files to temp dir
// 5. Move to final cache path
// 6. Return list of extracted files
//
// Globs use OR logic: files matching ANY pattern are extracted.
InstallSelective(ctx context.Context, ref string, globs []string, opts ...InstallOption) (*InstallResult, error)
// IsCached checks if a ref is already in cache (by digest).
// Returns cache path if found, empty string if not cached.
IsCached(ctx context.Context, ref string) (string, error)
// GetCacheInfo returns cache statistics.
GetCacheInfo(ctx context.Context) (*CacheInfo, error)
}
InstallResult Type:
// InstallResult contains the result of an installation operation.
type InstallResult struct {
// CachePath is the absolute path to extracted content.
// Format: ~/.cache/sow/refs/oci/{id}-{short-digest}/
CachePath string
// Digest is the full SHA256 digest of the installed image.
// Format: sha256:abc123...
Digest string
// ShortDigest is the first 7 characters of the digest.
// Used in cache path naming.
ShortDigest string
// Selective indicates whether selective extraction was used.
Selective bool
// Globs contains the glob patterns used for selective extraction.
// Empty for full installations.
Globs []string
// ExtractedFiles lists all extracted file paths (relative to CachePath).
// For full installation, this is all files in the image.
ExtractedFiles []string
// Manifest is the parsed .sow-ref.yaml content.
// Always present (extracted even with selective installation).
Manifest *schemas.RefManifest
// FromCache indicates if result was served from cache (no download).
FromCache bool
}
CacheInfo Type:
// CacheInfo contains cache statistics.
type CacheInfo struct {
// Path is the base cache directory.
Path string
// TotalSize is the total size of all cached refs in bytes.
TotalSize int64
// RefCount is the number of cached refs.
RefCount int
// Refs contains info about each cached ref.
Refs []CachedRefInfo
}
// CachedRefInfo contains info about a single cached ref.
type CachedRefInfo struct {
// ID is the ref identifier (from manifest).
ID string
// Digest is the full SHA256 digest.
Digest string
// Path is the absolute cache path.
Path string
// Size is the total size in bytes.
Size int64
// InstalledAt is when the ref was cached.
InstalledAt time.Time
// Selective indicates if this was a selective installation.
Selective bool
}
Functional Options
// InstallOption configures an install operation.
type InstallOption func(*installOptions)
// WithID sets a custom ID for the cache path.
// If not specified, ID is derived from manifest ref.link.
func WithID(id string) InstallOption
// WithCacheDir overrides the default cache directory.
// Default: ~/.cache/sow/refs/oci/
func WithCacheDir(dir string) InstallOption
// WithForce forces re-download even if cached.
func WithForce(force bool) InstallOption
// WithProgressCallback sets a callback for progress updates.
func WithProgressCallback(cb func(ProgressEvent)) InstallOption
Progress Event:
// ProgressEvent describes installation progress.
type ProgressEvent struct {
Phase ProgressPhase // resolving, downloading, extracting, moving
BytesTotal int64
BytesDone int64
FilesTotal int
FilesDone int
CurrentFile string
}
type ProgressPhase string
const (
PhaseResolving ProgressPhase = "resolving"
PhaseDownloading ProgressPhase = "downloading"
PhaseExtracting ProgressPhase = "extracting"
PhaseMoving ProgressPhase = "moving"
)
Cache Structure Implementation
Cache Path Resolution:
// DefaultCacheDir returns the default OCI refs cache directory.
// It honors XDG_CACHE_HOME when it is set to an absolute path (the XDG Base
// Directory Specification says relative values should be ignored) and
// otherwise falls back to ~/.cache/sow/refs/oci/.
func DefaultCacheDir() (string, error) {
	// Check XDG_CACHE_HOME first (Linux standard)
	if xdg := os.Getenv("XDG_CACHE_HOME"); xdg != "" && filepath.IsAbs(xdg) {
		return filepath.Join(xdg, "sow", "refs", "oci"), nil
	}
	// Fall back to ~/.cache/sow/refs/oci/
	home, err := os.UserHomeDir()
	if err != nil {
		return "", fmt.Errorf("failed to get home directory: %w", err)
	}
	return filepath.Join(home, ".cache", "sow", "refs", "oci"), nil
}
// CachePath generates the cache path for a ref with given ID and digest.
// Format: {cacheDir}/{id}-{shortDigest}/
// The short digest is the first 7 hex characters after any "sha256:" prefix.
func CachePath(cacheDir, id, digest string) string {
	// Strip the algorithm prefix first, then truncate. Slicing the raw
	// string (digest[7:14]) would panic on short or malformed digests.
	shortDigest := strings.TrimPrefix(digest, "sha256:")
	if len(shortDigest) > 7 {
		shortDigest = shortDigest[:7]
	}
	return filepath.Join(cacheDir, fmt.Sprintf("%s-%s", id, shortDigest))
}
Atomic Extraction Pattern
// atomicExtract extracts to a temp directory then moves to final location.
// This ensures partial extractions don't leave corrupt cache entries.
func atomicExtract(ctx context.Context, client Client, ref, dest string, globs []string) error {
	// Ensure the cache directory exists; MkdirTemp fails if the parent is missing.
	parent := filepath.Dir(dest)
	if err := os.MkdirAll(parent, 0o755); err != nil {
		return fmt.Errorf("failed to create cache directory: %w", err)
	}
	// Create the temp directory in the same parent so the final rename stays
	// on one filesystem (os.Rename cannot move across filesystems).
	tempDir, err := os.MkdirTemp(parent, ".sow-extract-*")
	if err != nil {
		return fmt.Errorf("failed to create temp directory: %w", err)
	}
	// Clean up temp dir on failure
	success := false
	defer func() {
		if !success {
			_ = os.RemoveAll(tempDir)
		}
	}()
	// Extract to temp directory
	if err := client.Pull(ctx, ref, tempDir, globs); err != nil {
		return fmt.Errorf("failed to pull ref: %w", err)
	}
	// Remove any existing entry, because os.Rename cannot replace a non-empty
	// directory. RemoveAll returns nil when dest does not exist.
	if err := os.RemoveAll(dest); err != nil {
		return fmt.Errorf("failed to remove existing cache: %w", err)
	}
	// The rename itself is atomic, but there is a brief window after
	// RemoveAll in which dest does not exist.
	if err := os.Rename(tempDir, dest); err != nil {
		return fmt.Errorf("failed to move to cache: %w", err)
	}
	success = true
	return nil
}
Selective Extraction with Mandatory Manifest
// ensureManifestGlob ensures .sow-ref.yaml is always extracted.
func ensureManifestGlob(globs []string) []string {
	const manifestGlob = ".sow-ref.yaml"
	for _, g := range globs {
		if g == manifestGlob {
			return globs // Already present
		}
	}
	// Copy before appending so the caller's backing array is never modified.
	out := make([]string, 0, len(globs)+1)
	out = append(out, globs...)
	return append(out, manifestGlob)
}
Cache Deduplication Logic
// findCachedByDigest searches cache for an existing entry with matching digest.
// Returns empty string if not found.
func (i *installer) findCachedByDigest(digest string) (string, error) {
	entries, err := os.ReadDir(i.cacheDir)
	if err != nil {
		if os.IsNotExist(err) {
			return "", nil
		}
		return "", fmt.Errorf("failed to read cache dir: %w", err)
	}
	// Same short-digest rule as CachePath: strip the prefix, keep 7 chars.
	shortDigest := strings.TrimPrefix(digest, "sha256:")
	if len(shortDigest) > 7 {
		shortDigest = shortDigest[:7]
	}
	for _, entry := range entries {
		// Skip hidden entries such as in-progress ".sow-extract-*" temp dirs.
		if !entry.IsDir() || strings.HasPrefix(entry.Name(), ".") {
			continue
		}
		if strings.HasSuffix(entry.Name(), "-"+shortDigest) {
			return filepath.Join(i.cacheDir, entry.Name()), nil
		}
	}
	return "", nil
}
Security Requirements
Per design document and discovery analysis, enforce:
- Path Traversal Protection: Reject paths containing ../ or absolute paths
- Size Limits: 100MB per file, 1GB total (enforced by libs/refs.Client)
- Permission Sanitization: No setuid/setgid (handled by OCI library)
These are primarily enforced at the Client.Pull() layer (Work Unit 003), but the installer should verify the manifest doesn't reference dangerous paths.
Error Types
Add to libs/refs/errors.go:
// ErrRefAlreadyCached indicates the ref is already in cache.
// This is informational, not an error in most cases.
var ErrRefAlreadyCached = errors.New("ref already cached")
// ErrManifestMissing indicates .sow-ref.yaml was not found after extraction.
type ErrManifestMissing struct {
CachePath string
}
func (e ErrManifestMissing) Error() string {
return fmt.Sprintf("manifest .sow-ref.yaml not found in extracted content at %s", e.CachePath)
}
// ErrCacheCorrupt indicates cache entry exists but is invalid.
type ErrCacheCorrupt struct {
Path string
Reason string
}
func (e ErrCacheCorrupt) Error() string {
return fmt.Sprintf("corrupt cache entry at %s: %s", e.Path, e.Reason)
}
Testing Requirements
Unit Tests
- Cache Path Tests (cache_test.go):
  - CachePath(cacheDir, "id", "sha256:abcdefg...") → {cacheDir}/id-abcdefg/
  - CachePath(cacheDir, "my-ref", "1234567890") → {cacheDir}/my-ref-1234567/
  - DefaultCacheDir() respects XDG_CACHE_HOME
  - DefaultCacheDir() falls back to ~/.cache/sow/refs/oci/
- Glob Pattern Tests (installer_test.go):
  - ensureManifestGlob(["docs/**"]) → ["docs/**", ".sow-ref.yaml"]
  - ensureManifestGlob([".sow-ref.yaml"]) → [".sow-ref.yaml"] (no duplicate)
  - ensureManifestGlob([]) → [".sow-ref.yaml"]
- Mock-Based Install Tests (installer_test.go):
  - Install returns correct InstallResult structure
  - Install creates correct cache path format
  - Install extracts manifest and parses it
  - InstallSelective adds .sow-ref.yaml to globs
  - InstallSelective uses OR logic (files matching any pattern)
  - IsCached returns existing path for same digest
  - IsCached returns empty for unknown digest
  - WithForce bypasses cache check
Integration Tests
- Full Installation Flow (installer_integration_test.go):
  - Install from test registry, verify all files extracted
  - Install same ref twice, verify second uses cache (no download)
  - Install with WithForce, verify re-downloads
  - Verify cache path format: {id}-{shortDigest}/
  - Verify manifest parsed correctly from extracted .sow-ref.yaml
- Selective Installation Flow (installer_integration_test.go):
  - Install with --path "docs/**", verify only docs files extracted
  - Install with multiple globs, verify OR logic
  - Verify .sow-ref.yaml always extracted regardless of globs
  - Verify ExtractedFiles list is accurate
- Atomic Extraction Tests (installer_integration_test.go):
  - Simulate extraction failure, verify no partial cache entry
  - Verify temp directory cleaned up on failure
  - Verify existing cache replaced atomically
- Error Handling Tests:
  - Missing manifest after extraction → ErrManifestMissing
  - Invalid ref URL → appropriate client error
  - Network failure → client error propagates
Performance Tests (Optional)
- Benchmark Tests (installer_benchmark_test.go):
  - Full install of 10MB ref: measure against < 15s target
  - Selective install (10% of files): measure against < 3s target
  - Cache hit: verify near-instant return (< 100ms)
Implementation Notes
Dependency Chain
This work unit depends on:
- Work Unit 002: Schema for parsing .sow-ref.yaml after extraction
- Work Unit 003: Client interface for Pull() and GetDigest() operations
Work Unit 003's Client.Pull() handles:
- OCI image download
- estargz selective extraction
- Security limits (file size, total size)
- Path traversal protection
This work unit adds:
- Cache management layer on top
- Digest-based deduplication
- Atomic extraction guarantees
- Manifest parsing and validation
Integration with RefType System
After this work unit, Work Unit 007 (CLI Integration) will:
- Create OCIType implementing RefType interface
- OCIType.Cache() delegates to Installer.Install() or Installer.InstallSelective()
- OCIType.CachePath() uses libs/refs.CachePath()
- OCIType.IsStale() compares cached digest with remote
- OCIType.Cleanup() removes cache directory
XDG Compliance
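On Linux systems, respect the XDG Base Directory Specification:
- XDG_CACHE_HOME (default: ~/.cache) for cache data
- This is consistent with other Unix tools and container environments
On macOS and Windows, fall back to ~/.cache/sow/refs/oci/ for consistency rather than using the platform-native cache locations. DefaultCacheDir() does not check the operating system, so an XDG_CACHE_HOME set on those platforms is honored as well.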
estargz Performance
The estargz format enables selective extraction efficiency. When using Client.Pull() with globs:
- The table of contents (TOC) is read first to identify matching files
- Only the file chunks needed are downloaded
- Non-matching files are skipped entirely
This is why the 10% selective install can achieve < 3 second target even for large refs.
Out of Scope
- CLI command implementation: Handled in Work Unit 007 (sow refs add with --path flags)
- Workspace symlinking: Work Unit 007 handles .sow/refs/ symlink creation
- Index management: Work Unit 007 handles index.json updates
- Cache pruning commands: Work Unit 007 implements sow refs prune
- RefType implementation: Work Unit 007 creates OCIType adapter
- Search indexing: CLI layer triggers after installation
Implementation Standards
All code produced in this work unit MUST adhere to the following standards:
Code Quality Standards
- STYLE.md Compliance: All Go code must follow the conventions documented in .standards/STYLE.md
- TESTING.md Compliance: All tests must follow the patterns documented in .standards/TESTING.md
- golangci-lint: Code must pass golangci-lint run with zero errors before completion
Required Dependencies
- OCI Operations: Use github.com/jmgilman/go/oci for all OCI registry operations
  - Pull operations: client.Pull(ctx, reference, targetDir, opts...)
  - Selective extraction: oci.WithFilesToExtract(patterns...)
  - Cached pull: client.PullWithCache(ctx, reference, targetDir, cacheDir, opts...)
- Filesystem Abstractions: Use github.com/jmgilman/go/fs/core and github.com/jmgilman/go/fs/billy for all file system operations
  - Use core.FS interface for filesystem operations requiring abstraction
  - Pass oci.WithFilesystem(fsys) for testability
  - Use billy.NewMemoryFS() in unit tests
  - Use billy.NewLocalFS() for production
Verification Checklist
Before marking this work unit complete, verify:
- golangci-lint run ./libs/refs/... passes with zero errors
- All code follows STYLE.md conventions (functional options, error wrapping, etc.)
- All tests follow TESTING.md patterns (table-driven tests, test helpers, etc.)
- Unit tests use memory filesystem via billy.NewMemoryFS() where applicable
Acceptance Criteria
- Installer interface defined with Install, InstallSelective, IsCached, GetCacheInfo
- InstallResult struct contains all documented fields
- //go:generate directive produces mocks/installer.go
- DefaultCacheDir() respects XDG_CACHE_HOME environment variable
- CachePath() generates correct format: {id}-{shortDigest}/
- Full installation extracts all files and parses manifest
- Selective installation adds .sow-ref.yaml to globs automatically
- Selective installation uses OR logic for multiple globs
- Cache deduplication works (same digest = cached result)
- Atomic extraction via temp directory pattern
- WithForce option bypasses cache check
- Progress callback option available
- Unit tests pass for cache path generation
- Unit tests pass with mocked client
- Integration tests pass for full installation flow
- Integration tests pass for selective installation flow
- Integration tests verify atomic extraction guarantees
- Performance: full 10MB install < 15 seconds
- Performance: selective 10% install < 3 seconds