Work Unit 005: Ref Inspection in libs/refs
Behavioral Goal
As a sow user evaluating external refs,
I need to inspect OCI refs before downloading them
so that I can understand their contents, validate their structure, and make informed decisions about whether to install them without consuming unnecessary bandwidth.
Success Criteria
- Users can run `sow refs inspect <url>` and see file list, metadata, and validation status within 3 seconds
- Inspection uses < 10KB bandwidth regardless of ref size (TOC + manifest only)
- Invalid refs are clearly identified before any installation attempt
- Directory tree and size estimates help users understand ref contents
Existing Code Context
Explanatory Context
This work unit creates the Inspector component within the new libs/refs module established in Work Unit 003. The inspection capability is a key differentiator for OCI refs over git refs - it enables users to preview contents before committing to a full download.
The implementation builds on the OCI client from Work Unit 003, which wraps github.com/jmgilman/go/oci. The OCI client provides a ListFiles operation that leverages the estargz (seekable tar.gz) format to download only the Table of Contents (TOC) without retrieving file contents. This is the key enabler for bandwidth-efficient inspection.
The existing refs architecture uses a clean interface pattern. The RefType interface (cli/internal/refs/types.go) defines operations like Cache(), Update(), IsStale(). However, inspection is a new capability not present in the current interface - it's specific to OCI refs because git refs require cloning to inspect contents.
The inspection workflow consists of two estargz operations:
- ListFiles: Downloads only the TOC (~few KB) which contains file paths, sizes, and modes
- Selective Extract: Downloads only `.sow-ref.yaml` (~1-5KB) for metadata display
Schema validation uses CUE, following the pattern in libs/project/state/validate.go. The validator loads embedded schemas at init time and uses schema.Unify(value).Validate() for structural validation.
Reference List
Core Interface Patterns:
- `libs/exec/executor.go:26-47` - Interface pattern with clear method contracts and go:generate for mocks
- `libs/git/client.go:14-45` - GitHubClient interface demonstrating operation grouping
- `cli/internal/refs/types.go:23-79` - RefType interface that inspection complements
Validation Patterns:
- `libs/project/state/validate.go:46-68` - CUE schema validation pattern with Unify and Validate
- `libs/project/state/validate.go:22-38` - Init-time schema loading from embedded FS
- `libs/schemas/` - Embedded CUE schemas directory
CLI Command Patterns:
- `cli/cmd/refs/add.go:24-72` - Cobra command setup with flags
- `cli/cmd/refs/add.go:74-127` - RunE implementation using manager pattern
- `cli/cmd/refs/add.go:129-163` - Output formatting with Printf and emoji indicators
Manager Patterns:
- `cli/internal/refs/manager.go:12-40` - CacheManager struct and factory functions
- `cli/internal/refs/manager.go:53-91` - Install method showing type inference and caching flow
Existing Documentation Context
Design Document (oci-refs-design.md)
The design document's Inspector section (lines 291-307) specifies the inspection workflow and responsibilities:
- Call `ListFiles` to retrieve file list from estargz TOC
- Parse `.sow-ref.yaml` via selective extraction
- Display file count, total size estimate, directory tree
- Show metadata (title, description, classifications, tags)
- Validate structure before user commits to download
The Architecture Overview diagrams (lines 143-167) show the consumption flow for inspection:
OCI Registry → ListFiles (TOC only) → Display: "Ref contains 15 files, 2.3MB"
The Non-Functional Requirements (lines 99-107) establish:
- NFR2: Inspection completes in < 3 seconds
- NFR6: Security including path traversal protection
Cross-cutting Concepts (arc42-08-concepts-oci-refs.md)
Section 2 details Selective Extraction with estargz:
- Inspection downloads TOC only (~5-20KB) via `ListFiles`
- Selectively extracts `.sow-ref.yaml` (~1-5KB)
- Total bandwidth: < 10KB
- Structure validation occurs before suggesting full download
This establishes the key efficiency property: users can evaluate refs without bandwidth cost proportional to ref size.
Discovery Analysis (analysis.md)
Section 5 (OCI Library Integration) confirms:
- `github.com/jmgilman/go/oci` provides `ListFiles` for TOC-only download
- Expected API: `ListFiles(ctx, ref)` returns file list from estargz TOC
Section 9.4 (CLI Output) specifies output conventions:
- Use `cmd.Printf()` with emoji indicators: `✓` success, `⚠` warning, `✗` error
Dependencies
| Work Unit | Dependency Type | Reason |
|---|---|---|
| 003 | Hard prerequisite | Inspector uses OCI client for ListFiles and selective extraction |
| 002 | Hard prerequisite | Inspector validates .sow-ref.yaml against CUE schema |
| 004 | None | Packaging is independent (publishing vs consuming) |
| 006 | Consumer | Installation may call inspection internally for validation |
| 007 | Consumer | CLI commands invoke Inspector API |
Interface Design
Inspector Interface
// Inspector provides pre-download inspection of OCI refs.
// It enables bandwidth-efficient preview by downloading only
// the estargz TOC and manifest, not full file contents.
type Inspector interface {
// Inspect retrieves metadata and file listing from an OCI ref
// without downloading the full image contents.
//
// This operation uses estargz ListFiles to download only the TOC,
// then selectively extracts .sow-ref.yaml for metadata.
// Total bandwidth: typically < 10KB regardless of ref size.
//
// The ref parameter accepts OCI URLs:
// - ghcr.io/org/repo:tag
// - oci://ghcr.io/org/repo:tag
// - ghcr.io/org/repo@sha256:...
//
// Returns InspectResult with file listing, metadata, and validation status.
// Returns error if ref doesn't exist or network fails.
Inspect(ctx context.Context, ref string) (*InspectResult, error)
}
Data Structures
// InspectResult contains all information gathered during ref inspection.
type InspectResult struct {
// Ref is the original ref URL as provided
Ref string
// Digest is the SHA256 digest of the OCI image
Digest string
// Files is the complete file listing from the estargz TOC
Files []FileEntry
// TotalSize is the estimated total size in bytes
// Calculated by summing file sizes from TOC
TotalSize int64
// FileCount is the number of files in the ref
FileCount int
// Manifest is the parsed .sow-ref.yaml content
// Nil if manifest doesn't exist or couldn't be parsed
Manifest *RefManifest
// Valid indicates whether the manifest passed schema validation
Valid bool
// ValidationErrors contains detailed validation failure messages
// Empty if Valid is true
ValidationErrors []string
}
// FileEntry represents a single file from the estargz TOC.
type FileEntry struct {
// Path is the file path relative to ref root
Path string
// Size is the file size in bytes
Size int64
// Mode is the Unix file mode (permissions)
Mode os.FileMode
}
// RefManifest represents the parsed .sow-ref.yaml content.
// This mirrors the CUE schema from Work Unit 002.
type RefManifest struct {
SchemaVersion string `yaml:"schema_version"`
Ref RefIdentity `yaml:"ref"`
Content RefContent `yaml:"content"`
Provenance *RefProvenance `yaml:"provenance,omitempty"`
Packaging *RefPackaging `yaml:"packaging,omitempty"`
Hints *RefHints `yaml:"hints,omitempty"`
Metadata map[string]any `yaml:"metadata,omitempty"`
}
type RefIdentity struct {
Title string `yaml:"title"`
Link string `yaml:"link"`
}
type RefContent struct {
Description string `yaml:"description"`
Summary string `yaml:"summary,omitempty"`
Classifications []Classification `yaml:"classifications"`
Tags []string `yaml:"tags"`
}
type Classification struct {
Type string `yaml:"type"`
Description string `yaml:"description"`
}
type RefProvenance struct {
Authors []string `yaml:"authors,omitempty"`
Created string `yaml:"created,omitempty"`
Updated string `yaml:"updated,omitempty"`
Source string `yaml:"source,omitempty"`
License string `yaml:"license,omitempty"`
}
type RefPackaging struct {
Exclude []string `yaml:"exclude,omitempty"`
}
type RefHints struct {
SuggestedQueries []string `yaml:"suggested_queries,omitempty"`
PrimaryFiles []string `yaml:"primary_files,omitempty"`
}
Implementation Approach
High-Level Flow
Inspect(ctx, ref)
│
├─1─► Parse and validate ref URL
│ (reuse URL parsing from Work Unit 003)
│
├─2─► Call OCI client ListFiles(ctx, ref)
│ → Downloads estargz TOC only (~few KB)
│ → Returns []FileEntry with paths, sizes, modes
│
├─3─► Build directory tree representation
│ → Calculate total size from file entries
│ → Count files
│
├─4─► Selective extract .sow-ref.yaml
│ → Uses OCI client ExtractFile or similar
│ → Downloads only this one file (~1-5KB)
│
├─5─► Parse YAML and validate against CUE schema
│ → Use validator from Work Unit 002
│ → Capture validation errors if any
│
└─6─► Return InspectResult
→ File list, size, count
→ Parsed manifest (or nil)
→ Valid flag and any errors
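Step 3 of the flow (counting files and summing sizes) can be sketched as below. This is a minimal illustration, not the planned implementation: the `summarize` helper and the `errSizeOverflow` sentinel are hypothetical names, and skipping directories and symlinks when counting is an assumption that the spec does not settle.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"os"
)

// FileEntry mirrors the type defined under Data Structures.
type FileEntry struct {
	Path string
	Size int64
	Mode os.FileMode
}

// errSizeOverflow is a hypothetical sentinel for the overflow case.
var errSizeOverflow = errors.New("total size overflows int64")

// summarize counts regular files and sums their sizes.
// Assumption: directories, symlinks and other non-regular entries
// are not counted and contribute no size.
func summarize(entries []FileEntry) (count int, total int64, err error) {
	for _, e := range entries {
		if !e.Mode.IsRegular() {
			continue
		}
		if e.Size < 0 {
			return 0, 0, fmt.Errorf("negative size for %q", e.Path)
		}
		// Check before adding so the sum can never wrap around.
		if total > math.MaxInt64-e.Size {
			return 0, 0, errSizeOverflow
		}
		total += e.Size
		count++
	}
	return count, total, nil
}

func main() {
	entries := []FileEntry{
		{Path: "docs", Mode: os.ModeDir | 0o755},
		{Path: "docs/README.md", Size: 12_000, Mode: 0o644},
		{Path: ".sow-ref.yaml", Size: 2_000, Mode: 0o644},
	}
	count, total, err := summarize(entries)
	fmt.Println(count, total, err)
}
```

Checking `total > math.MaxInt64-e.Size` before the addition is one way to cover the "size calculation overflow protection" case listed in the testing strategy, since TOC sizes come from an untrusted image.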
Key Behaviors
- TOC-Only Download: The `ListFiles` call MUST NOT download file contents. It retrieves only the estargz table of contents, which contains metadata about files without their actual data.
- Single-File Extraction: After getting the TOC, extract only `.sow-ref.yaml`. This is a targeted download of ~1-5KB, not a full image pull.
- Graceful Degradation: If `.sow-ref.yaml` doesn't exist or is malformed:
  - Still return file list and size information
  - Set `Valid = false`
  - Include descriptive error in `ValidationErrors`
  - User can still make informed decision
- Digest Capture: The OCI client should return the image digest from the registry. This enables digest pinning on subsequent install.
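The graceful-degradation behavior can be illustrated with a small sketch. Everything here is an assumption made for the example: the `ociClient` interface, its `ExtractFile` method, the `errFileNotFound` sentinel and the trimmed-down `result` type are hypothetical stand-ins, not the API of `github.com/jmgilman/go/oci` or of the Work Unit 003 client. Validation is passed in as a function so the sketch stays independent of CUE.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ociClient is a hypothetical stand-in for the Work Unit 003 client.
// The method set is assumed for illustration only.
type ociClient interface {
	ListFiles(ctx context.Context, ref string) ([]string, error)
	ExtractFile(ctx context.Context, ref, path string) ([]byte, error)
}

// errFileNotFound is a hypothetical sentinel for a path absent from the image.
var errFileNotFound = errors.New("file not found in image")

// result is a trimmed-down InspectResult.
type result struct {
	Files            []string
	Manifest         []byte // raw bytes here; the spec uses *RefManifest
	Valid            bool
	ValidationErrors []string
}

// inspect returns a hard error only when listing or transport fails.
// Manifest problems are recorded on the result instead.
func inspect(ctx context.Context, c ociClient, validate func([]byte) []string, ref string) (*result, error) {
	files, err := c.ListFiles(ctx, ref)
	if err != nil {
		return nil, fmt.Errorf("list files for %s: %w", ref, err)
	}
	res := &result{Files: files}

	raw, err := c.ExtractFile(ctx, ref, ".sow-ref.yaml")
	if errors.Is(err, errFileNotFound) {
		// Missing manifest: keep the file list, mark the ref invalid.
		res.ValidationErrors = []string{".sow-ref.yaml not found"}
		return res, nil
	}
	if err != nil {
		return nil, fmt.Errorf("extract manifest from %s: %w", ref, err)
	}

	res.Manifest = raw
	res.ValidationErrors = validate(raw)
	res.Valid = len(res.ValidationErrors) == 0
	return res, nil
}

// fakeClient is an in-memory client for the demo below.
type fakeClient struct {
	files    []string
	manifest []byte // nil simulates a ref without .sow-ref.yaml
}

func (f fakeClient) ListFiles(context.Context, string) ([]string, error) {
	return f.files, nil
}

func (f fakeClient) ExtractFile(context.Context, string, string) ([]byte, error) {
	if f.manifest == nil {
		return nil, errFileNotFound
	}
	return f.manifest, nil
}

func main() {
	c := fakeClient{files: []string{"docs/README.md"}}
	noErrors := func([]byte) []string { return nil }
	res, err := inspect(context.Background(), c, noErrors, "ghcr.io/org/repo:tag")
	fmt.Println(len(res.Files), res.Valid, res.ValidationErrors, err)
}
```

The split between a returned `error` and `ValidationErrors` follows the Error Handling table: transport-level failures abort the call, while manifest problems still yield a usable result.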
Error Handling
| Scenario | Behavior |
|---|---|
| Network failure | Return error with clear message; don't return partial result |
| Ref not found (404) | Return error: "ref not found: " |
| No .sow-ref.yaml | Return result with Valid=false, Manifest=nil, descriptive error in ValidationErrors |
| Invalid YAML syntax | Return result with Valid=false, parse error in ValidationErrors |
| Schema validation fails | Return result with Valid=false, field-level errors in ValidationErrors |
| Auth required | Return error: "authentication required for " |
Testing Strategy
Unit Tests
Inspector logic tests (with mocked OCI client):
- Successful inspection with valid manifest
- Inspection of ref without .sow-ref.yaml
- Inspection with malformed YAML
- Inspection with schema validation failures
- File counting and size calculation accuracy
- Directory tree building
FileEntry parsing:
- Various file modes (regular, directory, symlink)
- Path normalization
- Size calculation overflow protection
Integration Tests
With test OCI registry:
- End-to-end inspection of published test ref
- Verify bandwidth usage (< 10KB)
- Verify timing (< 3 seconds)
- Test with refs of varying sizes (1MB, 10MB, 100MB)
- Verify same result regardless of ref size
Authentication scenarios:
- Anonymous access to public ref
- Authenticated access to private ref
- Clear error for unauthorized access
Benchmark Tests
- Measure ListFiles latency for various ref sizes
- Verify bandwidth is constant regardless of ref size
- Compare inspection time vs full download time
Performance Requirements
| Metric | Target | Rationale |
|---|---|---|
| Inspection time | < 3 seconds | Design doc NFR2 |
| Bandwidth | < 10KB | TOC (~5KB) + manifest (~5KB max) |
| Memory | O(file count) | Only store file entries, not contents |
Performance MUST be independent of ref size. A 1GB ref should inspect as fast as a 1KB ref because we never download file contents.
Security Considerations
- Path Traversal: Validate that file paths from the TOC don't contain `../` sequences
- Size Limits: Reject TOC if file count > 10,000 (DoS protection)
- Manifest Size: Reject .sow-ref.yaml if > 1MB (malicious manifest protection)
- URL Validation: Reject malformed or dangerous URLs before network calls
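The path traversal and file count checks could look roughly like the sketch below. The function names `checkPath` and `validateTOC` are hypothetical, and rejecting absolute and empty paths goes slightly beyond what the list above states; treat those as assumptions. Only the 10,000-entry limit comes from this document.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// maxFileCount is the TOC entry limit from the Security Considerations list.
const maxFileCount = 10_000

// checkPath rejects TOC paths that could escape the ref root.
// It compares whole path segments, so a name like "..hidden" is allowed.
func checkPath(p string) error {
	if p == "" {
		return errors.New("empty path in TOC")
	}
	if strings.HasPrefix(p, "/") {
		return fmt.Errorf("absolute path not allowed: %q", p)
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return fmt.Errorf("path traversal not allowed: %q", p)
		}
	}
	return nil
}

// validateTOC applies the count limit first, then checks every path.
func validateTOC(paths []string) error {
	if len(paths) > maxFileCount {
		return fmt.Errorf("TOC lists %d entries, limit is %d", len(paths), maxFileCount)
	}
	for _, p := range paths {
		if err := checkPath(p); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTOC([]string{"docs/guide.md", ".sow-ref.yaml"}))
	fmt.Println(validateTOC([]string{"docs/../../etc/passwd"}))
}
```

Checking the count before walking the paths keeps the work bounded even for an oversized TOC.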
CLI Output Format
The CLI command (sow refs inspect) in Work Unit 007 will consume this API. Expected output format:
$ sow refs inspect ghcr.io/myorg/go-standards:v1.0.0
✓ Ref: ghcr.io/myorg/go-standards:v1.0.0
Digest: sha256:abc123def456...
Files: 23 files, 2.3 MB total
Directory Structure:
docs/
README.md (12 KB)
guide.md (45 KB)
api/
reference.md (120 KB)
examples/
demo.go (3 KB)
.sow-ref.yaml (2 KB)
Metadata:
Title: Go Team Standards
Link: go-standards
Description: Team Go coding conventions and best practices.
Classifications: guidelines
Tags: golang, conventions, testing
License: MIT
Status: ✓ Valid manifest
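One way to produce the directory structure portion of this output from a flat file list is sketched below. The `entry`, `node`, `renderTree` and `humanSize` names are hypothetical, the size units use decimal thresholds as an assumption, and entries are sorted byte-wise, so the ordering will differ from the hand-written sample above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a minimal stand-in for FileEntry (path and size only).
type entry struct {
	Path string
	Size int64
}

// node is one directory or file in the rendered tree.
type node struct {
	children map[string]*node
	size     int64
	isFile   bool
}

func newNode() *node { return &node{children: map[string]*node{}} }

// buildTree inserts every slash-separated path into a nested map.
func buildTree(entries []entry) *node {
	root := newNode()
	for _, e := range entries {
		p := strings.Trim(e.Path, "/")
		if p == "" {
			continue
		}
		cur := root
		for _, part := range strings.Split(p, "/") {
			next, ok := cur.children[part]
			if !ok {
				next = newNode()
				cur.children[part] = next
			}
			cur = next
		}
		cur.isFile = true
		cur.size = e.Size
	}
	return root
}

// humanSize formats bytes with decimal units (an assumption).
func humanSize(n int64) string {
	switch {
	case n >= 1_000_000:
		return fmt.Sprintf("%.1f MB", float64(n)/1e6)
	case n >= 1_000:
		return fmt.Sprintf("%d KB", n/1_000)
	default:
		return fmt.Sprintf("%d B", n)
	}
}

// render writes children in sorted order, indenting two spaces per level.
func render(n *node, depth int, b *strings.Builder) {
	names := make([]string, 0, len(n.children))
	for name := range n.children {
		names = append(names, name)
	}
	sort.Strings(names)
	indent := strings.Repeat("  ", depth)
	for _, name := range names {
		child := n.children[name]
		if child.isFile {
			fmt.Fprintf(b, "%s%s (%s)\n", indent, name, humanSize(child.size))
			continue
		}
		fmt.Fprintf(b, "%s%s/\n", indent, name)
		render(child, depth+1, b)
	}
}

// renderTree returns the indented listing for a flat list of entries.
func renderTree(entries []entry) string {
	var b strings.Builder
	render(buildTree(entries), 0, &b)
	return b.String()
}

func main() {
	fmt.Print(renderTree([]entry{
		{Path: "docs/guide.md", Size: 45_000},
		{Path: "docs/README.md", Size: 12_000},
		{Path: ".sow-ref.yaml", Size: 2_000},
	}))
}
```

Memory stays proportional to the number of TOC entries, which matches the O(file count) target in Performance Requirements.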
For invalid refs:
$ sow refs inspect ghcr.io/myorg/bad-ref:v1.0.0
⚠ Ref: ghcr.io/myorg/bad-ref:v1.0.0
Digest: sha256:xyz789...
Files: 5 files, 150 KB total
Status: ✗ Invalid manifest
- ref.title: required field missing
- content.classifications: must have at least one entry
Implementation Standards
All code produced in this work unit MUST adhere to the following standards:
Code Quality Standards
- STYLE.md Compliance: All Go code must follow the conventions documented in `.standards/STYLE.md`
- TESTING.md Compliance: All tests must follow the patterns documented in `.standards/TESTING.md`
- golangci-lint: Code must pass `golangci-lint run` with zero errors before completion
Required Dependencies
- OCI Operations: Use `github.com/jmgilman/go/oci` for all OCI registry operations
  - List files (TOC-only): `client.ListFiles(ctx, reference)` - downloads only estargz TOC
  - Filtered list: `client.ListFilesWithFilter(ctx, reference, patterns...)`
  - The library provides bandwidth-efficient inspection via estargz format
- Filesystem Abstractions: Use `github.com/jmgilman/go/fs/core` and `github.com/jmgilman/go/fs/billy` for all file system operations
  - Pass `oci.WithFilesystem(fsys)` for testability
  - Use `billy.NewMemoryFS()` in unit tests
Verification Checklist
Before marking this work unit complete, verify:
- `golangci-lint run ./libs/refs/...` passes with zero errors
- All code follows STYLE.md conventions (functional options, error wrapping, etc.)
- All tests follow TESTING.md patterns (table-driven tests, test helpers, etc.)
- Unit tests use memory filesystem via `billy.NewMemoryFS()` where applicable
Acceptance Criteria
- `Inspector` interface defined in `libs/refs/inspector.go`
- `InspectResult`, `FileEntry`, `RefManifest` types implemented
- Implementation uses OCI client `ListFiles` for TOC-only download
- Implementation selectively extracts only `.sow-ref.yaml`
- Schema validation uses CUE validator from Work Unit 002
- Graceful handling when manifest missing or invalid
- Unit tests with mocked OCI client achieve >80% coverage
- Integration test verifies < 10KB bandwidth
- Integration test verifies < 3 second completion
- Security validations (path traversal, size limits) implemented
- Mock generation via go:generate directive
Out of Scope
- CLI command implementation → Work Unit 007
- Full image download → Work Unit 006 (Installation)
- Publishing/packaging → Work Unit 004
- OCI client implementation → Work Unit 003
- CUE schema definition → Work Unit 002
- Cache management → Work Unit 006
References
| Document | Relevance |
|---|---|
| `.sow/knowledge/designs/oci-refs/oci-refs-design.md` lines 291-307 | Inspector component specification |
| `.sow/knowledge/designs/oci-refs/oci-refs-design.md` lines 143-167 | Inspection flow diagram |
| `.sow/knowledge/designs/oci-refs/arc42-08-concepts-oci-refs.md` Section 2 | Selective extraction concept |
| `.sow/project/discovery/analysis.md` Sections 5, 9.4 | OCI library API, CLI output conventions |
| `libs/exec/executor.go` | Interface pattern reference |
| `libs/project/state/validate.go` | CUE validation pattern |
| `cli/cmd/refs/add.go` | CLI command pattern |