Work Unit 004: Ref Packaging and Publishing in libs/refs #126

@jmgilman

Status: Specification
Estimated Effort: 3-4 days
Dependencies: Work Unit 002 (CUE Schema), Work Unit 003 (OCI Client)


Behavioral Goal

As a ref author,
I need to package my documentation directory as an OCI image and publish it to a registry,
So that consumers can install my ref via sow refs add, inspect its contents without downloading, and benefit from digest-based versioning for reproducible installations.

Success Criteria

  1. A Packager interface exists in libs/refs that transforms directories into OCI images
  2. Publishers receive validation errors before packaging if .sow-ref.yaml is invalid or missing
  3. Exclusion patterns from packaging.exclude are applied correctly (e.g., *.draft.md files are not packaged)
  4. Default exclusions (.git/, .DS_Store, node_modules/) are applied automatically
  5. Metadata from .sow-ref.yaml is mapped to OCI annotations, enabling registry-level querying
  6. Published images use estargz format (NOT standard tar.gz), enabling TOC-based inspection
  7. Publishing a 10MB ref completes in under 30 seconds
  8. Integration tests verify round-trip: package → push → pull → verify content matches

Existing Code Context

Explanatory Context

The packaging functionality builds upon the libs/refs module created in Work Unit 003. The Client interface defined there provides the low-level Push operation, but the packaging layer adds:

  1. Manifest validation: Before packaging, validate .sow-ref.yaml against the CUE schema from Work Unit 002
  2. Content filtering: Apply exclusion patterns to skip files that shouldn't be distributed
  3. Annotation mapping: Transform manifest fields into OCI annotations for registry-level metadata
  4. Archive creation: Build estargz-format archives with proper directory structure

The existing refs system in cli/internal/refs/ shows how ref types handle caching and content management. The GitType implementation (cli/internal/refs/git.go) demonstrates the pattern of wrapping an external library (github.com/jmgilman/go/git/cache) with sow-specific validation and error handling. The Packager will follow a similar pattern with the OCI client.

The design document (lines 253-270) defines the Ref Packager component's responsibilities:

  • Validate .sow-ref.yaml schema before packaging
  • Apply exclusion patterns from packaging.exclude in manifest
  • Create estargz archive (not standard tar.gz)
  • Generate OCI annotations from .sow-ref.yaml fields
  • Calculate content digest

The annotation mapping (design doc lines 424-437) provides a complete mapping from .sow-ref.yaml fields to OCI annotation keys, enabling consumers to query metadata without downloading ref contents.

Key Files

File                            | Lines   | Purpose
libs/refs/client.go             | (WU003) | Client interface with Push method that packager will use
libs/refs/url.go                | (WU003) | OCI URL parsing for ref destination
libs/schemas/ref_manifest.cue   | (WU002) | CUE schema for validation
libs/schemas/cue_types_gen.go   | (WU002) | Generated RefManifest Go type
libs/project/state/validate.go  | 46-68   | CUE validation pattern to follow
cli/internal/refs/git.go        | 1-226   | Pattern: wrapping library with sow-specific logic
libs/exec/executor.go           | 1-47    | Interface definition pattern

Existing Documentation Context

Design Document (Primary Implementation Reference)

The OCI Refs Design Document (.sow/knowledge/designs/oci-refs/oci-refs-design.md) provides the complete specification:

Ref Packager Component (lines 253-270):

  • Validate .sow-ref.yaml schema before packaging
  • Apply exclusion patterns from packaging.exclude in manifest
  • Create estargz archive (NOT standard tar.gz) - this format is required for TOC-based inspection
  • Generate OCI annotations from .sow-ref.yaml fields
  • Calculate content digest
  • Default exclusions: .git/, .DS_Store, node_modules/
  • Preserve Unix permissions (sanitized: no setuid/setgid)
  • Include .sow-ref.yaml in root of image

OCI Annotation Mapping (lines 424-437): This mapping is contractual - consumers rely on these annotations being set correctly:

.sow-ref.yaml Field      | OCI Annotation Key
ref.title                | org.opencontainers.image.title
content.description      | org.opencontainers.image.description
provenance.authors       | org.opencontainers.image.authors (JSON array)
provenance.created       | org.opencontainers.image.created
provenance.source        | org.opencontainers.image.source
provenance.license       | org.opencontainers.image.licenses
content.classifications  | com.sow.ref.classifications (JSON)
content.tags             | com.sow.ref.tags (comma-separated)
ref.link                 | com.sow.ref.link

Performance Requirements (lines 26, 100):

  • NFR1: Publishing 10MB ref completes in < 30 seconds
  • The estargz format is non-negotiable - it enables selective extraction

Discovery Analysis (Integration Guidance)

Section 6 of the discovery analysis (.sow/project/discovery/analysis.md) confirms:

  • The github.com/jmgilman/go/oci library provides estargz support (lines 217-224)
  • Expected API includes Push for publishing images
  • Security features are built into the library: path traversal protection, size limits

Section 9.3 shows the error handling pattern to follow:

if err != nil {
    return fmt.Errorf("failed to <operation>: %w", err)
}

Arc42 Building Blocks (Architecture Context)

The arc42-05 document (.sow/knowledge/designs/oci-refs/arc42-05-building-blocks-refs.md) places Ref Packager in the architecture:

  • Packager is a sub-component of OCI Client Wrapper (Level 2)
  • Receives validated directory input
  • Produces OCI image with estargz format
  • Maps metadata to OCI annotations
  • Packager depends on Schema Validator for manifest validation

Detailed Requirements

Module Structure

Extend libs/refs/ module with packaging functionality:

libs/refs/
├── ... (existing from WU003)
├── packager.go           # Packager interface definition (port)
├── packager_impl.go      # Implementation (adapter)
├── packager_test.go      # Unit tests
├── annotations.go        # Annotation mapping logic
├── annotations_test.go   # Annotation mapping tests
├── exclusions.go         # File exclusion logic
├── exclusions_test.go    # Exclusion pattern tests
└── mocks/
    └── packager.go       # Generated mock (add to existing generate directive)

Interface Definitions

Packager Interface (packager.go):

//go:generate go run github.com/matryer/moq@latest -out mocks/packager.go -pkg mocks . Packager

// Packager transforms local directories into publishable OCI refs.
//
// The packaging workflow:
//   1. Validate .sow-ref.yaml exists and passes schema validation
//   2. Read exclusion patterns from manifest + apply defaults
//   3. Create estargz archive with filtered contents
//   4. Generate OCI annotations from manifest fields
//   5. Calculate content digest
//   6. Delegate to Client.Push for registry upload
//
// Use NewPackager() to create an instance.
type Packager interface {
    // Publish packages a directory and pushes to registry.
    //
    // The directory must contain a valid .sow-ref.yaml at root.
    // Returns the digest of the published image.
    //
    // Example:
    //   digest, err := packager.Publish(ctx, "./docs", "ghcr.io/org/ref:v1.0.0")
    Publish(ctx context.Context, dir string, ref string, opts ...PublishOption) (digest string, err error)

    // Validate checks if a directory can be packaged.
    //
    // Returns nil if the directory contains a valid .sow-ref.yaml.
    // Use this for pre-flight validation before publishing.
    Validate(ctx context.Context, dir string) error

    // Package creates an estargz archive from directory without pushing.
    //
    // Returns path to temporary archive file. Caller must clean up.
    // Used for local inspection or alternative upload methods.
    Package(ctx context.Context, dir string, opts ...PublishOption) (archivePath string, annotations map[string]string, err error)
}

PublishOption Type:

// PublishOption configures a publish operation.
type PublishOption func(*publishOptions)

// WithAdditionalExclusions adds exclusion patterns on top of the defaults
// and the manifest's packaging.exclude patterns.
func WithAdditionalExclusions(patterns []string) PublishOption

// WithAlsoTagLatest also pushes the image with :latest tag.
func WithAlsoTagLatest() PublishOption

// WithDryRun validates and packages but does not push.
func WithDryRun() PublishOption

// WithProgressCallback reports packaging progress.
func WithProgressCallback(fn func(stage string, current, total int64)) PublishOption
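
The option constructors above are declared without bodies. A minimal sketch of how they could be implemented with the functional options pattern follows. The field names of publishOptions and the resolvePublishOptions helper are assumptions made for illustration; the specification only names the publishOptions type.

```go
package main

import "fmt"

// publishOptions holds the resolved settings. The field names are
// hypothetical; the specification only names the type.
type publishOptions struct {
	additionalExclusions []string
	alsoTagLatest        bool
	dryRun               bool
	progress             func(stage string, current, total int64)
}

// PublishOption configures a publish operation.
type PublishOption func(*publishOptions)

// WithAdditionalExclusions appends extra exclusion patterns.
func WithAdditionalExclusions(patterns []string) PublishOption {
	return func(o *publishOptions) {
		o.additionalExclusions = append(o.additionalExclusions, patterns...)
	}
}

// WithAlsoTagLatest requests a second push under the :latest tag.
func WithAlsoTagLatest() PublishOption {
	return func(o *publishOptions) { o.alsoTagLatest = true }
}

// WithDryRun requests validation and packaging without a push.
func WithDryRun() PublishOption {
	return func(o *publishOptions) { o.dryRun = true }
}

// WithProgressCallback registers a progress reporter.
func WithProgressCallback(fn func(stage string, current, total int64)) PublishOption {
	return func(o *publishOptions) { o.progress = fn }
}

// resolvePublishOptions (hypothetical helper) applies the options in order
// on top of zero-value defaults and skips nil options.
func resolvePublishOptions(opts ...PublishOption) publishOptions {
	var o publishOptions
	for _, opt := range opts {
		if opt != nil {
			opt(&o)
		}
	}
	return o
}

func main() {
	o := resolvePublishOptions(
		WithAlsoTagLatest(),
		WithAdditionalExclusions([]string{"*.bak"}),
	)
	fmt.Println(o.alsoTagLatest, o.dryRun, o.additionalExclusions)
}
```

Resolving the options once at the top of Publish and Package keeps the rest of the implementation working with plain struct fields.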

Exclusion Logic

Create exclusions.go with file filtering:

// DefaultExclusions are always applied when packaging.
var DefaultExclusions = []string{
    ".git/",
    ".git",           // If .git is a file (submodule)
    ".DS_Store",
    "node_modules/",
    "*.swp",          // Vim swap files
    "*~",             // Backup files
    ".sow/",          // Don't include sow metadata
}

// Exclusions handles file filtering for packaging.
type Exclusions struct {
    patterns []string
}

// NewExclusions creates an exclusion matcher from manifest + defaults.
func NewExclusions(manifestExcludes []string) *Exclusions

// ShouldExclude returns true if the path should be excluded.
// The path is relative to the package root.
func (e *Exclusions) ShouldExclude(path string) bool

// MatchedPattern returns which pattern matched, if any.
// Used for debugging/logging.
func (e *Exclusions) MatchedPattern(path string) (pattern string, matched bool)

Glob Pattern Support:

  • *.md - matches any .md file directly in the package root (not in subdirectories)
  • **/*.draft.md - matches draft files in any subdirectory
  • docs/internal/ - matches the directory and all of its contents
  • !important.md - negation pattern: the file is kept even if another pattern matches it
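
The pattern rules above can be sketched with only the standard library. This is an illustration under stated assumptions, not the required implementation: patterns are matched against slash-separated paths relative to the package root, `**` matches zero or more path segments, and when several patterns match, the last one wins (so a negation must come after the pattern it overrides). The helper names matchPattern and matchSegments are hypothetical, and a production version may prefer an existing glob library.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// DefaultExclusions mirrors the defaults listed in this work unit.
var DefaultExclusions = []string{
	".git/", ".git", ".DS_Store", "node_modules/", "*.swp", "*~", ".sow/",
}

// Exclusions holds patterns in evaluation order (defaults first).
type Exclusions struct {
	patterns []string
}

// NewExclusions combines the defaults with the manifest's patterns.
func NewExclusions(manifestExcludes []string) *Exclusions {
	patterns := append([]string{}, DefaultExclusions...)
	return &Exclusions{patterns: append(patterns, manifestExcludes...)}
}

// ShouldExclude reports whether a path relative to the package root is
// excluded. Assumption: the last matching pattern decides the outcome.
func (e *Exclusions) ShouldExclude(p string) bool {
	excluded := false
	for _, pattern := range e.patterns {
		negate := strings.HasPrefix(pattern, "!")
		if matchPattern(strings.TrimPrefix(pattern, "!"), p) {
			excluded = !negate
		}
	}
	return excluded
}

// matchPattern handles directory patterns (trailing slash) and plain globs.
func matchPattern(pattern, p string) bool {
	segs := strings.Split(p, "/")
	if strings.HasSuffix(pattern, "/") {
		dir := strings.Split(strings.TrimSuffix(pattern, "/"), "/")
		// A directory pattern matches the directory and everything below it.
		for i := 1; i <= len(segs); i++ {
			if matchSegments(dir, segs[:i]) {
				return true
			}
		}
		return false
	}
	return matchSegments(strings.Split(pattern, "/"), segs)
}

// matchSegments matches segment by segment; "**" spans zero or more segments.
func matchSegments(pat, name []string) bool {
	if len(pat) == 0 {
		return len(name) == 0
	}
	if pat[0] == "**" {
		for i := 0; i <= len(name); i++ {
			if matchSegments(pat[1:], name[i:]) {
				return true
			}
		}
		return false
	}
	if len(name) == 0 {
		return false
	}
	// An invalid glob is treated as "no match" in this sketch.
	if ok, err := path.Match(pat[0], name[0]); err != nil || !ok {
		return false
	}
	return matchSegments(pat[1:], name[1:])
}

func main() {
	e := NewExclusions([]string{"*.draft.md", "**/*.tmp", "!keep.tmp"})
	for _, p := range []string{".git/config", "guide.draft.md", "guide.md", "docs/cache/file.tmp", "keep.tmp"} {
		fmt.Printf("%-22s excluded=%v\n", p, e.ShouldExclude(p))
	}
}
```

One consequence of root-anchored matching is that a default such as .DS_Store only matches at the package root; whether defaults should also apply in subdirectories is a decision the implementation has to make explicitly.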

Annotation Mapping

Create annotations.go with OCI annotation logic:

// StandardAnnotations are OCI-standard annotation keys.
const (
    AnnotationTitle       = "org.opencontainers.image.title"
    AnnotationDescription = "org.opencontainers.image.description"
    AnnotationAuthors     = "org.opencontainers.image.authors"
    AnnotationCreated     = "org.opencontainers.image.created"
    AnnotationSource      = "org.opencontainers.image.source"
    AnnotationLicenses    = "org.opencontainers.image.licenses"
)

// SowAnnotations are sow-specific annotation keys.
const (
    AnnotationClassifications = "com.sow.ref.classifications"
    AnnotationTags            = "com.sow.ref.tags"
    AnnotationLink            = "com.sow.ref.link"
    AnnotationSchemaVersion   = "com.sow.ref.schema_version"
)

// MapManifestToAnnotations converts RefManifest fields to OCI annotations.
//
// The mapping follows the design document specification:
//   - ref.title → org.opencontainers.image.title
//   - content.description → org.opencontainers.image.description
//   - provenance.authors → org.opencontainers.image.authors (JSON array)
//   - provenance.created → org.opencontainers.image.created
//   - provenance.source → org.opencontainers.image.source
//   - provenance.license → org.opencontainers.image.licenses
//   - content.classifications → com.sow.ref.classifications (JSON)
//   - content.tags → com.sow.ref.tags (comma-separated)
//   - ref.link → com.sow.ref.link
//
// Returns map of annotation key to value. Empty/nil fields are omitted.
func MapManifestToAnnotations(manifest *schemas.RefManifest) map[string]string

Packager Implementation

Factory Function (packager_impl.go):

// NewPackager creates a Packager with the given OCI client.
//
// The packager validates manifests, applies exclusions, and delegates
// to the client for registry operations.
func NewPackager(client Client, opts ...PackagerOption) Packager

// PackagerOption configures packager behavior.
type PackagerOption func(*packagerOptions)

// WithDefaultExclusions sets custom default exclusions.
// If not set, DefaultExclusions are used.
func WithDefaultExclusions(patterns []string) PackagerOption

Implementation Flow:

func (p *packagerImpl) Publish(ctx context.Context, dir string, ref string, opts ...PublishOption) (string, error) {
    // 0. Resolve the functional options into a settings struct
    var options publishOptions
    for _, opt := range opts {
        opt(&options)
    }

    // 1. Validate directory exists
    if _, err := os.Stat(dir); err != nil {
        return "", fmt.Errorf("failed to access directory %q: %w", dir, err)
    }

    // 2. Read and validate .sow-ref.yaml
    manifestPath := filepath.Join(dir, ".sow-ref.yaml")
    manifest, err := p.loadAndValidateManifest(manifestPath)
    if err != nil {
        return "", err // Already wrapped with context
    }

    // 3. Build exclusions (defaults + manifest + options)
    exclusions := p.buildExclusions(manifest, options)

    // 4. Map manifest to OCI annotations
    annotations := MapManifestToAnnotations(manifest)

    // 5. Delegate to client.Push with exclusions and annotations
    digest, err := p.client.Push(ctx, dir, ref,
        WithExclusions(exclusions.Patterns()),
        WithAnnotations(annotations),
    )
    if err != nil {
        return "", fmt.Errorf("failed to push to registry: %w", err)
    }

    // 6. Optionally push :latest tag
    if options.alsoTagLatest {
        latestRef := replaceTag(ref, "latest")
        if _, err := p.client.Push(ctx, dir, latestRef,
            WithExclusions(exclusions.Patterns()),
            WithAnnotations(annotations),
        ); err != nil {
            // Log warning but don't fail the primary push
            p.log.Warn("failed to push :latest tag", "error", err)
        }
    }

    return digest, nil
}
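
The flow above calls a replaceTag helper that is not defined in this work unit. One possible sketch follows; it assumes the reference is a plain registry/repository[:tag][@digest] string like the examples in this document. The main pitfall is a registry port such as localhost:5000, whose colon must not be mistaken for a tag separator.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceTag returns ref with its tag (and any digest) replaced by tag.
// Hypothetical helper: the behavior for digest references is an assumption.
func replaceTag(ref, tag string) string {
	// Drop a digest suffix such as "@sha256:..." if present.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// A tag separator can only appear after the last slash; a colon before
	// that belongs to the registry host (for example a port number).
	lastSlash := strings.LastIndex(ref, "/")
	if i := strings.LastIndex(ref, ":"); i > lastSlash {
		ref = ref[:i]
	}
	return ref + ":" + tag
}

func main() {
	for _, ref := range []string{
		"ghcr.io/org/ref:v1.0.0",
		"localhost:5000/bench",
		"localhost:5000/bench:v1",
	} {
		fmt.Println(ref, "->", replaceTag(ref, "latest"))
	}
}
```

If libs/refs/url.go from Work Unit 003 already exposes a parsed reference type, building the :latest reference from that type is likely preferable to string manipulation.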

Error Types

// ErrManifestMissing indicates .sow-ref.yaml not found in directory.
var ErrManifestMissing = errors.New("manifest .sow-ref.yaml not found")

// ErrManifestInvalid indicates schema validation failed.
type ErrManifestInvalid struct {
    Path   string   // Path to manifest file
    Errors []string // Validation error messages
}

func (e *ErrManifestInvalid) Error() string {
    return fmt.Sprintf("manifest %s is invalid: %s", e.Path, strings.Join(e.Errors, "; "))
}

// ErrEmptyPackage indicates all files were excluded.
type ErrEmptyPackage struct {
    Dir        string
    Exclusions []string
}

func (e *ErrEmptyPackage) Error() string {
    return fmt.Sprintf("no files to package in %s (all excluded)", e.Dir)
}

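// ErrExclusionPattern indicates an invalid glob pattern.
type ErrExclusionPattern struct {
    Pattern string
    Err     error
}

func (e *ErrExclusionPattern) Error() string {
    return fmt.Sprintf("invalid exclusion pattern %q: %v", e.Pattern, e.Err)
}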
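
Callers are expected to tell these errors apart after they have been wrapped with context. A short sketch of how that could look with errors.Is and errors.As follows; checkManifestExists and describePublishError are hypothetical helpers introduced only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// ErrManifestMissing indicates .sow-ref.yaml not found in directory.
var ErrManifestMissing = errors.New("manifest .sow-ref.yaml not found")

// ErrManifestInvalid indicates schema validation failed.
type ErrManifestInvalid struct {
	Path   string
	Errors []string
}

func (e *ErrManifestInvalid) Error() string {
	return fmt.Sprintf("manifest %s is invalid: %s", e.Path, strings.Join(e.Errors, "; "))
}

// checkManifestExists (hypothetical) maps a missing file to the sentinel
// error while keeping the directory in the message.
func checkManifestExists(dir string) error {
	_, err := os.Stat(filepath.Join(dir, ".sow-ref.yaml"))
	if errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("%w in %s", ErrManifestMissing, dir)
	}
	return err
}

// describePublishError (hypothetical) shows how a caller such as the CLI
// could branch on the error kind, even when the error has been wrapped.
func describePublishError(err error) string {
	var invalid *ErrManifestInvalid
	switch {
	case err == nil:
		return "published"
	case errors.Is(err, ErrManifestMissing):
		return "add a .sow-ref.yaml to the directory root"
	case errors.As(err, &invalid):
		return "fix manifest fields: " + strings.Join(invalid.Errors, "; ")
	default:
		return "unexpected error: " + err.Error()
	}
}

func main() {
	missing := checkManifestExists(filepath.Join(os.TempDir(), "no-such-ref-dir"))
	fmt.Println(describePublishError(missing))

	invalid := &ErrManifestInvalid{Path: ".sow-ref.yaml", Errors: []string{"ref.title: required"}}
	fmt.Println(describePublishError(fmt.Errorf("failed to publish: %w", invalid)))
}
```

For errors.Is and errors.As to work through ErrExclusionPattern as well, that type would additionally need an Unwrap method returning its Err field.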

Testing Requirements

Unit Tests

1. Exclusion Pattern Tests (exclusions_test.go):

Test Case             | Input Pattern  | Input Path                 | Expected
Default git           | .git/          | .git/config                | excluded
Default DS_Store      | .DS_Store      | .DS_Store                  | excluded
Default node_modules  | node_modules/  | node_modules/pkg/index.js  | excluded
Manifest pattern      | *.draft.md     | guide.draft.md             | excluded
Manifest pattern      | *.draft.md     | guide.md                   | NOT excluded
Recursive glob        | **/*.tmp       | docs/cache/file.tmp        | excluded
Directory pattern     | internal/      | internal/secrets.md        | excluded
Negation pattern      | !keep.tmp      | keep.tmp                   | NOT excluded
No match              | *.log          | README.md                  | NOT excluded

2. Annotation Mapping Tests (annotations_test.go):

  • Full manifest → all annotations present
  • Minimal manifest → only required annotations
  • Authors array → JSON array format (["author1","author2"])
  • Tags array → comma-separated ("golang,testing,docs")
  • Classifications array → JSON array format
  • Empty optional fields → annotations omitted (not empty strings)
  • Special characters in values → properly escaped

3. Packager Validation Tests (packager_test.go):

  • Valid manifest → no error
  • Missing manifest → ErrManifestMissing
  • Invalid manifest schema → ErrManifestInvalid with field details
  • Directory doesn't exist → clear error
  • All files excluded → ErrEmptyPackage

4. Packager Integration Tests (with mocked client):

  • Publish calls client.Push with correct arguments
  • Exclusions are passed to client
  • Annotations are passed to client
  • WithAlsoTagLatest triggers second Push with :latest
  • WithDryRun doesn't call client.Push
  • WithProgressCallback receives updates

Integration Tests

Test with Real Registry (using Docker local registry):

func TestPublishRoundTrip(t *testing.T) {
    ctx := context.Background()

    // 1. Start local registry container
    registry := testutil.StartRegistry(t)
    defer registry.Stop()

    // 2. Create test directory with valid manifest
    dir := t.TempDir()
    createTestRef(t, dir, "Test Ref", []string{"docs/guide.md", "examples/demo.go"})

    // 3. Publish to local registry
    packager := refs.NewPackager(refs.NewClient())
    ref := fmt.Sprintf("%s/test-ref:v1.0.0", registry.Addr)
    digest, err := packager.Publish(ctx, dir, ref)
    require.NoError(t, err)
    require.NotEmpty(t, digest)

    // 4. Pull and verify contents match
    client := refs.NewClient()
    pullDir := t.TempDir()
    err = client.Pull(ctx, ref, pullDir, nil)
    require.NoError(t, err)

    // 5. Verify files
    assertFileExists(t, filepath.Join(pullDir, ".sow-ref.yaml"))
    assertFileExists(t, filepath.Join(pullDir, "docs/guide.md"))
    assertFileExists(t, filepath.Join(pullDir, "examples/demo.go"))
}

func TestExclusionsApplied(t *testing.T) {
    // Create dir with files that should be excluded
    // Publish
    // Pull
    // Verify excluded files are NOT present
}

func TestAnnotationsPreserved(t *testing.T) {
    // Publish with manifest having all fields
    // Use ListFiles or GetManifest to verify annotations
}

Performance Tests

func BenchmarkPublish10MB(b *testing.B) {
    ctx := context.Background()

    // Create 10MB test directory
    dir := createLargeTestDir(b, 10*1024*1024)

    packager := refs.NewPackager(refs.NewClient())
    ref := "localhost:5000/bench:latest"

    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _, err := packager.Publish(ctx, dir, ref)
        require.NoError(b, err)
    }

    // Assert: average time per publish < 30 seconds (NFR1)
    if avg := b.Elapsed() / time.Duration(b.N); avg > 30*time.Second {
        b.Fatalf("average publish time %s exceeds 30s", avg)
    }
}

Implementation Notes

Dependency Chain

  1. Work Unit 002 (CUE Schema) must be complete for:
    • libs/schemas/ref_manifest.cue schema file
    • Generated RefManifest Go type
    • Validation function availability

  2. Work Unit 003 (OCI Client) must be complete for:
    • Client interface with Push method
    • WithExclusions and WithAnnotations push options
    • Authentication via Docker credential chain

estargz Format Requirement

The github.com/jmgilman/go/oci library produces estargz-format archives automatically. This is critical because:

  • Work Unit 005 (Inspection) uses TOC-only download via ListFiles
  • Work Unit 006 (Installation) uses selective extraction via glob patterns
  • Standard tar.gz would require full download for any operation

The packager MUST NOT use standard tar.gz or any non-estargz format.

Manifest Location

The .sow-ref.yaml file MUST be at the root of the packaged image. This enables:

  • Quick extraction for inspection (GetManifest)
  • Consistent location for consumers
  • Schema validation at extraction time

Permission Handling

The OCI client handles permission sanitization:

  • setuid/setgid bits are stripped
  • The remaining permission bits are preserved (for example 644 for files, 755 for directories)
  • No special handling needed in packager
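
The sanitization itself happens in the OCI client, so the packager does not need code like the following. The sketch only illustrates what "stripping setuid/setgid while preserving the other bits" means, for example as a reference when writing assertions in the round-trip tests. sanitizeMode and exampleMode are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
)

// exampleMode is a hypothetical input: rwxr-xr-x with setuid and setgid set.
const exampleMode = fs.ModeSetuid | fs.ModeSetgid | 0o755

// sanitizeMode clears the setuid and setgid bits and leaves every other
// bit of the mode untouched.
func sanitizeMode(mode fs.FileMode) fs.FileMode {
	return mode &^ (fs.ModeSetuid | fs.ModeSetgid)
}

func main() {
	fmt.Println(exampleMode, "->", sanitizeMode(exampleMode))
}
```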

Progress Reporting

For large refs, progress reporting improves UX. The WithProgressCallback option receives:

  • stage: "scanning", "packaging", "pushing"
  • current/total: bytes processed (during packaging/pushing)
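
A callback passed to WithProgressCallback could be as small as the following sketch. It assumes that total may be zero or unknown during the "scanning" stage, which the specification does not state; percent and newProgressPrinter are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// percent converts byte counts to a whole percentage and guards against
// an unknown (zero) total and against overshoot.
func percent(current, total int64) int {
	if total <= 0 {
		return 0
	}
	if current > total {
		current = total
	}
	return int(current * 100 / total)
}

// newProgressPrinter returns a function with the signature expected by
// WithProgressCallback that writes one line per update to w.
func newProgressPrinter(w io.Writer) func(stage string, current, total int64) {
	return func(stage string, current, total int64) {
		fmt.Fprintf(w, "%-10s %3d%% (%d/%d bytes)\n", stage, percent(current, total), current, total)
	}
}

func main() {
	report := newProgressPrinter(os.Stderr)
	report("scanning", 0, 0)
	report("packaging", 5<<20, 10<<20)
	report("pushing", 10<<20, 10<<20)
}
```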

Out of Scope

  • CLI command implementation: Work Unit 007 implements sow refs publish
  • Inspection of packaged refs: Work Unit 005 implements sow refs inspect
  • Installation of refs: Work Unit 006 implements sow refs add
  • Index updates after publishing: Not part of packaging (publishing is independent)
  • Registry authentication UI: Docker credential chain handles this transparently

Implementation Standards

All code produced in this work unit MUST adhere to the following standards:

Code Quality Standards

  • STYLE.md Compliance: All Go code must follow the conventions documented in .standards/STYLE.md
  • TESTING.md Compliance: All tests must follow the patterns documented in .standards/TESTING.md
  • golangci-lint: Code must pass golangci-lint run with zero errors before completion

Required Dependencies

  • OCI Operations: Use github.com/jmgilman/go/oci for all OCI registry operations
    • Push operations use client.Push(ctx, sourceDir, reference, opts...)
    • Annotations via oci.WithAnnotations(map[string]string)
    • The OCI library handles estargz format automatically
  • Filesystem Abstractions: Use github.com/jmgilman/go/fs/core and github.com/jmgilman/go/fs/billy for all file system operations
    • Use core.FS interface for filesystem operations requiring abstraction
    • Pass oci.WithFilesystem(fsys) for testability
    • Use billy.NewMemoryFS() in unit tests
    • Use billy.NewLocalFS() for production

Verification Checklist

Before marking this work unit complete, verify:

  • golangci-lint run ./libs/refs/... passes with zero errors
  • All code follows STYLE.md conventions (functional options, error wrapping, etc.)
  • All tests follow TESTING.md patterns (table-driven tests, test helpers, etc.)
  • Unit tests use memory filesystem via billy.NewMemoryFS() where applicable

Acceptance Criteria

  • Packager interface is defined in libs/refs/packager.go
  • //go:generate directive produces mocks/packager.go
  • Publish returns error if .sow-ref.yaml is missing
  • Publish returns error with field details if manifest is invalid
  • Default exclusions (.git/, .DS_Store, node_modules/) are applied
  • Manifest packaging.exclude patterns are applied
  • OCI annotations are correctly mapped from manifest fields
  • org.opencontainers.image.* standard annotations are set
  • com.sow.ref.* custom annotations are set
  • Published images use estargz format (verified by successful ListFiles)
  • Round-trip test passes: package → push → pull → verify contents
  • Publishing 10MB ref completes in < 30 seconds (benchmark test)
  • Unit tests pass for exclusion patterns (all cases from requirements)
  • Unit tests pass for annotation mapping (all fields)
  • Unit tests pass with mocked OCI client
