
Conversation

@amikofalvy
Collaborator

Summary

Refactored AGENTS.md following HumanLayer's best practices for writing effective AI agent guidance files:

  • Reduced AGENTS.md from 442 lines to 96 lines (was 7x longer than recommended)
  • Created docs/agent-guides/ directory for progressive disclosure of detailed topics
  • Replaced embedded code examples with file references to prevent outdated snippets
  • Kept only universally applicable instructions in the main file

Progressive Disclosure Files Created

| File | Content |
| --- | --- |
| docs/agent-guides/testing-patterns.md | Vitest patterns, test structure, coverage requirements |
| docs/agent-guides/ui-development.md | Next.js/React patterns for agents-manage-ui |
| docs/agent-guides/documentation.md | MDX documentation guidelines for agents-docs |
| docs/agent-guides/database-migrations.md | Drizzle migration workflow and critical rules |
| docs/agent-guides/git-worktrees.md | Parallel development with git worktrees |
| docs/agent-guides/debugging.md | Jaeger tracing commands and debugging workflows |

Key Improvements

  1. Less is more: frontier models can reliably follow only ~150-200 instructions; the previous file likely exceeded this limit
  2. Universal applicability: Detailed task-specific content moved to separate files
  3. Pointers over copies: Code examples replaced with references to actual files
  4. Progressive disclosure: Agents only see detailed guides when they navigate to them

Test plan

  • Verify AI agents can still find essential commands in AGENTS.md
  • Verify links to docs/agent-guides/ files work correctly
  • Test that agents can follow progressive disclosure to find detailed information

@changeset-bot

changeset-bot bot commented Dec 2, 2025

⚠️ No Changeset found

Latest commit: 8d3ca96

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel

vercel bot commented Dec 2, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| agents-docs | Ready | Preview | Comment | Dec 2, 2025 9:26pm |
| agents-manage-api | Ready | Preview | Comment | Dec 2, 2025 9:26pm |
| agents-manage-ui | Ready | Preview | Comment | Dec 2, 2025 9:26pm |
| agents-run-api | Ready | Preview | Comment | Dec 2, 2025 9:26pm |

@claude

claude bot commented Dec 2, 2025

Claude finished @amikofalvy's task


Staff Engineer Review Complete

I've conducted a comprehensive architectural review of this PR covering both the AGENTS.md refactoring and the context tracking evaluation system. Here's my analysis:

🎯 Overall Assessment: Strong Refactoring with Well-Architected Context Tracking

This PR successfully addresses two distinct objectives: progressive disclosure for agent guidance and sophisticated context analysis capabilities. Both are well-executed with thoughtful architectural decisions.


📋 AGENTS.md Refactoring Analysis

Excellent Progressive Disclosure Implementation

The refactoring from 442 to 96 lines following HumanLayer's best practices is architecturally sound:

  • Cognitive Load Reduction: Keeps the main file within the ~150-200 instruction limit that frontier models can reliably follow
  • Universal Applicability: Core AGENTS.md now contains only universally relevant commands and patterns
  • Smart Information Architecture: Task-specific content properly separated into docs/agent-guides/

Well-Structured Guide Files

The progressive disclosure files demonstrate good documentation patterns:

  • Consistent Structure: All guides follow similar format (Stack → Commands → Guidelines)
  • Practical Focus: Each guide provides immediately actionable information
  • Proper Scope: Each guide covers a cohesive domain (testing, UI, docs, etc.)

⚠️ Minor Naming Inconsistency

Issue: File is named Agents.md (PascalCase) but referenced as AGENTS.md in documentation.

Recommendation:

# Standardize on AGENTS.md convention
git mv Agents.md AGENTS.md

🔍 Context Tracking System Analysis

This is a sophisticated addition that brings significant observability value. The implementation demonstrates strong system design principles:

Excellent Token Estimation Strategy

agents-run-api/src/utils/token-estimator.ts:

  • Model Agnostic: 4-chars-per-token approximation works across OpenAI/Anthropic/Gemini
  • Performance Optimized: Simple string length calculation vs expensive tokenizer calls
  • Appropriate Precision: Perfect for relative comparisons between context components
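
For reference, the entire approach fits in a few lines. A hypothetical sketch of a character-based estimator (names assumed, not necessarily the actual contents of token-estimator.ts):

```typescript
// Hypothetical sketch: estimate token count from character length.
// ~4 characters per token is a common rough heuristic that holds
// reasonably well across OpenAI/Anthropic/Gemini tokenizers.
const CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  if (!text) return 0;
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

Calling something like this once per context component during prompt assembly is cheap enough that no caching is strictly required.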

Strong Type Safety & Extensibility

ContextBreakdown Interface Design:

export interface ContextBreakdown {
  systemPromptTemplate: number;
  coreInstructions: number;
  agentPrompt: number;
  // ... 11 total components with clear semantic naming
  total: number;
}

  • Comprehensive Coverage: Captures all major context components
  • Clear Semantics: Field names clearly indicate their purpose
  • Future-Proof: Easy to extend with new components

Sophisticated UI Implementation

agents-manage-ui/src/components/traces/context-breakdown.tsx:

  • Progressive Visual Design: Stacked bar + detailed breakdown provides multiple levels of insight
  • Accessibility: Proper ARIA labeling, tooltips, and color coding
  • Performance: useMemo for expensive calculations
  • User Experience: Filters out segments <0.5% to avoid visual clutter
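
The <0.5% filtering is worth calling out because it is the kind of detail that keeps stacked bars legible. A hypothetical sketch of the idea (names assumed, not the actual component code):

```typescript
import { useMemo } from 'react';

interface Segment {
  label: string;
  tokens: number;
}

// Hypothetical hook: memoize the visible segments, dropping anything
// under 0.5% of the total so tiny slivers don't clutter the bar.
export function useVisibleSegments(segments: Segment[]): Segment[] {
  return useMemo(() => {
    const total = segments.reduce((sum, s) => sum + s.tokens, 0);
    if (total === 0) return [];
    return segments.filter((s) => s.tokens / total >= 0.005);
  }, [segments]);
}
```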

Clean Architectural Integration

System Integration Points:

  • SystemPromptBuilder: Returns AssembleResult with both prompt and breakdown
  • Phase1Config/Phase2Config: Implements token tracking throughout prompt assembly
  • UI Components: Seamlessly integrates breakdown display in trace viewer
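
The contract at the SystemPromptBuilder boundary is easy to picture. A hypothetical sketch (field names assumed, not verified against the actual AssembleResult type):

```typescript
// Hypothetical shape: the assembled prompt is returned together with its
// per-component token counts, so prompt assembly and the trace viewer
// consume the same data in a single pass.
interface AssembleResult {
  prompt: string;
  breakdown: Record<string, number>; // per-component counts plus a total
}
```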

🔧 Architectural Considerations & Recommendations

1. Token Estimation Accuracy Trade-offs

Current Approach: Character-based approximation (~4 chars/token)
Considerations:

  • ✅ Fast, model-agnostic, dependency-free
  • ⚠️ May be 15-25% off for some content types (code vs prose)

Recommendation: Current approach is excellent for the use case. Consider adding a configuration option for precision-sensitive scenarios:

interface TokenEstimatorConfig {
  method: 'character-based' | 'model-specific';
  model?: string; // For future model-specific tokenizers
}

2. Context Component Granularity

Current Breakdown: 11 components tracked individually
Consideration: Balance between granularity and usability

Recommendation: Current granularity is appropriate. Consider grouping for high-level views:

interface ContextSummary {
  instructions: number; // systemPromptTemplate + coreInstructions + agentPrompt
  tools: number; // toolsSection + artifactComponents
  history: number; // conversationHistory
  // ... etc
}

3. Performance & Memory Considerations

Current Implementation: Creates breakdown objects for every prompt assembly

Considerations:

  • Memory usage for high-volume scenarios
  • CPU overhead of token estimation

Recommendations:

  • ✅ Current implementation is efficient (simple calculations)
  • Consider adding feature flag for production environments where breakdown isn't needed
  • Potential optimization: Cache breakdown results for identical prompt components
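
If the caching optimization ever becomes worthwhile, a simple memoization keyed on the component text would be sufficient. A hypothetical sketch:

```typescript
// Hypothetical sketch: cache estimates for repeated components such as a
// static system prompt template. In a long-lived process you would bound
// this cache (e.g. with an LRU policy) rather than let it grow forever.
const tokenCache = new Map<string, number>();

function estimateTokensCached(text: string): number {
  const cached = tokenCache.get(text);
  if (cached !== undefined) return cached;
  const estimate = Math.ceil(text.length / 4);
  tokenCache.set(text, estimate);
  return estimate;
}
```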

🚨 Critical Items (None Found)

No critical architectural flaws or security issues identified. Both the documentation refactoring and context tracking implementation follow solid engineering principles.


🎯 Customer Impact Assessment

Positive Impacts:

  1. Developer Experience: Cleaner, more discoverable agent guidance
  2. Debugging Capabilities: Rich context analysis for troubleshooting token usage
  3. Performance Insights: Ability to optimize prompts based on component breakdown
  4. Maintainability: Progressive disclosure makes guidance easier to maintain

Risk Assessment: LOW

  • No breaking changes to APIs or data contracts
  • Additive features with graceful degradation
  • Backward compatible changes only

📈 Future Extensibility

The architecture supports several natural extensions:

  1. Token Budget Warnings: Alert when approaching model context limits
  2. Component Optimization: Suggest which components to trim when over budget
  3. Historical Analysis: Track context evolution over time
  4. A/B Testing: Compare token efficiency across prompt variations
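
The budget-warning idea in particular would fall out of the existing breakdown total almost for free. A hypothetical sketch (threshold and limit values are illustrative):

```typescript
// Hypothetical sketch: flag prompts that approach or exceed the model's
// context window, based on the breakdown's total token estimate.
function checkContextBudget(totalTokens: number, modelLimit: number, warnAt = 0.8) {
  const usage = totalTokens / modelLimit;
  if (usage >= 1) return { level: 'error' as const, usage };
  if (usage >= warnAt) return { level: 'warn' as const, usage };
  return { level: 'ok' as const, usage };
}

// e.g. checkContextBudget(breakdown.total, 200_000)
```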

Final Recommendation: APPROVE

This PR demonstrates excellent system architecture across both objectives:

  • AGENTS.md Refactoring: Thoughtful progressive disclosure following industry best practices
  • Context Tracking: Well-designed observability system with strong technical implementation

The only suggestion is the minor naming standardization for AGENTS.md. Otherwise, this is production-ready code that adds significant value to the platform.

Confidence Level: High - This is well-architected, thoroughly implemented, and follows TypeScript/React best practices throughout.

Refactored AGENTS.md from 442 lines to 96 lines following HumanLayer's
best practices for writing effective AI agent guidance files:

- Condensed to essential commands, architecture, and quick reference
- Created docs/agent-guides/ for progressive disclosure of detailed topics:
  - testing-patterns.md
  - ui-development.md
  - documentation.md
  - database-migrations.md
  - git-worktrees.md
  - debugging.md
- Replaced embedded code examples with file references
- Kept only universally applicable instructions in the main file

Reference: https://www.humanlayer.dev/blog/writing-a-good-claude-md

- Reduce AGENTS.md from 442 to 106 lines (76% reduction)
- Reduce .cursorrules from 202 to 39 lines (81% reduction)
- Add progressive disclosure files in docs/agent-guides/
- Add "Before Committing" section with Biome lint commands
- Replace embedded code examples with file references