A context-aware AI system that transforms your Obsidian vault into an intelligent assistant through natural language interaction. Built with a sophisticated service-oriented architecture combining a FastAPI intelligence backend with an advanced React-based Obsidian plugin frontend.
Core Philosophy: Transform knowledge work from command-driven to conversation-driven through sophisticated document understanding, intent detection, and natural language processing.
NotebookLocal implements a distributed intelligence architecture with clear separation between backend processing and frontend user experience, enabling sophisticated AI-powered knowledge management.
┌─────────────────────────────────────────────────────────────────┐
│ NotebookLocal System │
├─────────────────────────────────────────────────────────────────┤
│ Frontend Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Obsidian Plugin │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ React │ │ Intelligence │ │ Universal │ │ │
│ │ │ Components │ │ Controller │ │ Processor │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Communication Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ RESTful API (14+ endpoints) │ │
│ │ HTTP/JSON + Server-Sent Events (Streaming) │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Backend Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Inference Server │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Intelligence │ │ Universal │ │ LLM │ │ │
│ │ │ System │ │ Processor │ │ Router │ │ │
│ │ │ (6 Engines) │ │ (Multi-fmt) │ │ (Multi-LLM) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Data Layer │
│ ┌──────────────────────────────┐ ┌─────────────────────────┐ │
│ │ PostgreSQL │ │ Weaviate │ │
│ │ (Metadata & Chunks) │ │ (Vector Embeddings) │ │
│ └──────────────────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
The system models six fundamental ways humans interact with knowledge, each optimized for specific reasoning tasks:
| Engine | Purpose | Temperature | Example Queries |
|---|---|---|---|
| 🤔 Understand | Question answering & comprehension | 0.3 | "What did I conclude about X?", "Explain this concept from my notes" |
| 🧭 Navigate | Knowledge discovery & connections | 0.4 | "Find related notes", "Show me everything about Y", "What should I read next?" |
| ✏️ Transform | Content editing & restructuring | 0.3 | "Make this clearer", "Rewrite professionally", "Restructure this section" |
| 🔍 Synthesize | Pattern analysis & insights | 0.2 | "Summarize themes", "Compare approaches", "What patterns emerge?" |
| 🔧 Maintain | Vault health & organization | 0.1 | "Check broken links", "Find duplicates", "Organize my vault" |
| 💬 Chat | Casual conversation & greetings | 0.7 | "Hello", "How are you?", "Thanks", "Good morning" |
graph TD
A[Natural Language Input] --> B[Intent Detection]
B --> C{Pattern + LLM Analysis}
C --> D[Context Pyramid Builder]
D --> E[Engine Selection & Routing]
E --> F[Specialized Processing]
F --> G[Response with Source Citations]
subgraph "Context Building"
D --> H[Mentions - Highest Priority]
D --> I[Current Document]
D --> J[Linked Notes]
D --> K[Semantic Similarity]
D --> L[Recent Files]
D --> M[Tagged Content]
end
subgraph "Engine Processing"
F --> N[Understand Engine]
F --> O[Navigate Engine]
F --> P[Transform Engine]
F --> Q[Synthesize Engine]
F --> R[Maintain Engine]
F --> S[Chat Engine]
end
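Pattern matching handles the common cases quickly, with an LLM pass refining ambiguous queries. The sketch below is illustrative only; the pattern lists and class names are assumptions, not the actual detector code.

```python
# Illustrative sketch: pattern-first intent detection with an LLM fallback.
# Pattern lists and class names are assumptions, not the real NotebookLocal code.
import re
from dataclasses import dataclass

INTENT_PATTERNS = {
    "understand": [r"\bwhat did i\b", r"\bexplain\b", r"\bwhat does\b"],
    "navigate":   [r"\bfind\b", r"\brelated\b", r"\bshow me\b"],
    "transform":  [r"\brewrite\b", r"\bmake this\b", r"\brestructure\b"],
    "synthesize": [r"\bsummarize\b", r"\bcompare\b", r"\bpatterns?\b"],
    "maintain":   [r"\bbroken links?\b", r"\bduplicates?\b", r"\borganize\b"],
    "chat":       [r"^(hello|hi|hey|thanks|good morning)\b"],
}

@dataclass
class DetectedIntent:
    intent_type: str
    confidence: float

def detect_intent(message: str) -> DetectedIntent:
    """Return the first engine whose patterns match; fall back to chat."""
    text = message.lower()
    for intent_type, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return DetectedIntent(intent_type, confidence=0.8)
    # In the real system an LLM pass would refine low-confidence matches.
    return DetectedIntent("chat", confidence=0.3)
```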
// Sophisticated context control through mentions
interface MentionSystem {
file: '@filename.md' // Specific file reference
folder: '@folder/' // All files in folder
tag: '@#tag' // Files with specific tag
temporal: '@recent' // Recently modified files
active: '@current' // Currently active file
scope: '@all' // Entire vault
multiple: '@file1.md,file2.md,file3.md' // Multiple files
}

// Natural language with precise context control
"@research-notes.md What were my key conclusions about AI safety?"
"@weekly-reviews/ What patterns emerge in my productivity over time?"
"@#meeting-notes What action items do we still have pending?"
"@project-docs/ Compare the different approaches we've discussed"
"/current Tell me about the document I'm currently editing"Technology Stack:
- FastAPI: High-performance async web framework
- LangGraph: Graph-based workflow orchestration
- PostgreSQL + Weaviate: Hybrid storage architecture
- SQLAlchemy: Advanced ORM with async support
- Pydantic: Data validation and serialization
Core Architecture:
# Intelligence System with Specialized Engines
class IntelligenceSystem:
    def __init__(self):
        self.intent_detector = IntentDetector()   # Pattern + LLM analysis
        self.context_engine = ContextEngine()     # Context pyramid builder
        self.engines = {
            'understand': UnderstandEngine(),
            'navigate': NavigateEngine(),
            'transform': TransformEngine(),
            'synthesize': SynthesizeEngine(),
            'maintain': MaintainEngine(),
            'chat': ChatEngine()
        }

    async def process_query(self, message: str) -> Response:
        intent = await self.intent_detector.detect(message)
        context = await self.context_engine.build_pyramid(message, intent)
        engine = self.engines[intent.intent_type]
        return await engine.process(message, intent, context)

Key Features:
- Universal LLM Router: Supports OpenAI, Anthropic, and local models
- Dynamic Token Allocation: Calculates limits based on model capabilities
- Configuration-Driven: Extensive YAML configuration system
- Document Processing Pipeline: Multi-format support with intelligent chunking
- Hybrid Storage: PostgreSQL + vector search for optimal performance
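To make the Dynamic Token Allocation item above concrete, here is a minimal sketch. The ratio values mirror the `token_allocation` settings shown later in this document; the model context sizes and the way the ratios are combined are simplifying assumptions.

```python
# Illustrative token budgeting (simplified; not the actual allocation logic).
MODEL_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "Qwen3-14B-Instruct-bnb-4bit": 32_768,   # assumed context size
}

ENGINE_RATIOS = {                            # mirrors token_allocation.engine_ratios below
    "understand": 0.15, "navigate": 0.20, "transform": 0.10,
    "synthesize": 0.20, "maintain": 0.15, "chat": 0.20,
}

CONTEXT_WINDOW_RATIO = 0.6                   # mirrors token_allocation.context_window_ratio

def allocate_tokens(model: str, engine: str) -> dict:
    """Split a model's context window between retrieved context and the response."""
    window = MODEL_CONTEXT_WINDOWS.get(model, 8_192)
    return {
        "context": int(window * CONTEXT_WINDOW_RATIO),              # context pyramid budget
        "response": int(window * ENGINE_RATIOS.get(engine, 0.1)),   # per-engine share
    }

print(allocate_tokens("Qwen3-14B-Instruct-bnb-4bit", "synthesize"))
```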
Technology Stack:
- React 18: Modern functional components with hooks
- TypeScript: Type-safe development with comprehensive interfaces
- ESBuild: Fast bundling and development workflow
- Obsidian API: Deep integration with Obsidian's ecosystem
Core Architecture:
// Clean Architecture with React Components
class NotebookLocalPlugin extends Plugin {
async onload() {
this.apiClient = new ApiClient()
this.intelligenceController = new IntelligenceController()
this.universalProcessor = new UniversalFileProcessor()
this.fileWatcher = new UniversalFileWatcher()
// Register main view
this.registerView(CHAT_VIEWTYPE, (leaf) =>
new NotebookLocalView(leaf, this)
)
}
}
// Main UI Component with Tabbed Interface
function NotebookLocalView() {
const [currentTab, setCurrentTab] = useState('chat')
const [messages, setMessages] = useState([])
const [ragContext, setRagContext] = useState(null)
return (
<div className="notebook-local-view">
<TabNavigation currentTab={currentTab} onTabChange={setCurrentTab} />
{currentTab === 'chat' && <ChatInterface />}
{currentTab === 'context' && <ContextPreview />}
{currentTab === 'files' && <FileManager />}
</div>
)
}

Key Features:
- Three-Tab Interface: Chat, Context Preview, and File Management
- Real-time Streaming: Server-sent events for live AI responses
- Command-Aware Input: Real-time parsing and syntax highlighting
- File Processing: Universal support for Markdown, PDF, DOCX, TXT
- Background Synchronization: Seamless state sync with backend
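The real-time streaming item above relies on Server-Sent Events. The following is a generic FastAPI SSE sketch, not the project's actual endpoint, to show the shape of the streaming contract the plugin consumes.

```python
# Generic SSE streaming sketch (endpoint path and generator are illustrative).
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_answer(question: str):
    """Stand-in for the intelligence system; yields the answer token by token."""
    for token in f"Echoing: {question}".split():
        yield f"data: {token}\n\n"          # SSE frame: 'data: <payload>\n\n'
        await asyncio.sleep(0.05)           # simulate model latency
    yield "data: [DONE]\n\n"

@app.get("/chat/stream")
async def chat_stream(q: str):
    return StreamingResponse(generate_answer(q), media_type="text/event-stream")
```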
The Context Pyramid intelligently assembles relevant information with sophisticated ranking:
class ContextPyramid:
    def build_context(self, message: str, mentions: List[str]) -> ContextData:
        layers = {
            0: self.get_mentioned_files(mentions),   # Highest priority (60% tokens)
            1: self.get_current_document(),          # High priority
            2: self.get_linked_documents(),          # Medium-high priority
            3: self.get_semantically_similar(),      # Medium priority (vector search)
            4: self.get_recent_files(),              # Temporal context
            5: self.get_tagged_content()             # Topic-based context
        }
        return self.assemble_with_token_budget(layers)

class LLMRouter:
    def __init__(self):
        self.adapters = {
            'openai': OpenAIAdapter(),
            'anthropic': AnthropicAdapter(),
            'qwen': QwenAdapter()
        }
        self.routing_config = load_routing_config()

    async def route(self, request: ChatRequest) -> ChatResponse:
        # Dynamic model selection based on:
        # - Content type (text, vision)
        # - Model capabilities
        # - Configuration preferences
        # - Token requirements
        adapter = self.select_optimal_adapter(request)
        return await adapter.process(request)

# Graph-based document processing
workflow = StateGraph(ProcessingState)
workflow.add_node("extract", extract_content) # PDF/DOCX → text
workflow.add_node("process_images", process_images) # Vision model descriptions
workflow.add_node("chunk", intelligent_chunking) # Context-aware segmentation
workflow.add_node("embed", generate_embeddings) # Vector embeddings
workflow.add_node("store", hybrid_storage) # PostgreSQL + Weaviate
workflow.add_edge("extract", "process_images")
workflow.add_edge("process_images", "chunk")
workflow.add_edge("chunk", "embed")
workflow.add_edge("embed", "store")
compiled_workflow = workflow.compile()

Our system uses a standardized port allocation for modular service architecture:
- 8000: FastAPI Server (Main API Gateway & Intelligence System)
- 8001: LLM Service (Language Model Inference)
- 8002: Vision Service (Vision Model Processing)
- 8003: Embedding Service (Text Embedding Generation)
This port structure enables independent scaling, resource management, and monitoring of each service type.
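A small script makes the port map easy to verify during development. Only the main API's `/health` endpoint is documented in this README (see Troubleshooting); assuming the same path on the other services is a simplification.

```python
# Port map health check. Only port 8000's /health endpoint is documented here;
# reusing the same path for the other services is an assumption.
import urllib.request

SERVICES = {
    8000: "FastAPI Server (API gateway & intelligence)",
    8001: "LLM Service",
    8002: "Vision Service",
    8003: "Embedding Service",
}

for port, name in SERVICES.items():
    url = f"http://localhost:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            print(f"{name:45s} -> {resp.status}")
    except Exception as exc:
        print(f"{name:45s} -> unreachable ({exc})")
```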
We are currently conducting a major architectural test by transitioning from the OpenAI API to local model inference. This represents a significant validation of our modular system design and GPU resource management capabilities.
Based on inference-server/configs/routing.yaml:
- Qwen Language Models:
  - `Qwen3-14B-Instruct-bnb-4bit` (Primary chat model)
  - `Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit` (Vision-language model)
  - `Qwen3-embedding-0.6B` (Embedding generation)
- BLIP2 Vision:
  - `blip2-opt-2.7b` (Vision understanding)
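To confirm which model is assigned to each role, the routing file can be inspected directly. The key names here follow the configuration excerpts shown below; the overall file layout is an assumption.

```python
# Print the model assigned to each role from routing.yaml.
# Key names (chat_default, vision_default, embedding_default) follow the
# configuration excerpts later in this README; the file layout is assumed.
import yaml

with open("inference-server/configs/routing.yaml") as f:
    routing = yaml.safe_load(f)

for role in ("chat_default", "vision_default", "embedding_default"):
    print(f"{role:18s} -> {routing.get(role, '[unset]')}")
```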
- Modular System Validation: Verify our adapter-based architecture works seamlessly with local models
- GPU Resource Management: Test concurrent model serving and memory allocation strategies
- Performance Benchmarking: Compare local inference latency/throughput vs OpenAI API
- System Reliability: Stress test local inference under various load conditions
- Resource Optimization: Fine-tune GPU memory usage across different model types
This transition validates our core design principles:
- Adapter Pattern: Each model type uses dedicated adapters (`qwen`, `blip2`)
- Resource Isolation: Independent services prevent resource conflicts
- Configuration-Driven: Model selection via YAML without code changes
- Scalability: Horizontal scaling of model-specific services
This testing phase is crucial for understanding how our modular system handles local GPU resources efficiently and whether our abstraction layers can support both cloud and local inference seamlessly.
Due to GPU memory limitations (24GB), we implement dynamic model loading with two vLLM configurations:
# Active models when text processing is needed
chat_default: Qwen3-14B-Instruct-bnb-4bit # Port 8001 (LLM Service)
embedding_default: Qwen3-embedding-0.6B # Port 8003 (Embedding Service)
vision_default: [unloaded]                    # Vision service inactive

# Active models when vision processing is needed
chat_default: [unloaded] # Chat service inactive
embedding_default: [unloaded] # Embedding service inactive
vision_default: blip2-opt-2.7b                # Port 8002 (Vision Service)

- On text/chat requests: Load Configuration A (Qwen chat + embedding models)
- On vision requests: Unload text models → Load Configuration B (BLIP2 vision model)
- Memory optimization: Only load required models to stay within 24GB limit
- Service coordination: Port-based services enable clean model swapping
This approach validates our modular architecture's ability to handle resource-constrained environments with fully local inference while maintaining service availability through intelligent model lifecycle management.
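The swap logic can be sketched as a small lifecycle manager that switches between Configuration A and Configuration B based on the incoming request type. The load/unload calls below are placeholders, not the real service API.

```python
# Simplified sketch of configuration swapping under a shared 24GB GPU budget.
# The load/unload calls are placeholders; the real services run behind ports 8001-8003.
TEXT_CONFIG = {
    "chat": "Qwen3-14B-Instruct-bnb-4bit",
    "embedding": "Qwen3-embedding-0.6B",
}
VISION_CONFIG = {"vision": "blip2-opt-2.7b"}

class ModelLifecycleManager:
    def __init__(self):
        self.active = {}

    def ensure(self, request_type: str) -> None:
        """Load the configuration matching the request type, unloading the other."""
        wanted = VISION_CONFIG if request_type == "vision" else TEXT_CONFIG
        if self.active != wanted:
            for role, name in self.active.items():
                print(f"unloading {role}: {name}")   # placeholder for a real unload call
            for role, name in wanted.items():
                print(f"loading {role}: {name}")     # placeholder for a real load call
            self.active = wanted

manager = ModelLifecycleManager()
manager.ensure("chat")     # Configuration A: text + embedding models
manager.ensure("vision")   # swap to Configuration B: vision model only
```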
Backend Requirements:
- Python 3.8+ with pip
- PostgreSQL 12+ (local or remote)
- Optional: Weaviate for enhanced vector search
- OpenAI API key (or other LLM provider)
Frontend Requirements:
- Obsidian with community plugins enabled
- Node.js 16+ with npm (for development)
# 1. Clone repository
git clone <repository-url>
cd 26th-summer-NotebookLocal
# 2. Setup backend
cd inference-server
python -m venv venv && source venv/bin/activate # Linux/Mac
pip install -r requirements.txt
# 3. Database setup
createdb notebooklocal
alembic upgrade head
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and database URL
# 5. Build and install plugin
cd ../notebook-local
npm install && npm run build
# 6. Install plugin in Obsidian
cp -r dist/* /path/to/vault/.obsidian/plugins/notebook-local/
# 7. Start the system
cd ../inference-server
uvicorn src.main:app --host 0.0.0.0 --port 8000
# Enable plugin in Obsidian: Settings → Community plugins → "NotebookLocal"

- Open NotebookLocal: Click the plugin icon or use command palette
- Start naturally: Just ask questions about your vault
- Use @mentions: Reference specific files with `@filename.md`
- Explore tabs: Check Context and Files tabs for system state
Example first interaction:
"Hello! What can you help me with regarding my vault?"
"@important-notes.md What are the key points I should remember?"
"Find notes about machine learning from last month"
// Instead of commands, natural conversation:
User: "What are the main themes in my research notes?"
System: [SYNTHESIZE intent detected] → Analyzes patterns across research files
User: "@meeting-notes.md What action items do we have?"
System: [UNDERSTAND intent] → Focuses on meeting-notes.md specifically
User: "Make this explanation clearer"
System: [TRANSFORM intent] → Improves current document content
User: "Find related notes about this topic"
System: [NAVIGATE intent] → Discovers connected content

// Precise context specification
"@research/ @#ai-safety What are the key risks we've identified?"
// → Uses research folder + files tagged with ai-safety
"@file1.md,file2.md,file3.md Compare the approaches in these documents"
// → Focuses specifically on the three mentioned files
"@recent What patterns emerge from my recent writing?"
// → Analyzes recently modified files for trends

Backend Performance:
- Async Architecture: Full async/await throughout FastAPI
- Connection Pooling: Efficient database connection management
- Background Processing: Non-blocking document processing
- Caching: Intelligent response and embedding caching
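As one example of the caching item above, embedding calls can be memoized by content hash so unchanged chunks are never re-embedded. This is a generic illustration rather than the server's actual cache.

```python
# Generic embedding cache sketch keyed by content hash (illustrative only).
import hashlib

class EmbeddingCache:
    def __init__(self, embed_fn):
        self._embed = embed_fn    # async callable: text -> embedding vector
        self._store = {}

    async def get(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:          # only embed content we haven't seen
            self._store[key] = await self._embed(text)
        return self._store[key]
```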
Frontend Optimization:
- Lazy Loading: Components loaded on-demand
- Virtual Scrolling: Efficient large file list handling
- Debounced Updates: Optimized file watching and input
- Memory Management: Automatic cleanup of unused resources
Data Privacy:
- Local-First: Processing on your infrastructure
- No Long-term Storage: Server doesn't retain personal data
- Configurable Models: Choose between cloud and local AI
- Encrypted Communication: HTTPS for all API calls
Input Validation:
- Command Sanitization: All inputs validated and sanitized
- File Type Restrictions: Only safe file types processed
- Path Validation: Prevention of directory traversal
- Size Limits: Protection against DoS via large files
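As an illustration of the path and file-type checks above, validation along these lines rejects traversal attempts, unsupported extensions, and oversized uploads. The limits and vault root shown are placeholders, not the server's actual values.

```python
# Generic upload validation sketch: extension whitelist, path containment, size cap.
from pathlib import Path

ALLOWED_EXTENSIONS = {".md", ".pdf", ".docx", ".txt"}
MAX_FILE_BYTES = 50 * 1024 * 1024            # illustrative 50 MB cap
VAULT_ROOT = Path("/data/vault").resolve()   # illustrative root

def validate_upload(relative_path: str, size_bytes: int) -> Path:
    target = (VAULT_ROOT / relative_path).resolve()
    if VAULT_ROOT not in target.parents and target != VAULT_ROOT:
        raise ValueError("path escapes the vault root")          # directory traversal
    if target.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {target.suffix}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file too large")                       # DoS protection
    return target
```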
# configs/routing.yaml
intelligence:
  engines:
    understand: { temperature: 0.3 }
    navigate: { temperature: 0.4 }
    transform: { temperature: 0.3 }
    synthesize: { temperature: 0.2 }
    maintain: { temperature: 0.1 }
    chat: { temperature: 0.7 }
  token_allocation:
    context_window_ratio: 0.6
    engine_ratios:
      understand: 0.15
      navigate: 0.20
      transform: 0.10
      synthesize: 0.20
      maintain: 0.15
      chat: 0.20

interface PluginSettings {
serverUrl: string // Backend server URL
enableStreaming: boolean // Real-time response streaming
autoProcessFiles: boolean // Automatic file processing
supportedExtensions: string[] // File types to process
debounceDelay: number // File change detection delay
maxConcurrentProcessing: number // Processing parallelism
}

# 1. Create new engine
class CustomEngine(BaseEngine):
    async def process(self, message: str, intent: DetectedIntent,
                      context: ContextPyramid) -> EngineResponse:
        # Custom processing logic
        pass

# 2. Add intent patterns
INTENT_PATTERNS = {
    IntentType.CUSTOM: [
        r'\b(custom|special|specific)\b',
        # Add patterns for detection
    ]
}

# 3. Register in routing system
engines = {
    'custom': CustomEngine(llm_router, 'custom')
}

// Add new UI components
interface ExtensionAPI {
addCommand(command: Command): void
registerView(viewType: string, viewCreator: ViewCreator): void
addSettingsTab(tab: SettingsTab): void
addStatusBarItem(): HTMLElement
}
// Custom processors
class CustomFileProcessor {
canProcess(file: TFile): boolean
async process(file: TFile): Promise<ProcessedContent>
}

Connection Problems:
# Check server status
curl http://localhost:8000/health
# Verify plugin settings
# Obsidian → Settings → NotebookLocal → Server URL

File Processing Issues:
# Check file processing status
# Files tab in NotebookLocal interface
# Look for error indicators (🔴) and details

Intent Detection Problems:
// Check intent confidence in debug mode
// Enable debug logging in browser console
localStorage.setItem('notebook-local-debug', 'true')

Backend Optimization:
- Adjust `MAX_CONCURRENT_PROCESSING` in the environment
- Tune PostgreSQL connection pool settings
- Configure Weaviate memory limits
- Monitor token usage and adjust allocation ratios
Plugin Optimization:
- Reduce `debounceDelay` for faster response
- Limit `supportedExtensions` to required types
- Adjust `maxConcurrentProcessing` to match system capabilities
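A minimal sketch of how these backend knobs might be wired in, assuming `MAX_CONCURRENT_PROCESSING` is read from the environment (the other variable names are illustrative):

```python
# Sketch of environment-driven tuning. MAX_CONCURRENT_PROCESSING is mentioned above;
# DB_POOL_SIZE is an illustrative name for the connection-pool setting.
import asyncio
import os

MAX_CONCURRENT_PROCESSING = int(os.getenv("MAX_CONCURRENT_PROCESSING", "4"))
DB_POOL_SIZE = int(os.getenv("DB_POOL_SIZE", "10"))

processing_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PROCESSING)

async def process_document(doc_id: str) -> None:
    async with processing_semaphore:        # cap concurrent document pipelines
        ...                                 # extract -> chunk -> embed -> store
```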
Every component is built around understanding user intent and providing contextually relevant responses.
Streaming responses, live file status updates, and automatic synchronization provide immediate feedback.
@mentions + natural language eliminate command memorization and provide intuitive control.
The architecture provides comprehensive error handling, logging, monitoring, and scalability considerations.
Everything from model selection to processing behavior is configurable without code changes.
- 🧠 Context-Aware Intelligence: Understands why you're asking, not just what
- 🎯 Natural Language Interface: Talk to your notes like talking to a research assistant
- ⚡ Real-time Processing: Immediate feedback and streaming responses
- 🔧 Sophisticated Architecture: Production-ready with enterprise-grade reliability
- 📚 Vault-Native Understanding: Deep integration with your existing knowledge structure
NotebookLocal transforms knowledge management from a search-and-retrieve paradigm to an intelligent conversation with your accumulated wisdom.
- 🐍 Inference Server - Backend architecture, intelligence engines, and API documentation
- 🧩 Obsidian Plugin - Frontend architecture, React components, and plugin development
For a complete technical pipeline diagram and implementation details, see the comprehensive system flow documentation.