A context-aware AI system that transforms your Obsidian vault into an intelligent assistant through natural language interaction. Built with a sophisticated service-oriented architecture combining a FastAPI intelligence backend with an advanced React-based Obsidian plugin frontend.
Core Philosophy: Transform knowledge work from command-driven to conversation-driven through sophisticated document understanding, intent detection, and natural language processing.
NotebookLocal implements a distributed intelligence architecture with clear separation between backend processing and frontend user experience, enabling sophisticated AI-powered knowledge management.
┌─────────────────────────────────────────────────────────────────┐
│ NotebookLocal System │
├─────────────────────────────────────────────────────────────────┤
│ Frontend Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Obsidian Plugin │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ React │ │ Intelligence │ │ Universal │ │ │
│ │ │ Components │ │ Controller │ │ Processor │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Communication Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ RESTful API (14+ endpoints) │ │
│ │ HTTP/JSON + Server-Sent Events (Streaming) │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Backend Layer │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Inference Server │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Intelligence │ │ Universal │ │ LLM │ │ │
│ │ │ System │ │ Processor │ │ Router │ │ │
│ │ │ (6 Engines) │ │ (Multi-fmt) │ │ (Multi-LLM) │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Data Layer │
│ ┌──────────────────────────────┐ ┌─────────────────────────┐ │
│ │ PostgreSQL │ │ Weaviate │ │
│ │ (Metadata & Chunks) │ │ (Vector Embeddings) │ │
│ └──────────────────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
The system models six fundamental ways humans interact with knowledge, each optimized for specific reasoning tasks:
| Engine | Purpose | Temperature | Example Queries |
|---|---|---|---|
| 🤔 Understand | Question answering & comprehension | 0.3 | "What did I conclude about X?", "Explain this concept from my notes" |
| 🧭 Navigate | Knowledge discovery & connections | 0.4 | "Find related notes", "Show me everything about Y", "What should I read next?" |
| ✏️ Transform | Content editing & restructuring | 0.3 | "Make this clearer", "Rewrite professionally", "Restructure this section" |
| 🔍 Synthesize | Pattern analysis & insights | 0.2 | "Summarize themes", "Compare approaches", "What patterns emerge?" |
| 🔧 Maintain | Vault health & organization | 0.1 | "Check broken links", "Find duplicates", "Organize my vault" |
| 💬 Chat | Casual conversation & greetings | 0.7 | "Hello", "How are you?", "Thanks", "Good morning" |
graph TD
A[Natural Language Input] --> B[Intent Detection]
B --> C{Pattern + LLM Analysis}
C --> D[Context Pyramid Builder]
D --> E[Engine Selection & Routing]
E --> F[Specialized Processing]
F --> G[Response with Source Citations]
subgraph "Context Building"
D --> H[Mentions - Highest Priority]
D --> I[Current Document]
D --> J[Linked Notes]
D --> K[Semantic Similarity]
D --> L[Recent Files]
D --> M[Tagged Content]
end
subgraph "Engine Processing"
F --> N[Understand Engine]
F --> O[Navigate Engine]
F --> P[Transform Engine]
F --> Q[Synthesize Engine]
F --> R[Maintain Engine]
F --> S[Chat Engine]
end
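Pattern matching handles the common cases quickly, with an LLM pass refining ambiguous queries. The sketch below is illustrative only; the pattern lists and class names are assumptions, not the actual detector code.

```python
# Illustrative sketch: pattern-first intent detection with an LLM fallback.
# Pattern lists and class names are assumptions, not the real NotebookLocal code.
import re
from dataclasses import dataclass

INTENT_PATTERNS = {
    "understand": [r"\bwhat did i\b", r"\bexplain\b", r"\bwhat does\b"],
    "navigate":   [r"\bfind\b", r"\brelated\b", r"\bshow me\b"],
    "transform":  [r"\brewrite\b", r"\bmake this\b", r"\brestructure\b"],
    "synthesize": [r"\bsummarize\b", r"\bcompare\b", r"\bpatterns?\b"],
    "maintain":   [r"\bbroken links?\b", r"\bduplicates?\b", r"\borganize\b"],
    "chat":       [r"^(hello|hi|hey|thanks|good morning)\b"],
}

@dataclass
class DetectedIntent:
    intent_type: str
    confidence: float

def detect_intent(message: str) -> DetectedIntent:
    """Return the first engine whose patterns match; fall back to chat."""
    text = message.lower()
    for intent_type, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return DetectedIntent(intent_type, confidence=0.8)
    # In the real system an LLM pass would refine low-confidence matches.
    return DetectedIntent("chat", confidence=0.3)
```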
// Sophisticated context control through mentions
interface MentionSystem {
file: '@filename.md' // Specific file reference
folder: '@folder/' // All files in folder
tag: '@#tag' // Files with specific tag
temporal: '@recent' // Recently modified files
active: '@current' // Currently active file
scope: '@all' // Entire vault
multiple: '@file1.md,file2.md,file3.md' // Multiple files
}

// Natural language with precise context control
"@research-notes.md What were my key conclusions about AI safety?"
"@weekly-reviews/ What patterns emerge in my productivity over time?"
"@#meeting-notes What action items do we still have pending?"
"@project-docs/ Compare the different approaches we've discussed"
"/current Tell me about the document I'm currently editing"Technology Stack:
- FastAPI: High-performance async web framework
- LangGraph: Graph-based workflow orchestration
- PostgreSQL + Weaviate: Hybrid storage architecture
- SQLAlchemy: Advanced ORM with async support
- Pydantic: Data validation and serialization
Core Architecture:
# Intelligence System with Specialized Engines
class IntelligenceSystem:
    def __init__(self):
        self.intent_detector = IntentDetector()   # Pattern + LLM analysis
        self.context_engine = ContextEngine()     # Context pyramid builder
        self.engines = {
            'understand': UnderstandEngine(),
            'navigate': NavigateEngine(),
            'transform': TransformEngine(),
            'synthesize': SynthesizeEngine(),
            'maintain': MaintainEngine(),
            'chat': ChatEngine()
        }

    async def process_query(self, message: str) -> Response:
        intent = await self.intent_detector.detect(message)
        context = await self.context_engine.build_pyramid(message, intent)
        engine = self.engines[intent.intent_type]
        return await engine.process(message, intent, context)

Key Features:
- Universal LLM Router: Supports OpenAI, Anthropic, and local models
- Dynamic Token Allocation: Calculates limits based on model capabilities
- Configuration-Driven: Extensive YAML configuration system
- Document Processing Pipeline: Multi-format support with intelligent chunking
- Hybrid Storage: PostgreSQL + vector search for optimal performance
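To make the Dynamic Token Allocation item above concrete, here is a minimal sketch. The ratio values mirror the `token_allocation` settings shown later in this document; the model context sizes and the way the ratios are combined are simplifying assumptions.

```python
# Illustrative token budgeting (simplified; not the actual allocation logic).
MODEL_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "Qwen3-14B-Instruct-bnb-4bit": 32_768,   # assumed context size
}

ENGINE_RATIOS = {                            # mirrors token_allocation.engine_ratios below
    "understand": 0.15, "navigate": 0.20, "transform": 0.10,
    "synthesize": 0.20, "maintain": 0.15, "chat": 0.20,
}

CONTEXT_WINDOW_RATIO = 0.6                   # mirrors token_allocation.context_window_ratio

def allocate_tokens(model: str, engine: str) -> dict:
    """Split a model's context window between retrieved context and the response."""
    window = MODEL_CONTEXT_WINDOWS.get(model, 8_192)
    return {
        "context": int(window * CONTEXT_WINDOW_RATIO),              # context pyramid budget
        "response": int(window * ENGINE_RATIOS.get(engine, 0.1)),   # per-engine share
    }

print(allocate_tokens("Qwen3-14B-Instruct-bnb-4bit", "synthesize"))
```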
Technology Stack:
- React 18: Modern functional components with hooks
- TypeScript: Type-safe development with comprehensive interfaces
- ESBuild: Fast bundling and development workflow
- Obsidian API: Deep integration with Obsidian's ecosystem
Core Architecture:
// Clean Architecture with React Components
class NotebookLocalPlugin extends Plugin {
async onload() {
this.apiClient = new ApiClient()
this.intelligenceController = new IntelligenceController()
this.universalProcessor = new UniversalFileProcessor()
this.fileWatcher = new UniversalFileWatcher()
// Register main view
this.registerView(CHAT_VIEWTYPE, (leaf) =>
new NotebookLocalView(leaf, this)
)
}
}
// Main UI Component with Tabbed Interface
function NotebookLocalView() {
const [currentTab, setCurrentTab] = useState('chat')
const [messages, setMessages] = useState([])
const [ragContext, setRagContext] = useState(null)
return (
<div className="notebook-local-view">
<TabNavigation currentTab={currentTab} onTabChange={setCurrentTab} />
{currentTab === 'chat' && <ChatInterface />}
{currentTab === 'context' && <ContextPreview />}
{currentTab === 'files' && <FileManager />}
</div>
)
}

Key Features:
- Three-Tab Interface: Chat, Context Preview, and File Management
- Real-time Streaming: Server-sent events for live AI responses
- Command-Aware Input: Real-time parsing and syntax highlighting
- File Processing: Universal support for Markdown, PDF, DOCX, TXT
- Background Synchronization: Seamless state sync with backend
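The real-time streaming item above relies on Server-Sent Events. The following is a generic FastAPI SSE sketch, not the project's actual endpoint, to show the shape of the streaming contract the plugin consumes.

```python
# Generic SSE streaming sketch (endpoint path and generator are illustrative).
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_answer(question: str):
    """Stand-in for the intelligence system; yields the answer token by token."""
    for token in f"Echoing: {question}".split():
        yield f"data: {token}\n\n"          # SSE frame: 'data: <payload>\n\n'
        await asyncio.sleep(0.05)           # simulate model latency
    yield "data: [DONE]\n\n"

@app.get("/chat/stream")
async def chat_stream(q: str):
    return StreamingResponse(generate_answer(q), media_type="text/event-stream")
```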
The Context Pyramid intelligently assembles relevant information with sophisticated ranking:
class ContextPyramid:
    def build_context(self, message: str, mentions: List[str]) -> ContextData:
        layers = {
            0: self.get_mentioned_files(mentions),   # Highest priority (60% tokens)
            1: self.get_current_document(),          # High priority
            2: self.get_linked_documents(),          # Medium-high priority
            3: self.get_semantically_similar(),      # Medium priority (vector search)
            4: self.get_recent_files(),              # Temporal context
            5: self.get_tagged_content()             # Topic-based context
        }
        return self.assemble_with_token_budget(layers)

class LLMRouter:
    def __init__(self):
        self.adapters = {
            'openai': OpenAIAdapter(),
            'anthropic': AnthropicAdapter(),
            'qwen': QwenAdapter()
        }
        self.routing_config = load_routing_config()

    async def route(self, request: ChatRequest) -> ChatResponse:
        # Dynamic model selection based on:
        # - Content type (text, vision)
        # - Model capabilities
        # - Configuration preferences
        # - Token requirements
        adapter = self.select_optimal_adapter(request)
        return await adapter.process(request)

# Graph-based document processing
workflow = StateGraph(ProcessingState)
workflow.add_node("extract", extract_content) # PDF/DOCX → text
workflow.add_node("process_images", process_images) # Vision model descriptions
workflow.add_node("chunk", intelligent_chunking) # Context-aware segmentation
workflow.add_node("embed", generate_embeddings) # Vector embeddings
workflow.add_node("store", hybrid_storage) # PostgreSQL + Weaviate
workflow.add_edge("extract", "process_images")
workflow.add_edge("process_images", "chunk")
workflow.add_edge("chunk", "embed")
workflow.add_edge("embed", "store")
compiled_workflow = workflow.compile()

Our system uses a standardized port allocation for modular service architecture:
- 8000: FastAPI Server (Main API Gateway & Intelligence System)
- 8001: LLM Service (Language Model Inference)
- 8002: Vision Service (Vision Model Processing)
- 8003: Embedding Service (Text Embedding Generation)
This port structure enables independent scaling, resource management, and monitoring of each service type.
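A small script makes the port map easy to verify during development. Only the main API's `/health` endpoint is documented in this README (see Troubleshooting); assuming the same path on the other services is a simplification.

```python
# Port map health check. Only port 8000's /health endpoint is documented here;
# reusing the same path for the other services is an assumption.
import urllib.request

SERVICES = {
    8000: "FastAPI Server (API gateway & intelligence)",
    8001: "LLM Service",
    8002: "Vision Service",
    8003: "Embedding Service",
}

for port, name in SERVICES.items():
    url = f"http://localhost:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            print(f"{name:45s} -> {resp.status}")
    except Exception as exc:
        print(f"{name:45s} -> unreachable ({exc})")
```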
We are currently conducting a major architectural test by transitioning from the OpenAI API to local model inference. This represents a significant validation of our modular system design and GPU resource management capabilities.
Based on inference-server/configs/routing.yaml:
- Qwen Language Models:
  - `Qwen3-14B-Instruct-bnb-4bit` (Primary chat model)
  - `Qwen2.5-VL-7B-Instruct-unsloth-bnb-4bit` (Vision-language model)
  - `Qwen3-embedding-0.6B` (Embedding generation)
- BLIP2 Vision:
  - `blip2-opt-2.7b` (Vision understanding)
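To confirm which model is assigned to each role, the routing file can be inspected directly. The key names here follow the configuration excerpts shown below; the overall file layout is an assumption.

```python
# Print the model assigned to each role from routing.yaml.
# Key names (chat_default, vision_default, embedding_default) follow the
# configuration excerpts later in this README; the file layout is assumed.
import yaml

with open("inference-server/configs/routing.yaml") as f:
    routing = yaml.safe_load(f)

for role in ("chat_default", "vision_default", "embedding_default"):
    print(f"{role:18s} -> {routing.get(role, '[unset]')}")
```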
- Modular System Validation: Verify our adapter-based architecture works seamlessly with local models
- GPU Resource Management: Test concurrent model serving and memory allocation strategies
- Performance Benchmarking: Compare local inference latency/throughput vs OpenAI API
- System Reliability: Stress test local inference under various load conditions
- Resource Optimization: Fine-tune GPU memory usage across different model types
This transition validates our core design principles:
- Adapter Pattern: Each model type uses dedicated adapters (`qwen`, `blip2`)
- Resource Isolation: Independent services prevent resource conflicts
- Configuration-Driven: Model selection via YAML without code changes
- Scalability: Horizontal scaling of model-specific services
This testing phase is crucial for understanding how our modular system handles local GPU resources efficiently and whether our abstraction layers can support both cloud and local inference seamlessly.
Due to GPU memory limitations (24GB), we implement dynamic model loading with two vLLM configurations:
# Active models when text processing is needed
chat_default: Qwen3-14B-Instruct-bnb-4bit # Port 8001 (LLM Service)
embedding_default: Qwen3-embedding-0.6B # Port 8003 (Embedding Service)
vision_default: [unloaded]                    # Vision service inactive

# Active models when vision processing is needed
chat_default: [unloaded] # Chat service inactive
embedding_default: [unloaded] # Embedding service inactive
vision_default: blip2-opt-2.7b                # Port 8002 (Vision Service)

- On text/chat requests: Load Configuration A (Qwen chat + embedding models)
- On vision requests: Unload text models → Load Configuration B (BLIP2 vision model)
- Memory optimization: Only load required models to stay within 24GB limit
- Service coordination: Port-based services enable clean model swapping
This approach validates our modular architecture's ability to handle resource-constrained environments with fully local inference while maintaining service availability through intelligent model lifecycle management.
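The swap logic can be sketched as a small lifecycle manager that switches between Configuration A and Configuration B based on the incoming request type. The load/unload calls below are placeholders, not the real service API.

```python
# Simplified sketch of configuration swapping under a shared 24GB GPU budget.
# The load/unload calls are placeholders; the real services run behind ports 8001-8003.
TEXT_CONFIG = {
    "chat": "Qwen3-14B-Instruct-bnb-4bit",
    "embedding": "Qwen3-embedding-0.6B",
}
VISION_CONFIG = {"vision": "blip2-opt-2.7b"}

class ModelLifecycleManager:
    def __init__(self):
        self.active = {}

    def ensure(self, request_type: str) -> None:
        """Load the configuration matching the request type, unloading the other."""
        wanted = VISION_CONFIG if request_type == "vision" else TEXT_CONFIG
        if self.active != wanted:
            for role, name in self.active.items():
                print(f"unloading {role}: {name}")   # placeholder for a real unload call
            for role, name in wanted.items():
                print(f"loading {role}: {name}")     # placeholder for a real load call
            self.active = wanted

manager = ModelLifecycleManager()
manager.ensure("chat")     # Configuration A: text + embedding models
manager.ensure("vision")   # swap to Configuration B: vision model only
```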
Backend Requirements:
- Python 3.8+ with pip
- PostgreSQL 12+ (local or remote)
- Optional: Weaviate for enhanced vector search
- OpenAI API key (or other LLM provider)
Frontend Requirements:
- Obsidian with community plugins enabled
- Node.js 16+ with npm (for development)
# 1. Clone repository
git clone <repository-url>
cd 26th-summer-NotebookLocal
# 2. Setup backend
cd inference-server
python -m venv venv && source venv/bin/activate # Linux/Mac
pip install -r requirements.txt
# 3. Database setup
createdb notebooklocal
alembic upgrade head
# 4. Configure environment
cp .env.example .env
# Edit .env with your API keys and database URL
# 5. Build and install plugin
cd ../notebook-local
npm install && npm run build
# 6. Install plugin in Obsidian
cp -r dist/* /path/to/vault/.obsidian/plugins/notebook-local/
# 7. Start the system
cd ../inference-server
uvicorn src.main:app --host 0.0.0.0 --port 8000
# Enable plugin in Obsidian: Settings → Community plugins → "NotebookLocal"

- Open NotebookLocal: Click the plugin icon or use command palette
- Start naturally: Just ask questions about your vault
- Use @mentions: Reference specific files with `@filename.md`
- Explore tabs: Check Context and Files tabs for system state
Example first interaction:
"Hello! What can you help me with regarding my vault?"
"@important-notes.md What are the key points I should remember?"
"Find notes about machine learning from last month"
// Instead of commands, natural conversation:
User: "What are the main themes in my research notes?"
System: [SYNTHESIZE intent detected] → Analyzes patterns across research files
User: "@meeting-notes.md What action items do we have?"
System: [UNDERSTAND intent] → Focuses on meeting-notes.md specifically
User: "Make this explanation clearer"
System: [TRANSFORM intent] → Improves current document content
User: "Find related notes about this topic"
System: [NAVIGATE intent] → Discovers connected content

// Precise context specification
"@research/ @#ai-safety What are the key risks we've identified?"
// → Uses research folder + files tagged with ai-safety
"@file1.md,file2.md,file3.md Compare the approaches in these documents"
// → Focuses specifically on the three mentioned files
"@recent What patterns emerge from my recent writing?"
// → Analyzes recently modified files for trends

Backend Performance:
- Async Architecture: Full async/await throughout FastAPI
- Connection Pooling: Efficient database connection management
- Background Processing: Non-blocking document processing
- Caching: Intelligent response and embedding caching
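As one example of the caching item above, embedding calls can be memoized by content hash so unchanged chunks are never re-embedded. This is a generic illustration rather than the server's actual cache.

```python
# Generic embedding cache sketch keyed by content hash (illustrative only).
import hashlib

class EmbeddingCache:
    def __init__(self, embed_fn):
        self._embed = embed_fn    # async callable: text -> embedding vector
        self._store = {}

    async def get(self, text: str):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:          # only embed content we haven't seen
            self._store[key] = await self._embed(text)
        return self._store[key]
```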
Frontend Optimization:
- Lazy Loading: Components loaded on-demand
- Virtual Scrolling: Efficient large file list handling
- Debounced Updates: Optimized file watching and input
- Memory Management: Automatic cleanup of unused resources
Data Privacy:
- Local-First: Processing on your infrastructure
- No Long-term Storage: Server doesn't retain personal data
- Configurable Models: Choose between cloud and local AI
- Encrypted Communication: HTTPS for all API calls
Input Validation:
- Command Sanitization: All inputs validated and sanitized
- File Type Restrictions: Only safe file types processed
- Path Validation: Prevention of directory traversal
- Size Limits: Protection against DoS via large files
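As an illustration of the path and file-type checks above, validation along these lines rejects traversal attempts, unsupported extensions, and oversized uploads. The limits and vault root shown are placeholders, not the server's actual values.

```python
# Generic upload validation sketch: extension whitelist, path containment, size cap.
from pathlib import Path

ALLOWED_EXTENSIONS = {".md", ".pdf", ".docx", ".txt"}
MAX_FILE_BYTES = 50 * 1024 * 1024            # illustrative 50 MB cap
VAULT_ROOT = Path("/data/vault").resolve()   # illustrative root

def validate_upload(relative_path: str, size_bytes: int) -> Path:
    target = (VAULT_ROOT / relative_path).resolve()
    if VAULT_ROOT not in target.parents and target != VAULT_ROOT:
        raise ValueError("path escapes the vault root")          # directory traversal
    if target.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {target.suffix}")
    if size_bytes > MAX_FILE_BYTES:
        raise ValueError("file too large")                       # DoS protection
    return target
```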
# configs/routing.yaml
intelligence:
  engines:
    understand: { temperature: 0.3 }
    navigate: { temperature: 0.4 }
    transform: { temperature: 0.3 }
    synthesize: { temperature: 0.2 }
    maintain: { temperature: 0.1 }
    chat: { temperature: 0.7 }
  token_allocation:
    context_window_ratio: 0.6
    engine_ratios:
      understand: 0.15
      navigate: 0.20
      transform: 0.10
      synthesize: 0.20
      maintain: 0.15
      chat: 0.20

interface PluginSettings {
serverUrl: string // Backend server URL
enableStreaming: boolean // Real-time response streaming
autoProcessFiles: boolean // Automatic file processing
supportedExtensions: string[] // File types to process
debounceDelay: number // File change detection delay
maxConcurrentProcessing: number // Processing parallelism
}

# 1. Create new engine
class CustomEngine(BaseEngine):
    async def process(self, message: str, intent: DetectedIntent,
                      context: ContextPyramid) -> EngineResponse:
        # Custom processing logic
        pass

# 2. Add intent patterns
INTENT_PATTERNS = {
    IntentType.CUSTOM: [
        r'\b(custom|special|specific)\b',
        # Add patterns for detection
    ]
}

# 3. Register in routing system
engines = {
    'custom': CustomEngine(llm_router, 'custom')
}

// Add new UI components
interface ExtensionAPI {
addCommand(command: Command): void
registerView(viewType: string, viewCreator: ViewCreator): void
addSettingsTab(tab: SettingsTab): void
addStatusBarItem(): HTMLElement
}
// Custom processors
class CustomFileProcessor {
canProcess(file: TFile): boolean
async process(file: TFile): Promise<ProcessedContent>
}

Connection Problems:
# Check server status
curl http://localhost:8000/health
# Verify plugin settings
# Obsidian → Settings → NotebookLocal → Server URL

File Processing Issues:
# Check file processing status
# Files tab in NotebookLocal interface
# Look for error indicators (🔴) and details

Intent Detection Problems:
// Check intent confidence in debug mode
// Enable debug logging in browser console
localStorage.setItem('notebook-local-debug', 'true')

Backend Optimization:
- Adjust `MAX_CONCURRENT_PROCESSING` in the environment
- Tune PostgreSQL connection pool settings
- Configure Weaviate memory limits
- Monitor token usage and adjust allocation ratios
Plugin Optimization:
- Reduce `debounceDelay` for faster response
- Limit `supportedExtensions` to required types
- Adjust `maxConcurrentProcessing` to match system capabilities
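A minimal sketch of how these backend knobs might be wired in, assuming `MAX_CONCURRENT_PROCESSING` is read from the environment (the other variable names are illustrative):

```python
# Sketch of environment-driven tuning. MAX_CONCURRENT_PROCESSING is mentioned above;
# DB_POOL_SIZE is an illustrative name for the connection-pool setting.
import asyncio
import os

MAX_CONCURRENT_PROCESSING = int(os.getenv("MAX_CONCURRENT_PROCESSING", "4"))
DB_POOL_SIZE = int(os.getenv("DB_POOL_SIZE", "10"))

processing_semaphore = asyncio.Semaphore(MAX_CONCURRENT_PROCESSING)

async def process_document(doc_id: str) -> None:
    async with processing_semaphore:        # cap concurrent document pipelines
        ...                                 # extract -> chunk -> embed -> store
```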
Every component is built around understanding user intent and providing contextually relevant responses.
Streaming responses, live file status updates, and automatic synchronization provide immediate feedback.
@mentions + natural language eliminate command memorization and provide intuitive control.
The architecture provides comprehensive error handling, logging, monitoring, and scalability considerations.
Everything from model selection to processing behavior is configurable without code changes.
- 🧠 Context-Aware Intelligence: Understands why you're asking, not just what
- 🎯 Natural Language Interface: Talk to your notes like talking to a research assistant
- ⚡ Real-time Processing: Immediate feedback and streaming responses
- 🔧 Sophisticated Architecture: Production-ready with enterprise-grade reliability
- 📚 Vault-Native Understanding: Deep integration with your existing knowledge structure
NotebookLocal transforms knowledge management from a search-and-retrieve paradigm to an intelligent conversation with your accumulated wisdom.
- 🐍 Inference Server - Backend architecture, intelligence engines, and API documentation
- 🧩 Obsidian Plugin - Frontend architecture, React components, and plugin development
For a complete technical pipeline diagram and implementation details, see the comprehensive system flow documentation.