ドキュメントを知識に、知識を価値に変える日本語特化型インテリジェンス・プラットフォーム
Transform documents into knowledge, knowledge into value - Japanese-optimized Intelligence Platform
Oboyu (覚ゆ - "to remember" in ancient Japanese) is a comprehensive Knowledge Intelligence Platform that transforms your documents into actionable insights. Going beyond traditional RAG (Retrieval-Augmented Generation), Oboyu combines advanced semantic search, knowledge graph generation, and AI-powered data enrichment to unlock the full potential of your information assets.
While most solutions stop at document retrieval, Oboyu creates a living knowledge ecosystem:
- Knowledge Graph Generation: Automatically extracts entities, relationships, and concepts from your documents
- GraphRAG Search: Leverages knowledge graphs for deeper, more contextual search results
- Data Enrichment: Enhances CSV files and structured data with insights from your knowledge base
- Multi-dimensional Intelligence: Combines vector search, graph traversal, and semantic analysis
- 🧠 Knowledge Intelligence: Automatically generates knowledge graphs and extracts insights from your documents
- 📊 Data Enrichment: Enhances CSV files and structured data with AI-powered content from your knowledge base
- 🚀 Lightning Fast: Indexes thousands of documents in seconds, searches in milliseconds with GraphRAG acceleration
- 🎯 Beyond Accurate: Multi-layered search combining semantic understanding, knowledge graphs, and contextual reasoning
- 🇯🇵 Japanese Excellence: Built specifically for Japanese business environments with automatic encoding detection
- 🔒 Enterprise Private: Everything runs locally - your sensitive documents never leave your infrastructure
- 🤖 AI-Native: Built-in MCP server for Claude, Cursor, and other AI assistants with GraphRAG capabilities
- Python 3.13 or higher (3.11+ supported)
- pip (latest version recommended)
- Operating System: Linux, macOS, or Windows with WSL
Linux (Ubuntu/Debian):
sudo apt-get install -y \
    git \
    curl \
    build-essential \
    cmake \
    pkg-config \
    libfreetype6-dev \
    libfontconfig1-dev \
    libjpeg-dev \
    libpng-dev \
    zlib1g-dev \
    libssl-dev
Linux (CentOS/RHEL):
sudo yum install -y \
    git \
    curl \
    gcc-c++ \
    cmake \
    pkg-config \
    freetype-devel \
    fontconfig-devel \
    libjpeg-devel \
    libpng-devel \
    zlib-devel \
    openssl-devel
macOS:
# Install Xcode Command Line Tools
xcode-select --install
# Install additional dependencies via Homebrew
brew install cmake pkg-config
Get up and running in under 5 minutes:
# Install Oboyu
pip install oboyu
# Index your documents
oboyu index ~/Documents
# Search your documents
oboyu search "your search term"
That's it! See our Documentation for complete guides and examples.
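The search step can also be driven from a script. The following is a minimal sketch, not part of Oboyu itself: it assumes the `oboyu` executable is on `PATH` and that `--format json` writes only JSON to standard output. The helper names `build_search_command` and `run_search` are hypothetical; the flags are the ones that appear in the usage examples of this README.

```python
import json
import subprocess


def build_search_command(query, mode=None, max_results=None, rerank=False):
    """Build an argument list for `oboyu search` from flags shown in this README."""
    cmd = ["oboyu", "search", query, "--format", "json"]
    if mode is not None:
        cmd += ["--mode", mode]
    if max_results is not None:
        cmd += ["--max-results", str(max_results)]
    if rerank:
        cmd.append("--rerank")
    return cmd


def run_search(query, **options):
    """Run the search and parse stdout as JSON.

    Assumption: stdout contains only the JSON payload. Its structure is not
    described in this README, so the parsed value is returned unchanged and
    no field names are assumed.
    """
    result = subprocess.run(
        build_search_command(query, **options),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```

Calling `run_search("your search term", mode="hybrid", rerank=True)` would then return whatever JSON value the CLI printed. Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or Japanese text.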
- Automatic Knowledge Graph Generation: Extracts entities, relationships, and concepts from your documents
- GraphRAG Search: Leverages knowledge graphs for deeper, contextual search results
- Multi-dimensional Associations: Discovers hidden connections between documents and concepts
- Semantic Entity Recognition: Identifies and links key entities across your knowledge base
- Relationship Mapping: Automatically maps relationships between concepts, people, and ideas
- CSV Auto-Enhancement: Enriches CSV files with relevant information from your knowledge base
- Schema-Driven Processing: Uses JSON schema to define enrichment rules and data transformation
- Semantic Data Completion: Fills missing information using AI-powered content matching
- Business Value Creation: Transforms raw data into actionable business insights
- Batch Processing: Efficiently processes large datasets with configurable batch sizes
- Hybrid Search: Combines semantic understanding with keyword matching and graph traversal
- Multiple Search Modes: Vector search, keyword search, GraphRAG, and hybrid modes
- AI-Powered Reranking: Built-in reranker improves result accuracy and relevance
- Contextual Understanding: Uses knowledge graphs to provide more relevant results
- Flexible Output: Command-line search with JSON, plain text, and structured formats
- Rich Format Support: PDF, plain text (.txt), Markdown (.md), HTML (.html), and source code files
- PDF Intelligence: Advanced text extraction with metadata preservation and structure understanding
- Incremental Indexing: Only processes new or changed files for lightning-fast updates
- Smart Chunking: Intelligent document splitting optimized for knowledge extraction
- Automatic Encoding: Seamlessly handles UTF-8, Shift-JIS, EUC-JP, and other encodings
- Native Japanese Support: Purpose-built for Japanese business environments and content
- Automatic Encoding Detection: Handles legacy Japanese encodings (Shift-JIS, EUC-JP) automatically
- Specialized Language Models: Optimized embedding and processing models for Japanese text
- Mixed Language Intelligence: Seamlessly processes Japanese-English bilingual documents
- Business Context Understanding: Trained on Japanese business terminology and concepts
- ONNX Acceleration: 2-4x faster processing with automatic model optimization
- MCP Server Integration: Native support for Claude Desktop and AI coding assistants
- GraphRAG API: RESTful API for knowledge graph queries and data enrichment
- Rich CLI Interface: Beautiful terminal interface with real-time progress tracking
- Resource Efficient: Low memory footprint suitable for edge computing and local deployment
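Several bullets above mention automatic handling of UTF-8, Shift-JIS, and EUC-JP. As an illustration of the general idea only, and not Oboyu's actual detection logic, the sketch below tries a fixed list of candidate encodings in order and keeps the first one that decodes without error. The function name `decode_japanese` and the candidate order are assumptions made for this example.

```python
# Candidate encodings, tried in order. UTF-8 is strict enough that legacy
# Japanese byte sequences rarely pass as valid UTF-8, so it goes first.
CANDIDATE_ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def decode_japanese(data: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep going with replacement characters rather than fail.
    return data.decode("utf-8", errors="replace"), "utf-8 (lossy)"


legacy_bytes = "日本語".encode("shift_jis")
text, encoding = decode_japanese(legacy_bytes)
print(text, encoding)
```

Trial decoding is a simple heuristic: some byte sequences are valid in more than one encoding, so production detectors usually add statistical checks on top of it.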
# Install with uv
uv tool install oboyu
# Or install with pip
pip install oboyu
# Or install from source
git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .
- Python: 3.13 or higher (3.11+ supported)
- OS: macOS, Linux (Windows via WSL)
- Memory: 2GB RAM minimum (4GB recommended)
- Storage: 1GB for models and index
- Build Tools: See system dependencies above if building from source
Note: Models are automatically downloaded on first use (~90MB). For installation from PyPI, most system dependencies are not required as we provide pre-built wheels.
# Index a directory
oboyu index ~/Documents/notes
# Search your documents
oboyu search "machine learning optimization techniques"
# Get results in JSON format for processing
oboyu search "machine learning" --format json
# Build knowledge graph from your documents
oboyu build-kg
# Search using GraphRAG for deeper insights
oboyu search "project management methodologies" --mode graphrag
# Find related concepts and entities
oboyu search "agile development" --rerank --max-results 10
Schema Configuration (enrichment_schema.json):
{
  "input_schema": {
    "columns": {
      "company_name": {"type": "string", "description": "Company name"}
    }
  },
  "enrichment_schema": {
    "columns": {
      "description": {
        "type": "string",
        "source_strategy": "search_content",
        "query_template": "{company_name} company overview business model"
      },
      "industry": {
        "type": "string",
        "source_strategy": "search_content",
        "query_template": "{company_name} industry sector business domain"
      }
    }
  }
}
Enrichment Commands:
# Enrich CSV with knowledge from your documents
oboyu enrich companies.csv enrichment_schema.json
# Custom output location and batch processing
oboyu enrich data.csv schema.json -o enriched_data.csv --batch-size 5
# Disable GraphRAG for faster processing
oboyu enrich simple_data.csv schema.json --no-graph
# Index only specific file types
oboyu index ~/projects --include-patterns "*.md,*.txt,*.pdf"
# GraphRAG search with relationship traversal
oboyu search "API design patterns" --mode graphrag --confidence 0.7
# Hybrid search combining multiple approaches
oboyu search "microservices architecture" --mode hybrid --rerank
# Search with custom result limits and confidence
oboyu search "database optimization" --max-results 15 --confidence 0.6
# Start MCP server with GraphRAG capabilities
oboyu mcp
# Or configure in Claude Desktop's settings
See our MCP Integration Guide for detailed setup instructions.
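Returning to the enrichment schema shown earlier in this section: each `query_template` contains a `{company_name}` placeholder that corresponds to an input column. The sketch below shows one plausible way such templates could be expanded for a single CSV row. It assumes the placeholders follow Python `str.format` semantics, which this README does not confirm. The function `build_queries` and the sample value `Acme` are hypothetical.

```python
import csv
import io

# Trimmed copy of the enrichment schema from this README.
SCHEMA = {
    "input_schema": {
        "columns": {
            "company_name": {"type": "string", "description": "Company name"}
        }
    },
    "enrichment_schema": {
        "columns": {
            "description": {
                "type": "string",
                "source_strategy": "search_content",
                "query_template": "{company_name} company overview business model",
            },
            "industry": {
                "type": "string",
                "source_strategy": "search_content",
                "query_template": "{company_name} industry sector business domain",
            },
        }
    },
}


def build_queries(row, schema):
    """Fill each enrichment column's query_template with values from one row."""
    queries = {}
    for column, spec in schema["enrichment_schema"]["columns"].items():
        # Assumption: placeholders use str.format-style {column_name} syntax.
        queries[column] = spec["query_template"].format(**row)
    return queries


# Hypothetical one-row CSV standing in for companies.csv.
sample_csv = io.StringIO("company_name\nAcme\n")
for row in csv.DictReader(sample_csv):
    print(build_queries(row, SCHEMA))
```

Under this reading, every output column yields one search query per input row, which is why a smaller `--batch-size` may help on large files.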
- Installation - Install and verify setup
- Your First Index - Create your first searchable index
- Your First Search - Learn to search effectively
- Daily Workflows - Essential daily patterns
- Technical Documentation - Code and API docs
- Meeting Notes - Track decisions and actions
- Research Papers - Academic content search
- Configuration Guide - Customize for your needs
- Performance Tuning - Optimize speed and quality
- Japanese Support - Japanese language features
- Claude MCP Integration - AI-powered search
- CLI Reference - All commands and options
- Troubleshooting - Solutions to common issues
Learn about the cutting-edge technologies that power Oboyu's intelligence:
- 📚 Technology Stack Overview - Complete stack architecture and philosophy
- 🗄️ DuckDB: The Analytics Engine - Why DuckDB powers our knowledge intelligence
- 🤖 HuggingFace: Japanese AI Excellence - Specialized Japanese language models and embeddings
- 🔗 GraphRAG: Beyond Simple RAG - Graph-enhanced retrieval and knowledge understanding
- ⚡ ONNX: Optimization Without Compromise - 3x faster inference with maintained quality
- ⚖️ Our Decision Framework - How we evaluate and choose technologies
We believe in transparency and sharing our technical journey. These deep-dives include performance benchmarks, implementation insights, and honest assessments of alternatives.
Transform organizational documents into a searchable knowledge graph:
# Index company documents and build knowledge graph
oboyu index ~/company_docs --include "*.pdf,*.md,*.docx"
oboyu build-kg
# Search for strategic insights
oboyu search "competitive analysis market positioning" --mode graphrag
Enrich customer or product data with insights from your knowledge base:
# Enhance customer list with company information
oboyu enrich customers.csv customer_enrichment_schema.json
# Add product descriptions from documentation
oboyu enrich products.csv product_schema.json --batch-size 10
Create a comprehensive research knowledge base:
# Index research papers and notes
oboyu index ~/research --include "*.pdf,*.md,*.txt"
oboyu build-kg
# Find related concepts and methodologies
oboyu search "neural network optimization techniques" --mode graphrag
Make your codebase and documentation more discoverable:
# Index code and documentation
oboyu index ~/projects/myapp --include "*.md,*.py,*.js,*.java"
# Find implementation patterns and examples
oboyu search "authentication middleware patterns" --rerank
Transform meeting notes into actionable insights:
# Index meeting notes and decisions
oboyu index ~/meetings --include "*.md,*.txt"
# Search for decisions and action items
oboyu search "budget approval Q4 initiatives" --mode hybrid
Perfect for Japanese-English business environments:
# Index multilingual business documents
oboyu index ~/business_docs --include "*.pdf,*.md"
# Search across languages seamlessly
oboyu search "プロジェクト管理 project management methodology" --mode graphrag
# Run fast tests (recommended for development)
uv run pytest -m "not slow"
# Run all tests with coverage
uv run pytest --cov=src
Oboyu includes comprehensive E2E display testing using Claude Code SDK:
# Run all E2E display tests
python e2e/run_tests.py
# Run specific test category
python e2e/run_tests.py --test search
See our Full Documentation for more details.
We welcome contributions! See our Contributing Guidelines for details.
# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"
- 📋 GitHub Issues - Report bugs or request features
- 📖 Documentation - Comprehensive guides and references
- 💬 Discussions - Ask questions and share ideas
This project is licensed under the MIT License - see the LICENSE.md file for details.
- The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
- Built with ❤️ for the Japanese business and NLP community
- Inspired by the goal of making knowledge accessible and actionable across languages
- Special thanks to the TinySwallow model for Japanese language understanding and knowledge extraction
- GraphRAG implementation inspired by Microsoft's GraphRAG research and methodology
Made with 🇯🇵 by sonesuke