Oboyu (覚ゆ)

ドキュメントを知識に、知識を価値に変える日本語特化型インテリジェンス・プラットフォーム
Transform documents into knowledge, knowledge into value - Japanese-optimized Intelligence Platform

What is Oboyu?

Oboyu (覚ゆ - "to remember" in ancient Japanese) is a comprehensive Knowledge Intelligence Platform that transforms your documents into actionable insights. Going beyond traditional RAG (Retrieval-Augmented Generation), Oboyu combines advanced semantic search, knowledge graph generation, and AI-powered data enrichment to unlock the full potential of your information assets.

Beyond Traditional RAG

While most solutions stop at document retrieval, Oboyu creates a living knowledge ecosystem:

  • Knowledge Graph Generation: Automatically extracts entities, relationships, and concepts from your documents
  • GraphRAG Search: Leverages knowledge graphs for deeper, more contextual search results
  • Data Enrichment: Enhances CSV files and structured data with insights from your knowledge base
  • Multi-dimensional Intelligence: Combines vector search, graph traversal, and semantic analysis

Why Oboyu?

  • 🧠 Knowledge Intelligence: Automatically generates knowledge graphs and extracts insights from your documents
  • 📊 Data Enrichment: Enhances CSV files and structured data with AI-powered content from your knowledge base
  • 🚀 Lightning Fast: Indexes thousands of documents in seconds, searches in milliseconds with GraphRAG acceleration
  • 🎯 Beyond Accurate: Multi-layered search combining semantic understanding, knowledge graphs, and contextual reasoning
  • 🇯🇵 Japanese Excellence: Built specifically for Japanese business environments with automatic encoding detection
  • 🔒 Enterprise Private: Everything runs locally - your sensitive documents never leave your infrastructure
  • 🤖 AI-Native: Built-in MCP server for Claude, Cursor, and other AI assistants with GraphRAG capabilities

Quick Start

Prerequisites

  • Python 3.11 or higher (3.13 recommended)
  • pip (latest version recommended)
  • Operating System: Linux, macOS, or Windows with WSL
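
A quick way to confirm the interpreter meets the prerequisite before installing is sketched below. The helper name `check_python` and the three labels are illustrative only and not part of Oboyu; the thresholds are the two versions named in the prerequisites (3.11 as the oldest supported, 3.13 as the primary target).

```python
import sys

MINIMUM = (3, 11)      # oldest version listed as supported
RECOMMENDED = (3, 13)  # version listed as the primary target


def check_python(version=None):
    """Classify a (major, minor, ...) version tuple against the prerequisites."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor < RECOMMENDED:
        return "supported"
    return "recommended"


if __name__ == "__main__":
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {found}: {check_python()}")
```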

System Dependencies (for building from source)

Linux (Ubuntu/Debian):

sudo apt-get install -y \
    git \
    curl \
    build-essential \
    cmake \
    pkg-config \
    libfreetype6-dev \
    libfontconfig1-dev \
    libjpeg-dev \
    libpng-dev \
    zlib1g-dev \
    libssl-dev

Linux (CentOS/RHEL):

sudo yum install -y \
    git \
    curl \
    gcc-c++ \
    cmake \
    pkg-config \
    freetype-devel \
    fontconfig-devel \
    libjpeg-devel \
    libpng-devel \
    zlib-devel \
    openssl-devel

macOS:

# Install Xcode Command Line Tools
xcode-select --install

# Install additional dependencies via Homebrew
brew install cmake pkg-config

Installation

Get up and running in under 5 minutes:

# Install Oboyu
pip install oboyu

# Index your documents
oboyu index ~/Documents

# Search your documents
oboyu search "your search term"

That's it! See our Documentation for complete guides and examples.

Key Features

🧠 Knowledge Intelligence

  • Automatic Knowledge Graph Generation: Extracts entities, relationships, and concepts from your documents
  • GraphRAG Search: Leverages knowledge graphs for deeper, contextual search results
  • Multi-dimensional Associations: Discovers hidden connections between documents and concepts
  • Semantic Entity Recognition: Identifies and links key entities across your knowledge base
  • Relationship Mapping: Automatically maps relationships between concepts, people, and ideas
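
To make the idea of entities, relationships, and graph traversal concrete, here is a minimal sketch of a knowledge graph stored as subject/relation/object triples. It is not Oboyu's internal data model, which this README does not describe; the class `TinyKnowledgeGraph` and the sample facts are hypothetical.

```python
from collections import defaultdict


class TinyKnowledgeGraph:
    """Toy triple store: each entity maps to a set of (relation, other_entity)."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        # Store the edge in both directions so traversal can start anywhere.
        self._edges[subject].add((relation, obj))
        self._edges[obj].add((f"inverse:{relation}", subject))

    def neighbors(self, entity):
        return sorted(self._edges.get(entity, set()))

    def related(self, start, max_hops=2):
        """Breadth-first search: entities reachable within max_hops of start."""
        seen = {start}
        frontier = [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for _, other in self._edges.get(node, ()):
                    if other not in seen:
                        seen.add(other)
                        next_frontier.append(other)
            frontier = next_frontier
        seen.discard(start)
        return sorted(seen)


if __name__ == "__main__":
    kg = TinyKnowledgeGraph()
    kg.add("Alice", "works_at", "Acme")
    kg.add("Acme", "located_in", "Tokyo")
    print(kg.related("Alice", max_hops=2))
```

Multi-hop traversal of this kind is what lets a graph-based search surface "Tokyo" for a query about "Alice" even when no single document mentions both.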

📊 Data Enrichment & Enhancement

  • CSV Auto-Enhancement: Enriches CSV files with relevant information from your knowledge base
  • Schema-Driven Processing: Uses JSON schema to define enrichment rules and data transformation
  • Semantic Data Completion: Fills missing information using AI-powered content matching
  • Business Value Creation: Transforms raw data into actionable business insights
  • Batch Processing: Efficiently processes large datasets with configurable batch sizes

🔍 Advanced Search Capabilities

  • Hybrid Search: Combines semantic understanding with keyword matching and graph traversal
  • Multiple Search Modes: Vector search, keyword search, GraphRAG, and hybrid modes
  • AI-Powered Reranking: Built-in reranker improves result accuracy and relevance
  • Contextual Understanding: Uses knowledge graphs to provide more relevant results
  • Flexible Output: Command-line search with JSON, plain text, and structured formats
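
This README does not say how hybrid mode merges its result lists. One widely used technique for combining rankings from different retrievers is Reciprocal Rank Fusion (RRF), sketched below purely as an illustration. The function name, the document IDs, and the constant `k=60` are assumptions, not Oboyu's implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in; scores are
    summed and documents are returned best first (ties broken by ID).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # hypothetical semantic-search ranking
    keyword_hits = ["b", "d", "a"]  # hypothetical keyword-search ranking
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents that rank well in more than one list rise to the top, which is the intuition behind combining semantic and keyword results.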

📚 Comprehensive Document Support

  • Rich Format Support: PDF, plain text (.txt), Markdown (.md), HTML (.html), and source code files
  • PDF Intelligence: Advanced text extraction with metadata preservation and structure understanding
  • Incremental Indexing: Only processes new or changed files for lightning-fast updates
  • Smart Chunking: Intelligent document splitting optimized for knowledge extraction
  • Automatic Encoding: Seamlessly handles UTF-8, Shift-JIS, EUC-JP, and other encodings
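
Incremental indexing can be pictured as comparing each file's content digest with the digest recorded on the previous run. The sketch below works on in-memory bytes to stay self-contained; the function names and the state format are hypothetical, and Oboyu may detect changes differently (for example by modification time).

```python
import hashlib


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def plan_reindex(documents, known_digests):
    """Decide what to re-process.

    documents: {path: file bytes} for the current run.
    known_digests: {path: hex digest} saved by the previous run.
    Returns (changed_or_new_paths, removed_paths, new_state).
    """
    current = {path: digest(content) for path, content in documents.items()}
    changed = sorted(p for p, d in current.items() if known_digests.get(p) != d)
    removed = sorted(p for p in known_digests if p not in current)
    return changed, removed, current


if __name__ == "__main__":
    first_run = {"a.md": b"one", "b.md": b"two"}
    _, _, state = plan_reindex(first_run, {})
    second_run = {"b.md": b"two!", "c.md": b"three"}
    print(plan_reindex(second_run, state)[:2])
```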

🇯🇵 Japanese Business Excellence

  • Native Japanese Support: Purpose-built for Japanese business environments and content
  • Automatic Encoding Detection: Handles legacy Japanese encodings (Shift-JIS, EUC-JP) automatically
  • Specialized Language Models: Optimized embedding and processing models for Japanese text
  • Mixed Language Intelligence: Seamlessly processes Japanese-English bilingual documents
  • Business Context Understanding: Trained on Japanese business terminology and concepts
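
As a rough illustration of what handling legacy Japanese encodings involves, the sketch below tries a fixed list of codecs in order and returns the first that decodes cleanly. This is a simplified stand-in, not Oboyu's detector: the function name and candidate order are assumptions, and order-based fallback can misidentify short or ambiguous byte sequences, which is why production detectors usually add statistical checks.

```python
# EUC-JP is tried before Shift-JIS because EUC-JP bytes can also decode as
# Shift-JIS half-width katakana, while the reverse is much less likely.
CANDIDATE_ENCODINGS = ("utf-8", "euc_jp", "shift_jis")


def decode_with_fallback(data: bytes, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


if __name__ == "__main__":
    legacy_bytes = "覚ゆ".encode("shift_jis")
    print(decode_with_fallback(legacy_bytes))
```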

🚀 Enterprise Performance & Integration

  • ONNX Acceleration: 2-4x faster processing with automatic model optimization
  • MCP Server Integration: Native support for Claude Desktop and AI coding assistants
  • GraphRAG API: RESTful API for knowledge graph queries and data enrichment
  • Rich CLI Interface: Beautiful terminal interface with real-time progress tracking
  • Resource Efficient: Low memory footprint suitable for edge computing and local deployment

Installation

Using UV (Recommended)

uv tool install oboyu

Using pip

pip install oboyu

From Source

git clone https://github.com/sonesuke/oboyu.git
cd oboyu
pip install -e .

System Requirements

  • Python: 3.11 or higher (3.13 recommended)
  • OS: macOS, Linux (Windows via WSL)
  • Memory: 2GB RAM minimum (4GB recommended)
  • Storage: 1GB for models and index
  • Build Tools: See system dependencies above if building from source

Note: Models are downloaded automatically on first use (~90MB). When installing from PyPI, most system dependencies are not required because pre-built wheels are provided.

Usage Examples

Basic Usage

# Index a directory
oboyu index ~/Documents/notes

# Search your documents
oboyu search "machine learning optimization techniques"

# Get results in JSON format for processing
oboyu search "machine learning" --format json
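
The JSON output can be consumed from a script. The sketch below wraps the `oboyu search ... --format json` command shown above with `subprocess`; it assumes that stdout contains only a JSON document, and it makes no assumption about the fields inside it, since this README does not document the result shape. The helper names and the injectable `runner` parameter are illustrative.

```python
import json
import subprocess


def build_search_command(query, output_format="json"):
    """Build the argument list for the CLI call shown in the examples above."""
    return ["oboyu", "search", query, "--format", output_format]


def run_search(query, runner=subprocess.run):
    """Run the search and parse stdout as JSON (assumes stdout is pure JSON)."""
    completed = runner(
        build_search_command(query),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


# Example (requires oboyu on PATH and an existing index):
#   results = run_search("machine learning")
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces or Japanese text.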

Knowledge Intelligence & GraphRAG

# Build knowledge graph from your documents
oboyu build-kg

# Search using GraphRAG for deeper insights
oboyu search "project management methodologies" --mode graphrag

# Find related concepts and entities
oboyu search "agile development" --rerank --max-results 10

Data Enrichment Workflows

Schema Configuration (enrichment_schema.json):

{
  "input_schema": {
    "columns": {
      "company_name": {"type": "string", "description": "Company name"}
    }
  },
  "enrichment_schema": {
    "columns": {
      "description": {
        "type": "string",
        "source_strategy": "search_content",
        "query_template": "{company_name} company overview business model"
      },
      "industry": {
        "type": "string",
        "source_strategy": "search_content",
        "query_template": "{company_name} industry sector business domain"
      }
    }
  }
}
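
To see how a `query_template` might turn one CSV row into search queries, consider the sketch below. It assumes the `{company_name}` placeholder behaves like Python's `str.format` substitution, which this README does not confirm; `build_queries` is a hypothetical helper, not an Oboyu API.

```python
# Trimmed copy of the enrichment schema shown above.
SCHEMA = {
    "enrichment_schema": {
        "columns": {
            "description": {
                "type": "string",
                "source_strategy": "search_content",
                "query_template": "{company_name} company overview business model",
            },
            "industry": {
                "type": "string",
                "source_strategy": "search_content",
                "query_template": "{company_name} industry sector business domain",
            },
        }
    }
}


def build_queries(schema, row):
    """Return {output_column: search query} for one input row."""
    queries = {}
    for column, spec in schema["enrichment_schema"]["columns"].items():
        if spec.get("source_strategy") == "search_content":
            # Assumption: placeholders are filled from the row's column values.
            queries[column] = spec["query_template"].format(**row)
    return queries


if __name__ == "__main__":
    print(build_queries(SCHEMA, {"company_name": "Acme"}))
```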

Enrichment Commands:

# Enrich CSV with knowledge from your documents
oboyu enrich companies.csv enrichment_schema.json

# Custom output location and batch processing
oboyu enrich data.csv schema.json -o enriched_data.csv --batch-size 5

# Disable GraphRAG for faster processing
oboyu enrich simple_data.csv schema.json --no-graph
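
The effect of `--batch-size` can be pictured as reading the CSV in fixed-size groups of rows rather than all at once. The sketch below shows only that grouping step; the generator name is hypothetical, and how Oboyu schedules work inside each batch is not described here.

```python
import csv
import io
from itertools import islice


def batched_rows(csv_text, batch_size):
    """Yield lists of at most batch_size row dicts from CSV text."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    sample = "company_name\nAcme\nGlobex\nInitech\n"
    for number, batch in enumerate(batched_rows(sample, batch_size=2), start=1):
        print(number, [row["company_name"] for row in batch])
```

Smaller batches keep memory use bounded on large files, at the cost of more round trips through the search step.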

Advanced Search Examples

# Index only specific file types
oboyu index ~/projects --include-patterns "*.md,*.txt,*.pdf"

# GraphRAG search with relationship traversal
oboyu search "API design patterns" --mode graphrag --confidence 0.7

# Hybrid search combining multiple approaches
oboyu search "microservices architecture" --mode hybrid --rerank

# Search with custom result limits and confidence
oboyu search "database optimization" --max-results 15 --confidence 0.6

MCP Server for AI Assistants

# Start MCP server with GraphRAG capabilities
oboyu mcp

# Or configure in Claude Desktop's settings

See our MCP Integration Guide for detailed setup instructions.
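
Claude Desktop reads MCP server definitions from the `mcpServers` object in its `claude_desktop_config.json` file. The sketch below generates an entry that launches the `oboyu mcp` command shown above. The server key `"oboyu"` is an arbitrary label chosen for illustration, and the entry assumes the `oboyu` executable is on `PATH`; the MCP Integration Guide remains the authoritative source for the exact settings.

```python
import json


def claude_desktop_entry(server_name="oboyu"):
    """Build an mcpServers entry that starts `oboyu mcp` (server_name is arbitrary)."""
    return {
        "mcpServers": {
            server_name: {
                "command": "oboyu",  # assumes oboyu is on PATH
                "args": ["mcp"],
            }
        }
    }


if __name__ == "__main__":
    # Merge the printed fragment into claude_desktop_config.json by hand.
    print(json.dumps(claude_desktop_entry(), indent=2))
```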

Documentation

📖 View Full Documentation →

🛠️ Technology Stack

Learn about the cutting-edge technologies that power Oboyu's intelligence:

We believe in transparency and sharing our technical journey. These deep-dives include performance benchmarks, implementation insights, and honest assessments of alternatives.

Common Use Cases

🏢 Enterprise Knowledge Management

Transform organizational documents into a searchable knowledge graph:

# Index company documents and build knowledge graph
oboyu index ~/company_docs --include "*.pdf,*.md,*.docx"
oboyu build-kg

# Search for strategic insights
oboyu search "competitive analysis market positioning" --mode graphrag

📊 Business Data Enhancement

Enrich customer or product data with insights from your knowledge base:

# Enhance customer list with company information
oboyu enrich customers.csv customer_enrichment_schema.json

# Add product descriptions from documentation
oboyu enrich products.csv product_schema.json --batch-size 10

📚 Research & Academic Intelligence

Create a comprehensive research knowledge base:

# Index research papers and notes
oboyu index ~/research --include "*.pdf,*.md,*.txt"
oboyu build-kg

# Find related concepts and methodologies
oboyu search "neural network optimization techniques" --mode graphrag

💻 Technical Documentation Intelligence

Make your codebase and documentation more discoverable:

# Index code and documentation
oboyu index ~/projects/myapp --include "*.md,*.py,*.js,*.java"

# Find implementation patterns and examples
oboyu search "authentication middleware patterns" --rerank

📋 Meeting & Decision Intelligence

Transform meeting notes into actionable insights:

# Index meeting notes and decisions
oboyu index ~/meetings --include "*.md,*.txt"

# Search for decisions and action items
oboyu search "budget approval Q4 initiatives" --mode hybrid

🌏 Multilingual Business Operations

Perfect for Japanese-English business environments:

# Index multilingual business documents
oboyu index ~/business_docs --include "*.pdf,*.md"

# Search across languages seamlessly
oboyu search "プロジェクト管理 project management methodology" --mode graphrag

Testing

Unit and Integration Tests

# Run fast tests (recommended for development)
uv run pytest -m "not slow"

# Run all tests with coverage
uv run pytest --cov=src

E2E Display Testing

Oboyu includes comprehensive E2E display tests built on the Claude Code SDK:

# Run all E2E display tests
python e2e/run_tests.py

# Run specific test category
python e2e/run_tests.py --test search

See our Full Documentation for more details.

Contributing

We welcome contributions! See our Contributing Guidelines for details.

# Quick start for contributors
git clone https://github.com/YOUR_USERNAME/oboyu.git
cd oboyu
uv sync
uv run pytest -m "not slow"

License

This project is licensed under the MIT License - see the LICENSE.md file for details.

Acknowledgments

  • The name "Oboyu" (覚ゆ) comes from ancient Japanese, meaning "to remember"
  • Built with ❤️ for the Japanese business and NLP community
  • Inspired by the goal of making knowledge accessible and actionable across languages
  • Special thanks to the TinySwallow model for Japanese language understanding and knowledge extraction
  • GraphRAG implementation inspired by Microsoft's GraphRAG research and methodology

Made with 🇯🇵 by sonesuke
