Make Any App Searchable for AI Agents

⭐ Help us reach more developers and grow the Airweave community. Star this repo!

What is Airweave?

Airweave is a fully open-source tool that lets agents search any app. It connects to apps, productivity tools, databases, or document stores and transforms their contents into searchable knowledge bases, accessible through a standardized interface for agents.

The search interface is exposed via REST API or MCP. When using MCP, Airweave essentially builds a semantically searchable MCP server. The platform handles everything from auth and extraction to embedding and serving. You can find our documentation here.

📺 Check out a quick demo of Airweave below:

[Video: Airweave.Demo.mp4]

🔗 Example notebooks

🚀 Quick Start

Managed Service: Airweave Cloud

Self-hosted:

Make sure Docker and Docker Compose are installed, then:

# 1. Clone the repository
git clone https://github.com/airweave-ai/airweave.git
cd airweave

# 2. Build and run
chmod +x start.sh
./start.sh

That's it! Access the dashboard at http://localhost:8080
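
To confirm the stack came up, a minimal sketch like the following checks whether anything is listening on the two ports this README mentions (8080 for the dashboard, 8001 for the API). The helper name `port_open` is made up for illustration, and an open port only shows that some process is accepting connections, not that Airweave itself is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from this README: dashboard on 8080, API on 8001.
    for name, port in {"dashboard": 8080, "api": 8001}.items():
        state = "up" if port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```

If either port reports "not reachable" shortly after running start.sh, the containers may still be starting.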

🔌 Supported Integrations

Airtable, Asana, Attio, Bitbucket, Box, ClickUp, Confluence, CTTI, Dropbox, GitHub, GitLab, Gmail, Google Calendar, Google Drive, HubSpot, Jira, Linear, Monday, Notion, OneDrive, Outlook Calendar, Outlook Mail, PostgreSQL, Salesforce, SharePoint, Slack, Stripe, Teams, Todoist, Trello, Zendesk

💻 Usage

Frontend

  • Access the UI at http://localhost:8080
  • Connect sources, configure syncs, and query data

API

  • Swagger docs: http://localhost:8001/docs
  • Create connections, trigger syncs, and search data
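
The backend is FastAPI, which by default publishes a machine-readable OpenAPI schema at `/openapi.json` alongside the Swagger UI. Assuming that default has not been changed in your deployment, the sketch below lists the available operations so you can look up the real endpoint paths rather than guess them. `list_operations` and `fetch_spec` are illustrative helper names, not part of Airweave.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_spec(base_url: str = "http://localhost:8001") -> dict:
    """Download the schema; /openapi.json is FastAPI's default (assumed here)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the API running, `for method, path in list_operations(fetch_spec()): print(method, path)` prints one line per operation.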

📦 SDKs

Python

pip install airweave-sdk

from airweave import AirweaveSDK

# Initialize client
client = AirweaveSDK(
    api_key="YOUR_API_KEY",
    base_url="http://localhost:8001"
)

# Create a collection
collection = client.collections.create(name="My Collection")

# Add a source connection
source = client.source_connections.create(
    name="My Stripe Connection",
    short_name="stripe",
    readable_collection_id=collection.readable_id,
    authentication={
        "credentials": {"api_key": "your_stripe_api_key"}
    }
)

# Semantic search (default)
results = client.collections.search(
    readable_id=collection.readable_id,
    query="Find recent failed payments"
)

# Hybrid search (semantic + keyword)
results = client.collections.search(
    readable_id=collection.readable_id,
    query="customer invoices Q4 2024",
    search_type="hybrid"
)

# With query expansion and reranking
results = client.collections.search(
    readable_id=collection.readable_id,
    query="technical documentation",
    enable_query_expansion=True,
    enable_reranking=True,
    top_k=20
)

# Search with recency bias (prioritize recent results)
results = client.collections.search(
    readable_id=collection.readable_id,
    query="critical bugs",
    recency_bias=0.8,  # 0.0 to 1.0, higher = more recent
    limit=10
)

# Get AI-generated answer instead of raw results
answer = client.collections.search(
    readable_id=collection.readable_id,
    query="What are our customer refund policies?",
    response_type="completion",
    enable_reranking=True
)
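
Several of the calls above differ only in their keyword arguments. A small wrapper can keep them consistent and reject an out-of-range `recency_bias` before any request is sent. This is only a sketch: `search_kwargs` is a hypothetical helper, it uses only parameter names that appear in the examples above, and the 0.0 to 1.0 range is taken from the comment in the recency-bias example.

```python
from typing import Any, Optional


def search_kwargs(
    readable_id: str,
    query: str,
    recency_bias: Optional[float] = None,
    limit: Optional[int] = None,
) -> dict[str, Any]:
    """Build keyword arguments for client.collections.search (hypothetical helper)."""
    kwargs: dict[str, Any] = {"readable_id": readable_id, "query": query}
    if recency_bias is not None:
        # Range taken from the README comment: 0.0 to 1.0, higher = more recent.
        if not 0.0 <= recency_bias <= 1.0:
            raise ValueError("recency_bias must be between 0.0 and 1.0")
        kwargs["recency_bias"] = recency_bias
    if limit is not None:
        kwargs["limit"] = limit
    return kwargs
```

With the `client` and `collection` from the examples above, a call would look like `client.collections.search(**search_kwargs(collection.readable_id, "critical bugs", recency_bias=0.8, limit=10))`.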

TypeScript/JavaScript

npm install @airweave/sdk
# or
yarn add @airweave/sdk

import { AirweaveSDKClient, AirweaveSDKEnvironment } from "@airweave/sdk";

// Initialize client
const client = new AirweaveSDKClient({
    apiKey: "YOUR_API_KEY",
    environment: AirweaveSDKEnvironment.Local
});

// Create a collection
const collection = await client.collections.create({
    name: "My Collection"
});

// Add a source connection
const source = await client.sourceConnections.create({
    name: "My Stripe Connection",
    shortName: "stripe",
    readableCollectionId: collection.readableId,
    authentication: {
        credentials: { apiKey: "your_stripe_api_key" }
    }
});

// Semantic search (default)
const results = await client.collections.search(
    collection.readableId,
    { query: "Find recent failed payments" }
);

// Hybrid search (semantic + keyword)
const hybridResults = await client.collections.search(
    collection.readableId,
    {
        query: "customer invoices Q4 2024",
        searchType: "hybrid"
    }
);

// With query expansion and reranking
const advancedResults = await client.collections.search(
    collection.readableId,
    {
        query: "technical documentation",
        enableQueryExpansion: true,
        enableReranking: true,
        topK: 20
    }
);

// Search with recency bias (prioritize recent results)
const recentResults = await client.collections.search(
    collection.readableId,
    {
        query: "critical bugs",
        recencyBias: 0.8,  // 0.0 to 1.0, higher = more recent
        limit: 10
    }
);

// Get AI-generated answer instead of raw results
const answer = await client.collections.search(
    collection.readableId,
    {
        query: "What are our customer refund policies?",
        responseType: "completion",
        enableReranking: true
    }
);

🔑 Key Features

  • Data synchronization from 30+ sources with minimal config
  • Entity extraction and transformation pipeline
  • Multi-tenant architecture with OAuth2
  • Incremental updates using content hashing
  • Semantic search for agent queries
  • Versioning for data changes
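
The "incremental updates using content hashing" item can be illustrated with a generic sketch: hash each entity's content, compare it with the hash stored from the previous sync, and reprocess only what changed. This is not Airweave's implementation; the function names, the choice of SHA-256, and the dict-based entity shape are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(entity: dict) -> str:
    """Stable SHA-256 over an entity's JSON form (keys sorted for determinism)."""
    payload = json.dumps(entity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def plan_sync(previous: dict[str, str], current: dict[str, dict]) -> dict[str, list[str]]:
    """Classify entity ids by comparing stored hashes with freshly computed ones.

    previous maps entity id -> hash recorded at the last sync;
    current maps entity id -> entity content fetched now.
    """
    plan: dict[str, list[str]] = {"insert": [], "update": [], "keep": [], "delete": []}
    for entity_id, entity in current.items():
        new_hash = content_hash(entity)
        if entity_id not in previous:
            plan["insert"].append(entity_id)
        elif previous[entity_id] != new_hash:
            plan["update"].append(entity_id)
        else:
            plan["keep"].append(entity_id)
    # Anything stored before but absent now was removed at the source.
    plan["delete"] = [eid for eid in previous if eid not in current]
    return plan
```

In a scheme like this, only the "insert" and "update" ids would need to be re-extracted and re-embedded, which is what makes the sync incremental.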

🔧 Tech Stack

  • Frontend: React/TypeScript with ShadCN
  • Backend: FastAPI (Python)
  • Databases: PostgreSQL (metadata), Qdrant (vectors)
  • Workers: Temporal (workflow orchestration), Redis (pub/sub)
  • Deployment: Docker Compose (dev), Kubernetes (prod)

👥 Contributing

We welcome contributions! Please check CONTRIBUTING.md for details.

📄 License

Airweave is released under the MIT license.

🔗 Connect