Pype AI's Agensight is an open-source experimentation studio built for conversational AI agents. It is similar to LangGraph but supports any agentic framework (such as AutoGen or LangGraph) and any modality (voice, image, and text). With minimal code changes, Pype AI provides complete observability to help you trace agentic workflows across entire sessions or user conversations.
It features a plug-and-play playground for editing prompts and tools. An MCP server, used via Cursor or Windsurf, can explore your codebase and generate a playground synced to your code. You can edit your prompts and tools in this playground, and changes made there sync directly back to your code, allowing you to effortlessly run, replay, and evaluate experiments.
It also provides Conversational Replays, which let you revisit any session, replay the conversation against multiple versions of your agents (created by editing the model, prompt, RAG, or tools), and evaluate the results to help you improve your customer interactions.
Agensight empowers you to quickly iterate, build evaluations, and improve agent conversations.
Demo video: agensight.studio.mp4
Agensight provides comprehensive observability for your AI agents through auto-instrumented tracing of all LLM calls, function executions, and agent interactions. The local development mode enables offline trace inspection with detailed performance metrics and token usage analytics. Customize your traces and spans with meaningful names and organize them for better debugging and analysis of your agent workflows.
The interactive playground offers a visual workflow editor for designing and modifying agent workflows through an intuitive interface. All changes made in the playground automatically sync with your codebase, ensuring seamless integration between development and experimentation. The platform maintains version control for your prompts and agent configurations, allowing you to track and revert changes as needed.
Evaluate your agent's performance with custom metrics tailored to your specific use cases. Agensight's evaluation framework provides automated scoring of agent responses using predefined or custom criteria, giving you instant feedback on performance. Track improvements over time with detailed evaluation reports and analytics, helping you continuously enhance your agent's capabilities.
Access and replay any past conversation with your agents through the session history feature. Compare different versions of your agents (model, prompt, tools) side by side to identify improvements and regressions. The interactive debugging capabilities allow you to step through conversations, making it easier to identify and fix issues in your agent's behavior.
All data is stored locally inside the SDK, ensuring complete privacy and control over your information. No data is uploaded or tracked externally, and all prompt versions are stored locally in the .agensight file. We recommend running Agensight in isolated virtual environments for enhanced security.
- Python 3.10 or higher
Verify your Python version:
python --version
- Navigate to your project directory
cd path/to/your/project
- Set up isolated environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows
- Install Agensight
pip install --upgrade agensight
- Launch dashboard
agensight view
Access the dashboard at http://localhost:5001
- Add to your Python code
from agensight import init, trace, span
# Initialize Agensight
init(name="my-agent")
# Add tracing to your functions
@trace("my_workflow")
def my_function():
    @span()
    def my_subtask():
        # Your code here
        pass
    return my_subtask()
- Install MCP Server
# Clone the MCP server
git clone https://github.com/pype-ai/agensight_mcpserver.git
cd agensight_mcpserver
# Setup MCP server
python -m venv mcp-env
source mcp-env/bin/activate # On Windows: mcp-env\Scripts\activate
pip install -r requirements.txt
- Configure Cursor/Windsurf
Add this to your Cursor/Windsurf settings:
{
"mcpServers": {
"agensight": {
"command": "/path/to/agensight_mcpserver/mcp-env/bin/python",
"args": ["/path/to/agensight_mcpserver/server.py"],
"description": "Agensight Playground Generator"
}
}
}
- Generate Playground
- Open your project in Cursor/Windsurf
- Type in chat: "Please analyze this codebase using the generateAgensightConfig MCP tool"
- Your config will be automatically generated
That's it! You now have both tracing and playground features set up. The dashboard at http://localhost:5001 will show your traces and allow you to edit your agents in the playground.
Agensight provides first-class support for tracing your agent workflows using just two decorators: @trace for high-level operations and @span for finer-grained steps like LLM calls or tool executions. This gives you powerful visibility into your agent's behavior across sessions, tools, and models.
Before you use any tracing features, initialize Agensight at the start of your application:
import agensight
agensight.init(
name="chatbot-with-tools",
mode="prod", # Use "local" for local development
token="abc12345", # Required token for prod/dev
session="user_123" # Optional: can be an ID or full session dict
)Parameters:
- name: Your app or service name
- mode: One of "local", "dev", or "prod"
- token: Required in cloud modes to associate logs
- session: Optional session ID or metadata (str or {id, name, user_id})
ℹ️ If both init() and @trace() specify a session, @trace() takes precedence for that specific trace.
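For example, when init() sets a default session but a specific workflow belongs to a different session, the value passed to @trace() wins for that trace. A minimal sketch using the parameters described above; the names and IDs are illustrative:

```python
from agensight import init, trace

# Default session applied to traces that do not specify their own
init(name="chatbot-with-tools", mode="local", session="user_123")

@trace(name="support_flow", session={"id": "sess_42", "name": "support flow", "user_id": "user_123"})
def handle_request(message: str) -> str:
    # Spans created inside this function are grouped under sess_42, not user_123
    return f"echo: {message}"
```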
The @trace decorator marks a top-level user workflow (e.g. a request handler, multi-agent loop, or RAG pipeline). All nested spans are automatically tied to this trace.
from agensight import trace
@trace(name="multi_agent_chat", session={"id": "xyz123", "name": "multi agent chat", "user_id": "123"})
def main():
    ...
- Automatically generates a unique trace_id
- Associates child spans to the trace
- Inherits or overrides session metadata
Use the @span decorator to capture individual operations like LLM calls, tool executions, or preprocessing steps. It records execution time, input/output, token usage, and more.
from agensight import span
@span(name="llm")
def call_llm(messages):
    return client.chat.completions.create(model="gpt-4o", messages=messages)  # model param is required by the OpenAI SDK
- Captures LLM usage details (e.g. model, token count)
- Works with OpenAI, Claude, or custom models
- Tool calls inside spans are automatically detected if tool_choice="auto"
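As an illustration, a span wrapped around an OpenAI call that passes tools with tool_choice="auto" is enough for the tool invocation to show up in the trace. This is a hedged sketch; the tool schema and model name are assumptions for the example, not part of Agensight's API:

```python
from openai import OpenAI
from agensight import span

client = OpenAI()

# Illustrative tool schema; Agensight observes the call, it does not define tools
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

@span(name="llm_with_tools")
def call_llm_with_tools(messages):
    # Tool calls chosen by the model are detected by the surrounding span
    return client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=[weather_tool],
        tool_choice="auto",
    )
```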
- Structured Trace Data: Includes input/output, tokens used, and timing for each step
- Session-Aware: Group all traces and spans by user/session automatically
- LLM-Aware: Automatically captures model usage, prompts, completions, and costs
- Tool Logging: Captures tool invocations inside spans, no manual work needed
- Cloud Compatible: In prod/dev mode, all traces are sent to your Supabase backend
from agensight import init, trace, span
init(name="chat-service", mode="prod", token="abc12345", session="user_456")
@span(name="llm")
def call_llm(messages):
return {"content": "Mock response", "tool_used": "get_weather"}
@trace(name="multi_agent_chat", session="user_789")
def main():
plan = call_llm([{"role": "user", "content": "Weather in Paris?"}])
summary = call_llm([{"role": "user", "content": f"Summarize: {plan['content']}"}])
print("Final Output:", summary["content"])
main()This example shows two @span-wrapped LLM calls under a single @trace. The session metadata ensures everything is tied to the correct user and session in your dashboard.
You can automatically evaluate any component of your LLM application by attaching custom evaluation metrics to a span using the @span decorator. This allows you to assess agent responses with metrics such as factual accuracy, helpfulness, or any custom criteria you define.
# Define evaluation metrics
from agensight.eval.g_eval import GEvalEvaluator
factual_accuracy = GEvalEvaluator(
name="Factual Accuracy",
criteria="Evaluate whether the actual output contains factually accurate information based on the expected output.",
threshold=0.7,
verbose_mode=True
)
helpfulness = GEvalEvaluator(
name="Helpfulness",
criteria="Evaluate whether the output is helpful and addresses the user's input question.",
threshold=0.6,
verbose_mode=True,
)
# Attach metrics to your span
@span(name="improve_joke", metrics=[factual_accuracy, helpfulness])
def improve_joke(actual_output, expected_output):
# ... your logic here ...
return actual_outputAgensight offers a variety of evaluation metrics tailored to different use cases. Our metrics include Task Completion, Tool Correctness, Conversation Completeness, and Conversation Relevancy, each designed to provide specific insights into LLM performance. For more details, visit our documentation page.
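If the built-in metrics don't cover your use case, the same GEvalEvaluator pattern shown above accepts any plain-language criterion. An illustrative sketch; the criterion wording and threshold are assumptions:

```python
from agensight import span
from agensight.eval.g_eval import GEvalEvaluator

# A custom, plain-language criterion scored with G-Eval
conversation_relevancy = GEvalEvaluator(
    name="Conversation Relevancy",
    criteria="Evaluate whether the actual output stays on topic and addresses the user's latest message.",
    threshold=0.6,
    verbose_mode=True,
)

@span(name="reply", metrics=[conversation_relevancy])
def generate_reply(actual_output, expected_output):
    # ... your generation logic here ...
    return actual_output
```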
Once your playground is generated, you'll have access to these features:
- Agent Configuration
  - Edit agent prompts and system messages
  - Configure model parameters (temperature, max tokens, etc.)
  - Set up tools and function calls
  - Define agent variables and connections
- Workflow Visualization
  - View your agent workflow as a visual graph
  - Drag and drop to modify agent connections
  - Add or remove agents from the workflow
  - Configure input/output relationships
- Prompt Management
  - Create and edit prompts in real-time
  - Save different versions of prompts
  - Test prompts with sample inputs
  - Compare prompt performance
- Tool Configuration
  - Add or modify tools for your agents
  - Configure tool parameters
  - Test tool functionality
  - Monitor tool usage and performance
Example playground configuration:
{
"agents": [
{
"name": "ResearchAgent",
"prompt": "You are a research assistant...",
"modelParams": {
"model": "gpt-4",
"temperature": 0.7
},
"tools": ["web_search", "document_reader"]
},
{
"name": "SummaryAgent",
"prompt": "Summarize the following information...",
"modelParams": {
"model": "gpt-3.5-turbo",
"temperature": 0.3
}
}
],
"connections": [
{"from": "ResearchAgent", "to": "SummaryAgent"}
]
}
All changes made in the playground automatically sync with your codebase, allowing you to:
- Test changes before committing them
- Version control your agent configurations
- Collaborate with team members
- Maintain consistency across environments
| Feature | Default | Customizable With |
|---|---|---|
| Project name | "default" |
init(name="...") |
| Trace name | Function name | @trace("...") |
| Span name | Auto (Agent 1, etc.) |
@span(name="...") |
Agensight uses a configuration file (agensight.config.json by default) to define agents, their connections, and parameters.
{
"agents": [
{
"name": "AnalysisAgent",
"prompt": "You are an expert analysis agent...",
"variables": ["input_data"],
"modelParams": {
"model": "gpt-4o",
"temperature": 0.2
}
},
{
"name": "SummaryAgent",
"prompt": "Summarize the following information...",
"variables": ["analysis_result"],
"modelParams": {
"model": "gpt-3.5-turbo",
"temperature": 0.7
}
}
],
"connections": [
{"from": "AnalysisAgent", "to": "SummaryAgent"}
]
}
Open source contributions are welcome! Please see our Contributing Guide for details on how to get started, coding standards, and our development workflow.
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Follow PEP 8 for Python code
- Use snake_case for Python functions and variables
- Use PascalCase for component names in React/TypeScript
- Add type annotations to all Python functions
- Follow Conventional Commits for commit messages
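For instance, a small contribution following these standards might look like the sketch below (illustrative only), paired with a Conventional Commits message such as `feat(tracing): normalize trace names before export`:

```python
# PEP 8 layout, snake_case naming, and type annotations
def normalize_trace_name(raw_name: str) -> str:
    """Return a lowercase, underscore-separated trace name."""
    return raw_name.strip().lower().replace(" ", "_")
```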
- JavaScript SDK
- Cloud viewer
MIT License • © 2025 agensight contributors