Pype AI's Agensight is an open-source experimentation studio built for conversational AI agents. It is similar to LangGraph but supports any agentic framework (such as AutoGen or LangGraph) and any modality (voice, image, and text). With minimal code changes, Pype AI provides complete observability to help you trace agentic workflows across entire sessions or user conversations.
It features a plug-and-play playground for editing prompts and tools. An MCP server, used via Cursor or Windsurf, can explore your codebase and generate a playground synced to your code. You can edit your prompts and tools in this playground, and changes made there sync directly back to your code, allowing you to effortlessly run, replay, and evaluate experiments.
It also provides Conversational Replays, which let you revisit any session, replay the conversation against multiple versions of your agents (created by editing the model, prompt, RAG, or tools), and evaluate the results to help you improve your customer interactions.
Agensight empowers you to quickly iterate, build evaluations, and improve agent conversations.
Demo video: agensight.studio.mp4
Agensight provides comprehensive observability for your AI agents through auto-instrumented tracing of all LLM calls, function executions, and agent interactions. The local development mode enables offline trace inspection with detailed performance metrics and token usage analytics. Customize your traces and spans with meaningful names and organize them for better debugging and analysis of your agent workflows.
The interactive playground offers a visual workflow editor for designing and modifying agent workflows through an intuitive interface. All changes made in the playground automatically sync with your codebase, ensuring seamless integration between development and experimentation. The platform maintains version control for your prompts and agent configurations, allowing you to track and revert changes as needed.
Evaluate your agent's performance with custom metrics tailored to your specific use cases. Agensight's evaluation framework provides automated scoring of agent responses using predefined or custom criteria, giving you instant feedback on performance. Track improvements over time with detailed evaluation reports and analytics, helping you continuously enhance your agent's capabilities.
Access and replay any past conversation with your agents through the session history feature. Compare different versions of your agents (model, prompt, tools) side by side to identify improvements and regressions. The interactive debugging capabilities allow you to step through conversations, making it easier to identify and fix issues in your agent's behavior.
All data is stored locally inside the SDK, ensuring complete privacy and control over your information. No data is uploaded or tracked externally, and all prompt versions are stored locally in the .agensight file. We recommend running Agensight in isolated virtual environments for enhanced security.
- Python 3.10 or higher
Verify your Python version:
python --version
- Navigate to your project directory
cd path/to/your/project
- Set up isolated environment
python -m venv .venv
source .venv/bin/activate   # Linux/macOS
# .venv\Scripts\activate    # Windows
- Install Agensight
pip install --upgrade agensight
- Launch dashboard
agensight view
Access the dashboard at http://localhost:5001
- Add to your Python code
from agensight import init, trace, span
# Initialize Agensight
init(name="my-agent")
# Add tracing to your functions
@trace("my_workflow")
def my_function():
    @span()
    def my_subtask():
        # Your code here
        pass
    return my_subtask()
- Install MCP Server
# Clone the MCP server
git clone https://github.com/pype-ai/agensight_mcpserver.git
cd agensight_mcpserver
# Setup MCP server
python -m venv mcp-env
source mcp-env/bin/activate # On Windows: mcp-env\Scripts\activate
pip install -r requirements.txt
- Configure Cursor/Windsurf
Add this to your Cursor/Windsurf settings:
{
"mcpServers": {
"agensight": {
"command": "/path/to/agensight_mcpserver/mcp-env/bin/python",
"args": ["/path/to/agensight_mcpserver/server.py"],
"description": "Agensight Playground Generator"
}
}
}
- Generate Playground
- Open your project in Cursor/Windsurf
- Type in chat: "Please analyze this codebase using the generateAgensightConfig MCP tool"
- Your config will be automatically generated
That's it! You now have both tracing and playground features set up. The dashboard at http://localhost:5001 will show your traces and allow you to edit your agents in the playground.
Agensight provides first-class support for tracing your agent workflows using just two decorators: @trace for high-level operations and @span for finer-grained steps like LLM calls or tool executions. This gives you powerful visibility into your agent's behavior across sessions, tools, and models.
Before you use any tracing features, initialize Agensight at the start of your application:
import agensight
agensight.init(
name="chatbot-with-tools",
mode="prod", # Use "local" for local development
token="abc12345", # Required token for prod/dev
session="user_123" # Optional: can be an ID or full session dict
)Parameters:
- name: Your app or service name
- mode: One of "local", "dev", or "prod"
- token: Required in cloud modes to associate logs
- session: Optional session ID or metadata (str or {id, name, user_id})
ℹ️ If both init() and @trace() specify a session, @trace() takes precedence for that specific trace.
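For example, when init() sets a default session but a specific workflow belongs to a different session, the value passed to @trace() wins for that trace. A minimal sketch using the parameters described above; the names and IDs are illustrative:

```python
from agensight import init, trace

# Default session applied to traces that do not specify their own
init(name="chatbot-with-tools", mode="local", session="user_123")

@trace(name="support_flow", session={"id": "sess_42", "name": "support flow", "user_id": "user_123"})
def handle_request(message: str) -> str:
    # Spans created inside this function are grouped under sess_42, not user_123
    return f"echo: {message}"
```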
The @trace decorator marks a top-level user workflow (e.g. a request handler, multi-agent loop, or RAG pipeline). All nested spans are automatically tied to this trace.
from agensight import trace
@trace(name="multi_agent_chat", session={"id": "xyz123", "name": "multi agent chat", "user_id": "123"})
def main():
    ...
- Automatically generates a unique trace_id
- Associates child spans to the trace
- Inherits or overrides session metadata
Use the @span decorator to capture individual operations like LLM calls, tool executions, or preprocessing steps. It records execution time, input/output, token usage, and more.
from agensight import span
@span(name="llm")
def call_llm(messages):
    return client.chat.completions.create(model="gpt-4o", messages=messages)  # model param is required by the OpenAI SDK
- Captures LLM usage details (e.g. model, token count)
- Works with OpenAI, Claude, or custom models
- Tool calls inside spans are automatically detected if tool_choice="auto"
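As an illustration, a span wrapped around an OpenAI call that passes tools with tool_choice="auto" is enough for the tool invocation to show up in the trace. This is a hedged sketch; the tool schema and model name are assumptions for the example, not part of Agensight's API:

```python
from openai import OpenAI
from agensight import span

client = OpenAI()

# Illustrative tool schema; Agensight observes the call, it does not define tools
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

@span(name="llm_with_tools")
def call_llm_with_tools(messages):
    # Tool calls chosen by the model are detected by the surrounding span
    return client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=[weather_tool],
        tool_choice="auto",
    )
```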
- Structured Trace Data: Includes input/output, tokens used, and timing for each step
- Session-Aware: Group all traces and spans by user/session automatically
- LLM-Aware: Automatically captures model usage, prompts, completions, and costs
- Tool Logging: Captures tool invocations inside spans, no manual work needed
- Cloud Compatible: In prod/dev mode, all traces are sent to your Supabase backend
from agensight import init, trace, span
init(name="chat-service", mode="prod", token="abc12345", session="user_456")
@span(name="llm")
def call_llm(messages):
return {"content": "Mock response", "tool_used": "get_weather"}
@trace(name="multi_agent_chat", session="user_789")
def main():
plan = call_llm([{"role": "user", "content": "Weather in Paris?"}])
summary = call_llm([{"role": "user", "content": f"Summarize: {plan['content']}"}])
print("Final Output:", summary["content"])
main()This example shows two @span-wrapped LLM calls under a single @trace. The session metadata ensures everything is tied to the correct user and session in your dashboard.
You can automatically evaluate any component of your LLM application by attaching custom evaluation metrics to a span using the @span decorator. This allows you to assess agent responses with metrics such as factual accuracy, helpfulness, or any custom criteria you define.
# Define evaluation metrics
from agensight.eval.g_eval import GEvalEvaluator
factual_accuracy = GEvalEvaluator(
name="Factual Accuracy",
criteria="Evaluate whether the actual output contains factually accurate information based on the expected output.",
threshold=0.7,
verbose_mode=True
)
helpfulness = GEvalEvaluator(
name="Helpfulness",
criteria="Evaluate whether the output is helpful and addresses the user's input question.",
threshold=0.6,
verbose_mode=True,
)
# Attach metrics to your span
@span(name="improve_joke", metrics=[factual_accuracy, helpfulness])
def improve_joke(actual_output, expected_output):
# ... your logic here ...
return actual_outputAgensight offers a variety of evaluation metrics tailored to different use cases. Our metrics include Task Completion, Tool Correctness, Conversation Completeness, and Conversation Relevancy, each designed to provide specific insights into LLM performance. For more details, visit our documentation page.
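If the built-in metrics don't cover your use case, the same GEvalEvaluator pattern shown above accepts any plain-language criterion. An illustrative sketch; the criterion wording and threshold are assumptions:

```python
from agensight import span
from agensight.eval.g_eval import GEvalEvaluator

# A custom, plain-language criterion scored with G-Eval
conversation_relevancy = GEvalEvaluator(
    name="Conversation Relevancy",
    criteria="Evaluate whether the actual output stays on topic and addresses the user's latest message.",
    threshold=0.6,
    verbose_mode=True,
)

@span(name="reply", metrics=[conversation_relevancy])
def generate_reply(actual_output, expected_output):
    # ... your generation logic here ...
    return actual_output
```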
Once your playground is generated, you'll have access to these features:
- Agent Configuration
  - Edit agent prompts and system messages
  - Configure model parameters (temperature, max tokens, etc.)
  - Set up tools and function calls
  - Define agent variables and connections
- Workflow Visualization
  - View your agent workflow as a visual graph
  - Drag and drop to modify agent connections
  - Add or remove agents from the workflow
  - Configure input/output relationships
- Prompt Management
  - Create and edit prompts in real-time
  - Save different versions of prompts
  - Test prompts with sample inputs
  - Compare prompt performance
- Tool Configuration
  - Add or modify tools for your agents
  - Configure tool parameters
  - Test tool functionality
  - Monitor tool usage and performance
Example playground configuration:
{
"agents": [
{
"name": "ResearchAgent",
"prompt": "You are a research assistant...",
"modelParams": {
"model": "gpt-4",
"temperature": 0.7
},
"tools": ["web_search", "document_reader"]
},
{
"name": "SummaryAgent",
"prompt": "Summarize the following information...",
"modelParams": {
"model": "gpt-3.5-turbo",
"temperature": 0.3
}
}
],
"connections": [
{"from": "ResearchAgent", "to": "SummaryAgent"}
]
}
All changes made in the playground automatically sync with your codebase, allowing you to:
- Test changes before committing them
- Version control your agent configurations
- Collaborate with team members
- Maintain consistency across environments
| Feature | Default | Customizable With |
|---|---|---|
| Project name | "default" |
init(name="...") |
| Trace name | Function name | @trace("...") |
| Span name | Auto (Agent 1, etc.) |
@span(name="...") |
Agensight uses a configuration file (agensight.config.json by default) to define agents, their connections, and parameters.
{
"agents": [
{
"name": "AnalysisAgent",
"prompt": "You are an expert analysis agent...",
"variables": ["input_data"],
"modelParams": {
"model": "gpt-4o",
"temperature": 0.2
}
},
{
"name": "SummaryAgent",
"prompt": "Summarize the following information...",
"variables": ["analysis_result"],
"modelParams": {
"model": "gpt-3.5-turbo",
"temperature": 0.7
}
}
],
"connections": [
{"from": "AnalysisAgent", "to": "SummaryAgent"}
]
}
Open source contributions are welcome! Please see our Contributing Guide for details on how to get started, coding standards, and our development workflow.
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install the package in development mode:
pip install -e .
- Follow PEP 8 for Python code
- Use snake_case for Python functions and variables
- Use PascalCase for component names in React/TypeScript
- Add type annotations to all Python functions
- Follow Conventional Commits for commit messages
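For instance, a small contribution following these standards might look like the sketch below (illustrative only), paired with a Conventional Commits message such as `feat(tracing): normalize trace names before export`:

```python
# PEP 8 layout, snake_case naming, and type annotations
def normalize_trace_name(raw_name: str) -> str:
    """Return a lowercase, underscore-separated trace name."""
    return raw_name.strip().lower().replace(" ", "_")
```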
- JavaScript SDK
- Cloud viewer
MIT License • © 2025 agensight contributors