Intermediateaillmobservabilityevaluationtracingpythonagentsragguardrailsred-teaming

RagaAI Catalyst: Agent AI Observability and Evaluation Framework

Comprehensive platform for LLM project management, evaluation, tracing, and monitoring with support for agents, RAG applications, and multi-model AI systems.

Step 1
What is RagaAI Catalyst?
RagaAI Catalyst is a comprehensive Python SDK and platform designed for managing, evaluating, and optimizing LLM projects. It provides end-to-end observability for AI applications including agent tracing, multi-agentic system debugging, execution graph visualization, and advanced analytics.

Key capabilities include:
- Project Management: Create and organize LLM projects with different use cases
- Dataset Management: Efficiently manage training and evaluation datasets
- Evaluation Management: Create experiments and run metrics on RAG applications
- Trace Management: Record and analyze execution traces of RAG applications
- Agent Tracing: Track multi-agent system behaviors and interactions
- Prompt Management: Manage and version prompts for your AI applications
- Synthetic Data Generation: Generate synthetic data for testing and evaluation
- Guardrail Management: Implement safety filters to prevent harmful outputs
- Red-teaming: Comprehensive scans to detect model vulnerabilities and biases

Step 2

Technology stack

RagaAI Catalyst is built on a modern Python stack with extensive LLM ecosystem integrations:

Core Stack:

Python 3.10-3.13
aiohttp for async HTTP operations
Pydantic for data validation
Pandas for data processing
Rich for terminal UI

LLM Frameworks:

LangChain (core and full framework)
LlamaIndex
LiteLLM for multi-model support

Model Providers:

OpenAI
Google Generative AI
Groq
Anthropic

Observability:

OpenTelemetry (SDK, OTLP exporter, instrumentation)
OpenInference (for all major frameworks: LangChain, LlamaIndex, CrewAI, Haystack, Anthropic, OpenAI, Mistral, etc.)

Utilities:

Requests, tqdm, tiktoken, Jinja2, PyYAML

Tech Stack:
├── Python 3.10-3.13
├── Core Libraries
│   ├── aiohttp>=3.10.2
│   ├── pydantic
│   ├── pandas
│   └── rich>=13.9.4
├── LLM Frameworks
│   ├── langchain-core>=0.2.11
│   ├── langchain>=0.2.11
│   ├── llama-index>=0.10.0
│   └── litellm>=1.51.1
├── Model Clients
│   ├── openai>=1.57.0
│   ├── google-genai>=1.3.0
│   └── groq>=0.11.0
├── Observability
│   ├── opentelemetry-sdk
│   ├── opentelemetry-exporter-otlp
│   └── openinference-* (all major frameworks)
└── Utilities
    ├── requests~=2.32.3
    ├── tiktoken>=0.7.0
    └── tqdm>=4.66.5

Step 3

Prerequisites

Before installing RagaAI Catalyst, ensure you have:

Python 3.10-3.13 installed
pip or uv package manager
RagaAI platform credentials (access_key and secret_key)

# Check Python version (3.10-3.13 required)
python --version

# Recommended: Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify installation
python --version  # Should be 3.10-3.13

Step 4
Installation
Install RagaAI Catalyst via pip. The package includes all core dependencies for LLM observability and evaluation.
```
# Using pip
pip install ragai-catalyst

# Using uv (recommended)
uv pip install ragai-catalyst
```

Step 5

Authentication setup

Before using RagaAI Catalyst, you need to obtain credentials from the RagaAI platform:

Navigate to your profile settings on the RagaAI platform
Select "Authentication" to create your keys
Click "Generate new key" to create access and secret keys
Store these credentials securely

You can configure credentials via environment variables or directly in code:

# Option 1: Environment variables
export RAGAI_ACCESS_KEY="your_access_key"
export RAGAI_SECRET_KEY="your_secret_key"
export RAGAI_BASE_URL="https://api.raga.ai"  # or your self-hosted URL

# Option 2: .env file
# Create a .env file in your project
echo "RAGAI_ACCESS_KEY=your_access_key" > .env
echo "RAGAI_SECRET_KEY=your_secret_key" >> .env
echo "RAGAI_BASE_URL=https://api.raga.ai" >> .env

Step 6

Initializing the SDK

Import and initialize the RagaAI Catalyst SDK with your credentials. The SDK provides a unified interface for all observability features.

from ragai_catalyst import RagaAICatalyst
import os

# Option 1: Pass credentials directly
catalyst = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    base_url="https://api.raga.ai"
)

# Option 2: Use environment variables
catalyst = RagaAICatalyst(
    access_key=os.getenv("RAGAI_ACCESS_KEY"),
    secret_key=os.getenv("RAGAI_SECRET_KEY"),
    base_url=os.getenv("RAGAI_BASE_URL", "https://api.raga.ai")
)

# Verify connection
print("Connected to RagaAI Catalyst!")

Step 7

Project management

Create and manage LLM projects. Projects organize your datasets, evaluations, and traces under a single namespace.

from ragai_catalyst import RagaAICatalyst

# Initialize catalyst
catalyst = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Create a new project
project = catalyst.create_project(
    project_name="My-RAG-Application",
    usecase="Chatbot"
)
print(f"Created project: {project.name}")

# List available use cases
use_cases = project.use_cases()
print(f"Use cases: {use_cases}")

# List all your projects
projects = catalyst.list_projects()
for p in projects:
    print(f"- {p.name}: {p.usecase}")

Step 8

Dataset management

Manage datasets efficiently for evaluation. Upload CSV files and map columns to the RAG schema (query, response, context, etc.).

from ragai_catalyst import DatasetManager

# Initialize Dataset manager for a specific project
dataset_manager = DatasetManager(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# List existing datasets
datasets = dataset_manager.list_datasets()
print(f"Existing Datasets: {datasets}")

# Create a dataset from CSV with custom schema mapping
dataset_manager.create_from_csv(
    csv_path="path/to/your/data.csv",
    mappings={
        "question": "query",      # Your column -> RAG schema field
        "answer": "response",     # Map to response field
        "source_text": "context"  # Map to context field
    },
    schema_mapping="custom"
)

# Get the default schema mapping
schema = dataset_manager.get_schema_mapping()
print(f"Schema: {schema}")

Step 9

Evaluation management

Create and run evaluations to measure your RAG application performance. RagaAI supports multiple metrics for evaluating retrieval quality, response accuracy, and more.

Available metrics include:

Faithfulness (does the answer follow from context)
Context relevance (is the retrieved context relevant)
Answer relevance (is the answer relevant to the question)
Similarity metrics
Custom metrics

from ragai_catalyst import Evaluation

# Create an evaluation experiment
evaluation = Evaluation(
    project_name="My-RAG-Application",
    dataset_name="MyDataset",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# List available metrics
available_metrics = evaluation.list_metrics()
print(f"Available metrics: {available_metrics}")

# Configure the schema mapping for your dataset
schema_mapping = {
    "query": "prompt",           # Your CSV column for queries
    "response": "response",      # Your CSV column for responses
    "context": "context_passages", # Your CSV column for context
    "reference": "ground_truth"  # (Optional) ground truth answers
}

# Run evaluation with specific metrics
evaluation.add_metrics(
    metrics=["faithfulness_v2", "relevance_v2"],
    schema=schema_mapping
)

# Get evaluation results
results = evaluation.evaluate()
print(f"Evaluation Results: {results}")

Step 10

Trace management

Record and analyze traces of your RAG application execution. Traces provide visibility into the step-by-step flow of your AI application, helping you debug and optimize performance.

from ragai_catalyst import TraceManager

# Initialize trace manager
tm = TraceManager(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Record a trace for a request
tm.record_trace(
    input_query="What is the capital of France?",
    output_response="The capital of France is Paris.",
    context=["Paris is the capital and largest city of France"],
    metadata={
        "model": "gpt-4",
        "latency_ms": 245,
        "tokens_used": 42
    }
)

# List all traces
traces = tm.list_traces()
for trace in traces:
    print(f"Trace ID: {trace.id}, Input: {trace.input[:50]}...")

# Get a specific trace
t = tm.get_trace(trace_id="trace-uuid-here")
print(f"Trace details: {t}")

Step 11

Agent tracing (multi-agent systems)

Track multi-agent system behaviors and interactions. RagaAI Catalyst provides specialized tracing for agent-based workflows, including tool usage, agent handoffs, and decision-making processes.

from ragai_catalyst import AgentTracer

# Initialize agent tracer
tracer = AgentTracer(
    project_name="Multi-Agent-System",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Trace an agent interaction
with tracer.trace_agent(agent_name="researcher") as trace:
    # Your agent code here
    result = agent.run("Research quantum computing")
    
    # Record agent-specific metadata
    trace.record_tool_call(
        tool_name="search_engine",
        input={"query": "quantum computing"},
        output="Search results..."
    )
    trace.record_decision(
        decision="Continue research",
        reasoning="More information needed",
        alternatives_explored=["summarize", "ask_user"]
    )

# View agent execution graph
executor_graph = tracer.get_execution_graph(trace.id)
print(f"Execution graph: {executor_graph}")

Step 12

Prompt management

Manage and version prompts for your AI applications. Store, version, and retrieve prompts efficiently for consistent AI behavior.

from ragai_catalyst import PromptManager

# Initialize prompt manager
pm = PromptManager(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Create and store a prompt
prompt_id = pm.create_prompt(
    name="rag-system-prompt",
    template="""You are a helpful AI assistant. Use the following context to answer the question.

Context: {context}
Question: {question}

Answer:""",
    version="1.0",
    variables=["context", "question"]
)
print(f"Created prompt with ID: {prompt_id}")

# List all prompts
prompts = pm.list_prompts()
for p in prompts:
    print(f"- {p.name} (v{p.version}): {p.template[:50]}...")

# Get a specific prompt
prompt = pm.get_prompt(prompt_id)
rendered = prompt.render(context="Some context", question="What is X?")
print(f"Rendered: {rendered}")

Step 13

Synthetic data generation

Generate synthetic data for testing and evaluation. Create diverse test cases without collecting real user data.

from ragai_catalyst import SyntheticDataGenerator

# Initialize synthetic data generator
generator = SyntheticDataGenerator(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Generate synthetic Q&A pairs
data = generator.generate(
    num_samples=100,
    domain="technology",
    include_context=True,
    difficulty_levels=["easy", "medium", "hard"]
)

# Export to CSV
generator.export_to_csv(
    data=data,
    output_path="synthetic_data.csv",
    schema={
        "query": "question",
        "response": "answer",
        "context": "context_passages"
    }
)

print(f"Generated {len(data)} synthetic samples")

Step 14

Guardrail management

Implement safety filters (guardrails) to prevent harmful, biased, or inappropriate AI outputs. Configure multiple guardrail types for comprehensive protection.

Guardrail types include:

Toxicity detection
PII (personally identifiable information) detection
Jailbreak attempt detection
Custom regex/keyword filters

from ragai_catalyst import GuardrailManager

# Initialize guardrail manager
guardrail = GuardrailManager(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Create a toxicity guardrail
guardrail.create_guardrail(
    name="toxicity-filter",
    type="toxicity",
    threshold=0.7,
    action="block"  # or "flag", "modify"
)

# Create a PII detection guardrail
guardrail.create_guardrail(
    name="pii-detector",
    type="pii",
    fields=["email", "phone", "ssn"],
    action="redact"
)

# Run guardrails on input/output
def check_with_guardrails(input_text: str, output_text: str):
    # Check input
    input_result = guardrail.check(input_text)
    if input_result.blocked:
        print(f"Input blocked: {input_result.reason}")
        return None
    
    # Check output
    output_result = guardrail.check(output_text)
    if output_result.blocked:
        print(f"Output blocked: {output_result.reason}")
        return None
    elif output_result.redacted:
        print(f"Output redacted: {output_result.redacted_text}")
        return output_result.redacted_text
    
    return output_text

Step 15

Red-teaming

Perform comprehensive scans to detect model vulnerabilities, biases, and potential misuse. Red-teaming helps identify security gaps before deployment.

Red-teaming capabilities:

Jailbreak attack simulation
Bias detection
Adversarial input testing
Prompt injection detection

from ragai_catalyst import RedTeaming

# Initialize red-teaming module
redteam = RedTeaming(
    project_name="My-RAG-Application",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY"
)

# Run a comprehensive vulnerability scan
scan_result = redteam.run_scan(
    model="gpt-4",
    attack_types=[
        "jailbreak",
        "prompt_injection",
        "bias_detection",
        "pii_extraction"
    ],
    num_attacks=50,
    severity_threshold="medium"
)

# View scan results
print(f"Vulnerabilities found: {len(scan_result.vulnerabilities)}")
for vuln in scan_result.vulnerabilities:
    print(f"- [{vuln.severity}] {vuln.type}: {vuln.description}")

# Get detailed report
report = redteam.generate_report(scan_result)
print(f"Report: {report}")

Step 16

Self-hosted deployment

Deploy RagaAI Catalyst as a self-hosted solution for complete control over your observability infrastructure. The self-hosted version includes a dashboard with timeline and execution graph views.

# Using Docker (if available)
docker pull ragaai/catalyst:latest

docker run -d \
  --name ragaai-catalyst \
  -p 8080:8080 \
  -v ragaai-data:/app/data \
  -e POSTGRES_URL=postgresql://user:pass@localhost:5432/ragaai \
  ragaai/catalyst:latest

# Access the dashboard at http://localhost:8080

# Initialize SDK with self-hosted instance
from ragai_catalyst import RagaAICatalyst

sdk = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    base_url="http://your-self-hosted-domain:8080/api"
)

Step 17
Resources
Official GitHub repository: https://github.com/raga-ai-hub/RagaAI-Catalyst

PyPI package: https://pypi.org/project/ragai-catalyst/

Stars: 16,166+ (indicating strong community adoption)

Documentation: Refer to the GitHub README for the latest API documentation and examples.

Support: Reach out to the RagaAI team through GitHub issues for questions and feature requests.
```
GitHub: https://github.com/raga-ai-hub/RagaAI-Catalyst
PyPI: https://pypi.org/project/ragai-catalyst/
Stars: 16,166+
Documentation: See GitHub README
Support: GitHub Issues
```