Observability

See Everything Your AI Does

Self-hosted Langfuse tracing, OpenTelemetry-compatible telemetry, cost analytics, latency tracking, and per-agent dashboards. Full visibility, zero data leaving your infrastructure.

Instrumented at Every Layer

Forge is not a black box. Every request generates detailed tracing data at every stage of the pipeline, from the initial security scan to the final output delivery. No sampling, no approximation: every single request is traced.

End-to-End Request Tracing

Every API call generates a complete trace through the ForgeGuard security pipeline, routing layer, memory retrieval, LLM provider call, and output scanning. Traces capture token counts, latency at each stage, model selection decisions, cache hits, and security scan results. Drill into any trace to understand exactly why a specific model was chosen and how the request was processed.
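As an illustration, a trace can be thought of as an ordered list of per-stage spans that sum to the request's total latency. This is a minimal stdlib sketch; the stage names and fields below are hypothetical, not Forge's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    stage: str              # e.g. "security_scan", "routing", "llm_call"
    latency_ms: float
    metadata: dict = field(default_factory=dict)

@dataclass
class Trace:
    request_id: str
    spans: list

    @property
    def total_latency_ms(self):
        # End-to-end latency is the sum of every pipeline stage.
        return sum(s.latency_ms for s in self.spans)

# Hypothetical trace for one request through the pipeline.
trace = Trace("req-123", [
    Span("security_scan", 4.2),
    Span("routing", 1.1, {"model": "gpt-4o", "reason": "cost_tier"}),
    Span("memory_retrieval", 12.5, {"cache_hit": True}),
    Span("llm_call", 840.0, {"input_tokens": 512, "output_tokens": 128}),
    Span("output_scan", 3.7),
])
```

Drilling into `trace.spans` answers both questions the text raises: the routing span's metadata records why a model was chosen, and the per-stage latencies show how the request was processed.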

Cost Attribution

Track LLM spend per tenant, per agent, per user, and per feature. Forge breaks down costs by provider, model, token type (input vs. output), and request category. Compare actual spend against budgets in real time. Identify which agents are most cost-efficient and which conversations consume the most tokens.
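The mechanics of cost attribution reduce to pricing each request by token type and aggregating along a dimension such as tenant or agent. A minimal sketch, with illustrative per-million-token prices (not real provider pricing):

```python
from collections import defaultdict

# Illustrative (input_price, output_price) per 1M tokens, in USD.
PRICES = {"gpt-4o": (2.50, 10.00), "claude-haiku": (0.80, 4.00)}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request in USD, split by token type."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

def attribute(requests):
    """Aggregate spend per (tenant, agent); any dimension works."""
    totals = defaultdict(float)
    for r in requests:
        totals[(r["tenant"], r["agent"])] += request_cost(
            r["model"], r["input_tokens"], r["output_tokens"])
    return dict(totals)

spend = attribute([
    {"tenant": "acme", "agent": "support", "model": "gpt-4o",
     "input_tokens": 1000, "output_tokens": 500},
    {"tenant": "acme", "agent": "support", "model": "claude-haiku",
     "input_tokens": 2000, "output_tokens": 1000},
])
```

The same aggregation keyed on user or feature yields the other attribution views described above.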

Latency Tracking

Monitor P50, P95, and P99 latencies across every stage of the request pipeline. Identify bottlenecks in security scanning, memory retrieval, or provider response times. Set alerts for latency regressions and track trends over time. Latency data feeds back into the routing layer to automatically deprioritize slow providers.
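P50/P95/P99 are percentiles over a window of latency samples. A simple nearest-rank sketch, one common definition among several:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of samples are less than or equal to it."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(0, k)]

# Hypothetical stage latencies in milliseconds.
latencies = [120, 135, 150, 180, 210, 240, 300, 450, 900, 1500]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```

Note how a single slow outlier dominates P95 long before it moves P50, which is why tail percentiles, not averages, are what surface bottlenecks.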

Experiment Tracking

Opik integration enables structured experiment tracking for prompt engineering and model evaluation. Compare outputs across models, track quality metrics over time, and run A/B tests on prompt variants. Experiments link directly to traces so you can see real production data alongside controlled test results.
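Comparing prompt variants ultimately comes down to scoring paired outputs and computing a win rate. A stdlib sketch of that comparison, with hypothetical quality scores (e.g. from an LLM judge or user feedback), not Opik's actual API:

```python
def ab_winrate(scores_a, scores_b):
    """Fraction of paired comparisons variant A wins; ties count half."""
    wins = sum(1 if a > b else 0.5 if a == b else 0
               for a, b in zip(scores_a, scores_b))
    return wins / len(scores_a)

# Hypothetical per-example quality scores for two prompt variants.
variant_a = [0.82, 0.75, 0.90, 0.66, 0.88]
variant_b = [0.79, 0.80, 0.85, 0.66, 0.70]
rate = ab_winrate(variant_a, variant_b)
```

In practice the paired scores come from experiment runs linked to production traces, so each comparison is grounded in real inputs.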

Purpose-Built Dashboards

Different stakeholders need different views. Platform operators need system-wide health. Agent developers need per-agent drill-downs. Tenant admins need isolated usage reports. Finance teams need spend breakdowns. Forge provides all four out of the box.

Overview Dashboard

High-level metrics across all tenants and agents: total requests, cost, latency percentiles, cache hit rate, security events, and active sessions. The overview is designed for platform operators who need a quick health check.

Total requests/min · Cost burn rate · Cache hit % · Security blocks · P95 latency · Active agents

Per-Agent Dashboard

Deep dive into a single agent's performance. Track which models it uses, how much it spends, its average response quality, memory utilization, and tool invocations. Compare agents side-by-side to identify optimization opportunities.

Model distribution · Cost per conversation · Memory queries/session · Tool calls · Error rate · User satisfaction

Per-Tenant Dashboard

Isolated view for multi-tenant platforms. Each tenant sees only their own data: usage, spend, agents, sessions, and security events. Tenant dashboards support RBAC so tenant admins can invite team members with scoped permissions.

Tenant spend · User count · Agent fleet status · API call volume · Quota utilization · Compliance status

Cost Analytics

Financial reporting with daily, weekly, and monthly breakdowns. Forecast future spend based on growth trends. Identify cost anomalies and runaway agents. Export reports in FOCUS-compliant format for enterprise chargeback.

Daily/weekly/monthly spend · Provider cost breakdown · Model mix optimization · Budget vs. actual · Cost forecast · Chargeback reports
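A trend-based cost forecast can be as simple as a least-squares line over daily spend projected forward. A minimal sketch of that idea, not Forge's actual forecasting model:

```python
def forecast_spend(daily_spend, days_ahead):
    """Fit a least-squares linear trend to daily spend and project
    it days_ahead past the last observation."""
    n = len(daily_spend)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_spend) / n
    slope = (sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(xs, daily_spend))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)

# Hypothetical daily spend growing $10/day.
history = [100, 110, 120, 130, 140]
week_out = forecast_spend(history, 7)
```

A linear fit is deliberately simple; growth that compounds would call for fitting on log-spend instead.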

Observability Stack

Forge integrates with three observability layers: Langfuse for LLM-specific tracing, Opik for experiment tracking, and OpenLLMetry for standards-based telemetry export to your existing monitoring infrastructure.

Langfuse

Primary tracing and analytics platform

Self-hosted Langfuse instance captures every trace, span, and generation event. Forge instruments all internal operations with Langfuse spans, giving you millisecond-level visibility into routing decisions, cache lookups, security scans, and LLM calls. Langfuse's web UI provides search, filtering, and visualization out of the box.
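Conceptually, instrumenting an internal operation means wrapping it in a timed span and shipping the record to Langfuse. This stdlib sketch shows the shape of that wrapper; it appends to a local list instead of calling the Langfuse SDK, and all names are illustrative:

```python
import time
from contextlib import contextmanager

SPANS = []  # stand-in for the Langfuse span sink

@contextmanager
def span(name, **attrs):
    """Time a pipeline stage and record it with its attributes.
    In a real deployment the record would be sent to Langfuse."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({"name": name,
                      "latency_ms": (time.perf_counter() - start) * 1000,
                      **attrs})

with span("routing", model="gpt-4o"):
    pass  # the routing decision would execute here
```

Because the span closes in a `finally` block, latency is recorded even when the wrapped stage raises, which keeps failure paths visible in the trace.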

Opik

Experiment tracking and evaluation

Run structured experiments comparing prompt variants, model configurations, and routing strategies. Opik tracks experiment metadata, input/output pairs, and quality scores in a dedicated experiment store. Integrate with CI/CD to run evaluation suites on every deployment.

OpenLLMetry + OpenTelemetry

Standards-based telemetry export

OpenLLMetry auto-instruments LLM provider SDKs and exports spans in OpenTelemetry format. Connect to Datadog, New Relic, Grafana, Honeycomb, or any OTLP-compatible backend. Use standard OTEL collectors, dashboards, and alerting rules alongside Forge-specific data.

Alerting & Anomaly Detection

Forge monitors eight categories of signals and fires alerts through your preferred channels: Slack, PagerDuty, email, webhooks, or Telegram. Alerts are configurable per tenant, per agent, and per severity level. Critical alerts like security events and runaway agents trigger immediately. Cost and latency alerts use rolling window averages to avoid false positives.

Anomaly detection uses statistical baselines to identify unusual patterns without requiring manual threshold configuration. When a metric deviates significantly from its historical norm, Forge raises a warning before it becomes a critical issue.
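One standard way to build such a baseline is a z-score test: flag any value more than a few standard deviations from the historical mean. A minimal sketch of that approach (not necessarily the statistic Forge uses):

```python
import statistics

def is_anomalous(history, current, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from
    the historical mean; no manually tuned absolute threshold."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > z_threshold
```

Because the threshold scales with each metric's own variance, a noisy metric tolerates wider swings than a stable one, which is exactly the property that avoids per-metric manual configuration.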

Cost Spike

Triggered when spend exceeds a percentage threshold above the rolling average

Latency Regression

Fired when P95 latency rises above a configured SLA target for a sustained period

Error Rate

Alerts when provider error rates exceed normal thresholds, often indicating an outage

Security Event

Immediate notification when ForgeGuard blocks a request or Augustus detects a breach

Quota Exhaustion

Warning when a tenant or agent approaches their credit or rate limit ceiling

Quality Drop

Triggered when response quality scores (ELO, user feedback) fall below baseline

Runaway Agent

Circuit breaker notification when an agent's cost velocity exceeds safe bounds

Memory Anomaly

Fired when vector write patterns suggest potential memory poisoning or data drift
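Taking the runaway-agent alert as an example, a cost-velocity circuit breaker can be sketched as a sliding window of spend events compared against a dollars-per-minute ceiling. The class and its parameters below are illustrative, not Forge's actual implementation:

```python
import time
from collections import deque

class CostCircuitBreaker:
    """Trip when an agent's spend rate over a sliding window
    exceeds a ceiling expressed in USD per minute."""

    def __init__(self, max_usd_per_min, window_s=60):
        self.max_rate = max_usd_per_min
        self.window_s = window_s
        self.events = deque()   # (timestamp, cost) pairs
        self.tripped = False

    def record(self, cost, now=None):
        """Record one request's cost; return True once tripped."""
        now = time.time() if now is None else now
        self.events.append((now, cost))
        # Drop events that have aged out of the window.
        while self.events and self.events[0][0] < now - self.window_s:
            self.events.popleft()
        rate = sum(c for _, c in self.events) * 60 / self.window_s
        if rate > self.max_rate:
            self.tripped = True
        return self.tripped
```

Once tripped, the breaker would block further LLM calls for that agent and fire the notification described above; the other alert categories follow the same record-window-compare pattern with different signals.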

Know exactly where every dollar goes

Observability is enabled by default on every Forge request. Traces appear in your Langfuse dashboard within seconds. No additional setup required.