See Everything Your AI Does
Self-hosted Langfuse tracing, OpenTelemetry-compatible telemetry, cost analytics, latency tracking, and per-agent dashboards. Full visibility, zero data leaving your infrastructure.
Instrumented at Every Layer
Forge is not a black box. Every request generates detailed tracing data through every stage of the pipeline, from the initial security scan to final output delivery. No sampling, no approximation: every single request is traced.
End-to-End Request Tracing
Every API call generates a complete trace through the ForgeGuard security pipeline, routing layer, memory retrieval, LLM provider call, and output scanning. Traces capture token counts, latency at each stage, model selection decisions, cache hits, and security scan results. Drill into any trace to understand exactly why a specific model was chosen and how the request was processed.
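A per-request trace like the one described above can be pictured as a list of stage spans. The stage names, attribute keys, and latency figures below are illustrative assumptions, not Forge's actual trace schema:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One pipeline stage within a request trace."""
    stage: str            # hypothetical stage name, e.g. "forgeguard_scan"
    latency_ms: float
    attributes: dict = field(default_factory=dict)

@dataclass
class Trace:
    """A complete end-to-end trace for a single API call."""
    request_id: str
    spans: list = field(default_factory=list)

    def total_latency_ms(self) -> float:
        return sum(s.latency_ms for s in self.spans)

# Build a trace mirroring the pipeline stages described above.
trace = Trace(request_id="req-001")
trace.spans += [
    Span("forgeguard_scan", 4.2, {"verdict": "clean"}),
    Span("routing", 1.1, {"model": "gpt-4o", "reason": "quality_tier"}),
    Span("memory_retrieval", 8.7, {"cache_hit": False}),
    Span("llm_call", 612.0, {"input_tokens": 950, "output_tokens": 214}),
    Span("output_scan", 3.5, {"verdict": "clean"}),
]
print(round(trace.total_latency_ms(), 1))
```

Drilling into a trace then amounts to inspecting the span attributes, for example reading the routing span's `reason` to see why a model was chosen.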
Cost Attribution
Track LLM spend per tenant, per agent, per user, and per feature. Forge breaks down costs by provider, model, token type (input vs. output), and request category. Compare actual spend against budgets in real time. Identify which agents are most cost-efficient and which conversations consume the most tokens.
Latency Tracking
Monitor P50, P95, and P99 latencies across every stage of the request pipeline. Identify bottlenecks in security scanning, memory retrieval, or provider response times. Set alerts for latency regressions and track trends over time. Latency data feeds back into the routing layer to automatically deprioritize slow providers.
Experiment Tracking
Opik integration enables structured experiment tracking for prompt engineering and model evaluation. Compare outputs across models, track quality metrics over time, and run A/B tests on prompt variants. Experiments link directly to traces so you can see real production data alongside controlled test results.
Purpose-Built Dashboards
Different stakeholders need different views. Platform operators need system-wide health. Agent developers need per-agent drill-downs. Tenant admins need isolated usage reports. Forge ships four dashboards out of the box to cover them all.
Overview Dashboard
High-level metrics across all tenants and agents: total requests, cost, latency percentiles, cache hit rate, security events, and active sessions. The overview is designed for platform operators who need a quick health check.
Per-Agent Dashboard
Deep dive into a single agent's performance. Track which models it uses, how much it spends, its average response quality, memory utilization, and tool invocations. Compare agents side-by-side to identify optimization opportunities.
Per-Tenant Dashboard
Isolated view for multi-tenant platforms. Each tenant sees only their own data: usage, spend, agents, sessions, and security events. Tenant dashboards support RBAC so tenant admins can invite team members with scoped permissions.
Cost Analytics
Financial reporting with daily, weekly, and monthly breakdowns. Forecast future spend based on growth trends. Identify cost anomalies and runaway agents. Export reports in FOCUS-compliant format for enterprise chargeback.
Observability Stack
Forge integrates with three observability layers: Langfuse for LLM-specific tracing, Opik for experiment tracking, and OpenLLMetry for standards-based telemetry export to your existing monitoring infrastructure.
Langfuse
Primary tracing and analytics platform
Self-hosted Langfuse instance captures every trace, span, and generation event. Forge instruments all internal operations with Langfuse spans, giving you millisecond-level visibility into routing decisions, cache lookups, security scans, and LLM calls. Langfuse's web UI provides search, filtering, and visualization out of the box.
Opik
Experiment tracking and evaluation
Run structured experiments comparing prompt variants, model configurations, and routing strategies. Opik tracks experiment metadata, input/output pairs, and quality scores in a dedicated experiment store. Integrate with CI/CD to run evaluation suites on every deployment.
OpenLLMetry + OpenTelemetry
Standards-based telemetry export
OpenLLMetry auto-instruments LLM provider SDKs and exports spans in OpenTelemetry format. Connect to Datadog, New Relic, Grafana, Honeycomb, or any OTLP-compatible backend. Use standard OTEL collectors, dashboards, and alerting rules alongside Forge-specific data.
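Pointing OTLP export at an existing backend typically comes down to the standard OpenTelemetry environment variables; the endpoint, protocol, and header values below are placeholders for your backend's actual values:

```shell
# Standard OpenTelemetry exporter configuration (values are placeholders).
export OTEL_SERVICE_NAME="forge-gateway"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp.example.com:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer <token>"
```

Because these variables are part of the OpenTelemetry specification, the same configuration works whether the collector sits in front of Datadog, Grafana, Honeycomb, or any other OTLP-compatible backend.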
Alerting & Anomaly Detection
Forge monitors eight categories of signals and fires alerts through your preferred channels: Slack, PagerDuty, email, webhooks, or Telegram. Alerts are configurable per tenant, per agent, and per severity level. Critical alerts like security events and runaway agents trigger immediately. Cost and latency alerts use rolling window averages to avoid false positives.
Anomaly detection uses statistical baselines to identify unusual patterns without requiring manual threshold configuration. When a metric deviates significantly from its historical norm, Forge raises a warning before it becomes a critical issue.
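One common way to implement "deviates significantly from its historical norm" without manual thresholds is a z-score against a rolling baseline. This is a generic sketch of that technique, not Forge's actual detector; the spend series is invented:

```python
import statistics

def is_anomalous(history: list, latest: float, z_threshold: float = 3.0) -> bool:
    """Flag a value more than z_threshold standard deviations from the
    historical baseline; no per-metric threshold configuration needed."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > z_threshold

hourly_spend = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3]
print(is_anomalous(hourly_spend, 10.4))  # within the normal band
print(is_anomalous(hourly_spend, 42.0))  # far outside the baseline
```

The same check applies unchanged to latency, error rates, or vector write volumes, which is what lets one detector cover many metric types.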
Cost Spike
Triggered when spend exceeds a percentage threshold above the rolling average
Latency Regression
Fired when P95 latency rises above a configured SLA target for a sustained period
Error Rate
Alerts when provider error rates exceed normal thresholds, often indicating an outage
Security Event
Immediate notification when ForgeGuard blocks a request or Augustus detects a breach
Quota Exhaustion
Warning when a tenant or agent approaches their credit or rate limit ceiling
Quality Drop
Triggered when response quality scores (ELO, user feedback) fall below baseline
Runaway Agent
Circuit breaker notification when an agent's cost velocity exceeds safe bounds
Memory Anomaly
Fired when vector write patterns suggest potential memory poisoning or data drift
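As one illustrative sketch of the rolling-window cost-spike rule described above: fire only once enough history exists, and only when current spend exceeds the rolling average by a configured percentage. Window size and threshold here are assumptions, not Forge defaults:

```python
from collections import deque

class CostSpikeAlert:
    """Fires when spend exceeds the rolling average by more than
    pct_threshold (0.5 == 50% above average)."""
    def __init__(self, window: int = 12, pct_threshold: float = 0.5):
        self.samples = deque(maxlen=window)
        self.pct_threshold = pct_threshold

    def observe(self, spend: float) -> bool:
        if len(self.samples) == self.samples.maxlen:
            avg = sum(self.samples) / len(self.samples)
            fired = spend > avg * (1 + self.pct_threshold)
        else:
            fired = False  # not enough history yet; avoids false positives
        self.samples.append(spend)
        return fired

alert = CostSpikeAlert(window=4, pct_threshold=0.5)
for hourly in [10.0, 10.0, 10.0, 10.0]:
    alert.observe(hourly)      # builds the baseline; never fires
spike = alert.observe(20.0)    # 100% above the rolling average
print(spike)
```

Requiring a full window before firing is what gives rolling-window alerts their resistance to false positives compared with fixed thresholds.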
Know exactly where every dollar goes
Observability is enabled by default on every Forge request. Traces appear in your Langfuse dashboard within seconds. No additional setup required.