Self-hosted Langfuse tracing, OpenTelemetry-compatible telemetry, cost analytics, latency tracking, and per-agent dashboards. Full visibility, zero data leaving your infrastructure.
Forge is not a black box. Every request generates detailed tracing data through every stage of the pipeline, from the initial security scan to the final output delivery. No sampling, no approximation: every single request is traced.
Every API call generates a complete trace through the ForgeGuard security pipeline, routing layer, memory retrieval, LLM provider call, and output scanning. Traces capture token counts, latency at each stage, model selection decisions, cache hits, and security scan results. Drill into any trace to understand exactly why a specific model was chosen and how the request was processed.
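The per-request trace described above can be sketched as a simple data model. The stage names, fields, and values below are illustrative assumptions, not Forge's actual trace schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape of a single request trace; stage order mirrors the
# pipeline above: security scan -> routing -> memory -> LLM call -> output scan.
@dataclass
class StageSpan:
    name: str
    latency_ms: float
    metadata: dict = field(default_factory=dict)

@dataclass
class RequestTrace:
    trace_id: str
    stages: list[StageSpan]
    input_tokens: int
    output_tokens: int

    def total_latency_ms(self) -> float:
        return sum(s.latency_ms for s in self.stages)

    def stage(self, name: str) -> StageSpan:
        return next(s for s in self.stages if s.name == name)

trace = RequestTrace(
    trace_id="tr_123",
    stages=[
        StageSpan("forgeguard_scan", 4.2),
        StageSpan("routing", 1.1, {"model": "gpt-4o", "reason": "cost_tier"}),
        StageSpan("memory_retrieval", 12.5, {"cache_hit": True}),
        StageSpan("llm_call", 840.0),
        StageSpan("output_scan", 3.7),
    ],
    input_tokens=512,
    output_tokens=256,
)
print(round(trace.total_latency_ms(), 1))        # 861.5
print(trace.stage("routing").metadata["model"])  # gpt-4o
```

Drilling into a trace is then a matter of inspecting the stage whose metadata you care about, such as the routing stage's model-selection reason.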
Track LLM spend per tenant, per agent, per user, and per feature. Forge breaks down costs by provider, model, token type (input vs. output), and request category. Compare actual spend against budgets in real time. Identify which agents are most cost-efficient and which conversations consume the most tokens.
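The per-dimension cost breakdown works by pricing each request's input and output tokens and grouping by the dimension of interest. A minimal sketch, assuming illustrative record fields and example per-million-token prices (real prices vary by provider and are not Forge's published rates):

```python
from collections import defaultdict

# Illustrative usage records; field names are assumptions, not Forge's schema.
records = [
    {"tenant": "acme", "agent": "support-bot", "model": "gpt-4o",
     "input_tokens": 1200, "output_tokens": 300},
    {"tenant": "acme", "agent": "support-bot", "model": "gpt-4o-mini",
     "input_tokens": 5000, "output_tokens": 900},
    {"tenant": "globex", "agent": "research", "model": "gpt-4o",
     "input_tokens": 800, "output_tokens": 2000},
]

# Example (input, output) prices per million tokens -- placeholders only.
PRICES = {"gpt-4o": (2.50, 10.00), "gpt-4o-mini": (0.15, 0.60)}

def record_cost(r: dict) -> float:
    in_price, out_price = PRICES[r["model"]]
    return r["input_tokens"] / 1e6 * in_price + r["output_tokens"] / 1e6 * out_price

def spend_by(records: list[dict], key: str) -> dict:
    """Aggregate cost by any dimension: tenant, agent, model, ..."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += record_cost(r)
    return dict(totals)

print(spend_by(records, "tenant"))
print(spend_by(records, "model"))
```

The same `spend_by` call answers "per tenant", "per agent", or "per model" questions by switching the grouping key.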
Monitor P50, P95, and P99 latencies across every stage of the request pipeline. Identify bottlenecks in security scanning, memory retrieval, or provider response times. Set alerts for latency regressions and track trends over time. Latency data feeds back into the routing layer to automatically deprioritize slow providers.
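The P50/P95/P99 stats above can be computed with a nearest-rank percentile over per-stage latency samples. This is a generic sketch of the math, not Forge's internal implementation, and the sample data is invented:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    s = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(s)))
    return s[k - 1]

# Illustrative latency samples for one pipeline stage, in milliseconds.
latencies_ms = [float(v) for v in range(1, 101)]  # 1..100 ms

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)  # 50.0 95.0 99.0
```

Running the same computation per stage (security scan, memory retrieval, provider call) is what makes bottleneck attribution possible.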
Opik integration enables structured experiment tracking for prompt engineering and model evaluation. Compare outputs across models, track quality metrics over time, and run A/B tests on prompt variants. Experiments link directly to traces so you can see real production data alongside controlled test results.
Different stakeholders need different views. Platform operators need system-wide health. Agent developers need per-agent drill-downs. Tenant admins need isolated usage reports. Finance teams need spend breakdowns. Forge provides all four dashboards out of the box.
High-level metrics across all tenants and agents: total requests, cost, latency percentiles, cache hit rate, security events, and active sessions. The overview is designed for platform operators who need a quick health check.
Deep dive into a single agent's performance. Track which models it uses, how much it spends, its average response quality, memory utilization, and tool invocations. Compare agents side-by-side to identify optimization opportunities.
Isolated view for multi-tenant platforms. Each tenant sees only their own data: usage, spend, agents, sessions, and security events. Tenant dashboards support RBAC so tenant admins can invite team members with scoped permissions.
Financial reporting with daily, weekly, and monthly breakdowns. Forecast future spend based on growth trends. Identify cost anomalies and runaway agents. Export reports in FOCUS-compliant format for enterprise chargeback.
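Trend-based spend forecasting can be approximated with a least-squares linear fit over the daily history. This is a naive sketch only; Forge's actual forecasting model is not documented here, and the history values are invented:

```python
def forecast_spend(daily_spend: list[float], days_ahead: int) -> float:
    """Project spend `days_ahead` days past the last observation via a linear trend."""
    n = len(daily_spend)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_spend) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_spend))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + days_ahead)

history = [10.0, 12.0, 14.0, 16.0, 18.0]  # example: spend growing $2/day
print(forecast_spend(history, 3))  # 24.0
```

A forecast like this also gives the baseline against which cost anomalies and runaway agents stand out.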
Forge integrates with three observability layers: Langfuse for LLM-specific tracing, Opik for experiment tracking, and OpenLLMetry for standards-based telemetry export to your existing monitoring infrastructure.
Primary tracing and analytics platform
Self-hosted Langfuse instance captures every trace, span, and generation event. Forge instruments all internal operations with Langfuse spans, giving you millisecond-level visibility into routing decisions, cache lookups, security scans, and LLM calls. Langfuse's web UI provides search, filtering, and visualization out of the box.
Experiment tracking and evaluation
Run structured experiments comparing prompt variants, model configurations, and routing strategies. Opik tracks experiment metadata, input/output pairs, and quality scores in a dedicated experiment store. Integrate with CI/CD to run evaluation suites on every deployment.
Standards-based telemetry export
OpenLLMetry auto-instruments LLM provider SDKs and exports spans in OpenTelemetry format. Connect to Datadog, New Relic, Grafana, Honeycomb, or any OTLP-compatible backend. Use standard OTEL collectors, dashboards, and alerting rules alongside Forge-specific data.
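Pointing the OTLP export at an existing backend uses the standard OpenTelemetry exporter environment variables. The endpoint, service name, and token below are placeholders, not Forge defaults:

```shell
# Route OTLP spans to your collector (placeholder values).
export OTEL_SERVICE_NAME="forge-gateway"
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel-collector.example.com:4318"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"   # or "grpc"
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer%20<token>"
```

Because these are spec-defined variables, the same configuration works whether the backend is Datadog, New Relic, Grafana, Honeycomb, or a self-hosted collector.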
Forge monitors eight categories of signals and fires alerts through your preferred channels: Slack, PagerDuty, email, webhooks, or Telegram. Alerts are configurable per tenant, per agent, and per severity level. Critical alerts like security events and runaway agents trigger immediately. Cost and latency alerts use rolling window averages to avoid false positives.
Anomaly detection uses statistical baselines to identify unusual patterns without requiring manual threshold configuration. When a metric deviates significantly from its historical norm, Forge raises a warning before it becomes a critical issue.
Triggered when spend exceeds a percentage threshold above the rolling average
Fired when P95 latency rises above a configured SLA target for a sustained period
Alerts when provider error rates exceed normal thresholds, often indicating an outage
Immediate notification when ForgeGuard blocks a request or Augustus detects a breach
Warning when a tenant or agent approaches their credit or rate limit ceiling
Triggered when response quality scores (ELO, user feedback) fall below baseline
Circuit breaker notification when an agent's cost velocity exceeds safe bounds
Fired when vector write patterns suggest potential memory poisoning or data drift
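The statistical-baseline anomaly detection described above amounts to flagging a metric when it deviates too far from its rolling historical window. A minimal z-score sketch; the window, threshold, and sample values are illustrative, not Forge defaults:

```python
import statistics

def is_anomalous(history: list[float], current: float, z_max: float = 3.0) -> bool:
    """Flag `current` if it sits more than `z_max` standard deviations from the window mean."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return current != mean  # flat baseline: any change is anomalous
    return abs(current - mean) / stdev > z_max

baseline = [100, 102, 98, 101, 99, 100, 103, 97]  # steady daily spend
print(is_anomalous(baseline, 104))  # False: within normal variation
print(is_anomalous(baseline, 160))  # True: e.g. a runaway agent
```

The appeal of this approach is that no manual threshold is configured per metric: the baseline adapts as the window rolls forward, which is also why rolling-window cost and latency alerts avoid false positives.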
Observability is enabled by default on every Forge request. Traces appear in your Langfuse dashboard within seconds. No additional setup required.