airmy.dev/platform/monitoring

/ Platform · Monitoring

Full observability,
zero config.

Every agent call, every output, every latency spike — monitored, logged, and queryable from the moment you deploy. No third-party observability tool required.

<200ms
dashboard load
10B+
events indexed / mo
99.9%
query SLA
13mo
log retention

/ Capabilities

Real-Time Agent Dashboard

A live view of every deployed agent: call volume, latency distribution, error rates, and token usage. Drill down to individual calls in one click.

P50/P95/P99 latency histograms per agent
Call volume sparklines with real-time refresh
Live error feed with full stack traces
Token usage and cost breakdown per agent
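
For readers curious what the P50/P95/P99 figures summarize, here is a minimal sketch using the nearest-rank percentile method. The sample latencies and the `percentile` helper are illustrative assumptions; the page does not say how the dashboard computes its histograms.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical per-call latencies (ms) for a single agent.
latencies_ms = [12, 15, 18, 22, 30, 41, 48, 55, 90, 400]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

A single slow call (400 ms) dominates P95 and P99 in this small sample while P50 barely moves, which is why tail percentiles are tracked alongside the median.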

Smart Alerting

Define alert rules in plain language or JSON. Get notified via email, Slack, PagerDuty, or webhook the moment a threshold is breached.

Composite rules (latency AND error rate)
Alert suppression during deployments
Escalation policies per team
Runbook links attached to each alert
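
As a rough illustration of a composite rule with deployment suppression, the sketch below evaluates a JSON rule against a metrics snapshot. The rule schema (`all`, `metric`, `op`, `threshold`, `suppress_during_deploy`, `runbook_url`) and the `should_fire` helper are assumptions made for this example, not the platform's documented format.

```python
import json

# Hypothetical rule: fire only when latency AND error rate both breach.
RULE_JSON = """
{
  "name": "summarize-v2 degraded",
  "all": [
    {"metric": "p95_latency_ms", "op": ">", "threshold": 500},
    {"metric": "error_rate", "op": ">", "threshold": 0.01}
  ],
  "suppress_during_deploy": true,
  "runbook_url": "https://example.com/runbooks/summarize-v2"
}
"""

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b}

def should_fire(rule, metrics, deploying=False):
    """Return True when every condition in the rule is breached and the alert is not suppressed."""
    if deploying and rule.get("suppress_during_deploy"):
        return False
    return all(OPS[c["op"]](metrics[c["metric"]], c["threshold"]) for c in rule["all"])

rule = json.loads(RULE_JSON)
print(should_fire(rule, {"p95_latency_ms": 800, "error_rate": 0.02}))
```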

Audit Log Explorer

Every agent invocation is logged immutably. Search, filter, and export audit logs with millisecond precision. Full compliance traceability out of the box.

Full input/output capture (opt-in)
Tamper-evident append-only log
Sub-second log search via full-text index
1-click export to SIEM (Splunk, Datadog, CloudWatch)
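
One common way to make an append-only log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below shows that general technique under assumed record fields; it is not a description of how the platform stores its audit logs.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry

def _digest(prev_hash, record):
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_entry(log, record):
    """Append a record whose hash commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev_hash": prev_hash, "hash": _digest(prev_hash, record)})

def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True

audit_log = []
append_entry(audit_log, {"agent": "summarize-v2", "status": 400})
append_entry(audit_log, {"agent": "code-review", "status": 504})
print(verify(audit_log))
```

Changing any stored record after the fact makes `verify` return `False`, because the recomputed hash no longer matches the one recorded at write time.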

Cost Analytics

Track spend at every level — per agent, per team, per project, per API key. Set budget alerts and forecast usage before surprises arrive.

Hourly cost breakdown by agent
Budget cap alerts at 50% / 80% / 100%
Team chargeback reports (CSV export)
Forecasting with 30-day rolling average
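
To make the budget thresholds and rolling-average forecast concrete, here is a small sketch. The function names, the dollar figures, and the simple "average daily cost times days ahead" projection are assumptions for illustration; the actual forecasting model may differ.

```python
def crossed_thresholds(spend, budget, thresholds=(0.5, 0.8, 1.0)):
    """Return the budget fractions (50% / 80% / 100%) that current spend has reached."""
    return [t for t in thresholds if spend >= t * budget]

def forecast_spend(daily_costs, days_ahead, window=30):
    """Project future spend from the mean of the most recent `window` daily costs."""
    recent = daily_costs[-window:]
    if not recent:
        raise ValueError("need at least one day of cost data")
    return sum(recent) / len(recent) * days_ahead

# Hypothetical data: $10/day for the last 30 days against a $1,000 budget.
daily_costs = [10.0] * 30
print(crossed_thresholds(850, 1000))
print(forecast_spend(daily_costs, days_ahead=10))
```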

Multi-Agent Flow Tracing

Trace requests across complex multi-agent pipelines. See exactly which agents were invoked, in what order, and where latency accumulated.

Distributed trace waterfall view
Parent/child span relationships
Cross-agent latency attribution
OpenTelemetry-compatible export
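
Cross-agent latency attribution can be pictured as computing each span's "self time": its duration minus the time spent in its direct children. The sketch below assumes a simplified span shape (`id`, `parent`, `agent`, `ms`), child spans that do not overlap, and a hypothetical `router` agent; it is an illustration of the idea rather than the platform's trace format.

```python
from collections import defaultdict

# Hypothetical trace: a router agent calls two downstream agents in sequence.
spans = [
    {"id": "a", "parent": None, "agent": "router", "ms": 120},
    {"id": "b", "parent": "a", "agent": "data-extract", "ms": 40},
    {"id": "c", "parent": "a", "agent": "summarize-v2", "ms": 70},
]

def self_time_by_agent(spans):
    """Attribute latency to each agent, excluding time spent in its child spans."""
    child_ms = defaultdict(int)
    for span in spans:
        if span["parent"] is not None:
            child_ms[span["parent"]] += span["ms"]
    attributed = defaultdict(int)
    for span in spans:
        attributed[span["agent"]] += span["ms"] - child_ms[span["id"]]
    return dict(attributed)

print(self_time_by_agent(spans))
```

Here the router's own overhead is only 10 ms of the 120 ms request; most of the latency accumulated in `summarize-v2`.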

SIEM & External Export

Push every event to your existing observability stack. Native connectors for Datadog, Splunk, New Relic, Grafana, and AWS CloudWatch.

Real-time streaming export (<2s lag)
Structured JSON schema, fully documented
Custom field mapping per destination
Replay historical windows on demand
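
Custom field mapping per destination amounts to renaming event keys before export. The sketch below assumes a hypothetical event shape and mapping table; the real schema and connector configuration are described in the platform's own documentation, not here.

```python
def map_fields(event, mapping):
    """Rename keys listed in `mapping`; pass all other fields through unchanged."""
    return {mapping.get(key, key): value for key, value in event.items()}

# Hypothetical event and a hypothetical mapping for one destination.
event = {"agent": "code-review", "latency_ms": 48, "status": 504}
destination_mapping = {"agent": "service", "latency_ms": "duration"}
print(map_fields(event, destination_mapping))
```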

/ Works with your stack

Datadog · Splunk · New Relic · Grafana · AWS CloudWatch · PagerDuty · OpsGenie · Slack

/ Live Dashboard Preview

Agent Runtime · Live · updated 1s ago
Time range: 1H · 6H · 24H · 7D
2,847,291
total calls
48ms
avg latency
0.03%
error rate
47
active agents
Call volume, last 24h · peak: 187k/hr

Recent Errors

03:14:22 · agent/summarize-v2: context_length_exceeded (input tokens 32,847 > limit 32,768) · 400
02:58:07 · agent/code-review: upstream_timeout (model endpoint did not respond within 30s) · 504
01:33:51 · agent/data-extract: rate_limit_exceeded (org-level RPM quota hit, tier: business) · 429

/ Monitoring by Plan

Free

Basic Metrics

Get started with essential observability

Call volume & error rate
Basic latency metrics
7-day data retention
Dashboard access
Smart alerting
Audit log explorer
SIEM export
Custom dashboards
Get Started Free

Business

Full Metrics

Complete observability for production teams

All Free metrics
P50/P95/P99 histograms
90-day retention
Smart alerting (email + Slack)
Cost analytics
Multi-agent tracing
SIEM export
Custom dashboards
Upgrade to Business

Enterprise

Unlimited

Maximum retention, compliance, and control

All Business metrics
Unlimited retention
Audit logs with full I/O
SIEM export (Datadog, Splunk…)
Custom dashboards
Dedicated metrics endpoint
SLA-backed query performance
OpenTelemetry export
Talk to Sales →

Start monitoring your agents in under 60 seconds.

Deploy an agent and get full observability instantly — no SDK changes required.

Deploy Agent