/ Agent Studio

Design agents.
Test them. Ship them.

Agent Studio is the visual workspace for configuring, fine-tuning, and evaluating agents before they go to production.

01 / Visual Configuration Editor

Configure without code.

Set system prompts, bind tools, adjust model parameters, and configure context windows, all from one visual interface. Changes preview in real time, so you can iterate without redeploying.

  • Drag-and-drop tool binding
  • Real-time preview pane
  • Export to YAML or JSON

Configuration

System prompt: 2,048 tokens
Context window: 128k
Temperature: 0.7
Tools bound: sql, python, file-io
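
For a sense of what an export could contain, here is a minimal Python sketch. The key names and the schema are assumptions made for illustration, not AIRMY's documented export format; the sample values mirror the Configuration panel, with "128k" taken to mean 128,000 tokens.

```python
import json

# Hypothetical export schema: every key name below is an assumption,
# not a documented AIRMY format. Values mirror the Configuration panel.
agent_config = {
    "system_prompt_tokens": 2048,
    "context_window_tokens": 128_000,  # "128k" read as 128,000 (assumption)
    "temperature": 0.7,
    "tools": ["sql", "python", "file-io"],
}


def export_config(config: dict) -> str:
    """Serialise a config dict to JSON with sorted keys, so exports diff cleanly."""
    return json.dumps(config, indent=2, sort_keys=True)


exported = export_config(agent_config)
print(exported)
```

Sorting the keys keeps two exports of the same configuration byte-identical, which makes them easier to compare in version control.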

Test Harness

test_revenue_query: 48ms
test_date_parsing: 31ms
test_null_handling: failed

02 / Interactive Test Harness

Catch regressions before production.

Run test suites against your agent directly in the browser. Compare outputs across versions side by side. Every save runs your suite automatically, so regressions are caught before they reach production users.
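
The idea of a cross-version regression check can be sketched in a few lines of Python. Everything here is hypothetical: the `Agent` type, the stand-in agents, and the prompts are invented for illustration, and only the test names are borrowed from the Test Harness panel. A regression is counted as a case the earlier version passes and the newer one fails.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent interface: a prompt goes in, an answer comes out.
Agent = Callable[[str], str]

# Test name -> (prompt, expected answer). Prompts and answers are made up.
CASES: Dict[str, Tuple[str, str]] = {
    "test_revenue_query": ("total revenue?", "42"),
    "test_date_parsing": ("parse 2024-01-05", "5 Jan 2024"),
    "test_null_handling": ("sum of no rows", "0"),
}


def find_regressions(baseline: Agent, candidate: Agent,
                     cases: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return the names of cases the baseline passes but the candidate fails."""
    regressions = []
    for name, (prompt, expected) in cases.items():
        passed_before = baseline(prompt) == expected
        passed_now = candidate(prompt) == expected
        if passed_before and not passed_now:
            regressions.append(name)
    return regressions


def agent_v1(prompt: str) -> str:
    """Stand-in for the earlier agent version: answers every case correctly."""
    return {p: expected for p, expected in CASES.values()}.get(prompt, "")


def agent_v2(prompt: str) -> str:
    """Stand-in for the newer version: mishandles the empty-result case."""
    return "None" if prompt == "sum of no rows" else agent_v1(prompt)


print(find_regressions(agent_v1, agent_v2, CASES))
```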

03 / Fine-tuning Pipeline

Teach your agents new behaviours.

Provide example input/output pairs and AIRMY handles the fine-tuning job end to end. Track training loss live, preview model outputs at each checkpoint, and publish to production once the eval metrics meet your threshold.

  • Upload CSV or JSONL training data
  • Live training loss chart
  • Checkpoint comparison
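
If your examples live in code rather than a spreadsheet, a JSONL file is easy to produce. The sketch below is a minimal Python example; the `input` and `output` field names are assumptions, since this page does not specify the schema the uploader expects, and the sample pairs are invented.

```python
import json
from typing import Iterable, Tuple

# Invented example pairs for illustration only.
PAIRS = [
    ("Summarise Q3 revenue by region.", "Q3 revenue was highest in EMEA."),
    ("How many rows have a null order date?", "17 rows have a null order date."),
]


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (input, output) pairs as JSONL: one JSON object per line."""
    lines = []
    for prompt, completion in pairs:
        if not prompt.strip() or not completion.strip():
            raise ValueError("each pair needs a non-empty input and output")
        # Field names are hypothetical; adjust to the schema your upload requires.
        lines.append(json.dumps({"input": prompt, "output": completion},
                                ensure_ascii=False))
    return "\n".join(lines) + "\n"


print(to_jsonl(PAIRS), end="")
```

Rejecting empty pairs before upload avoids spending a training run on rows that carry no signal.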

Training Job — data-engineer-v2

Status: Training
Step: 842 / 2000
Training loss: 0.1847
Validation loss: 0.2103

Data Analyst: Skilled at SQL queries and data summarisation.
Code Reviewer: Detailed PR review with security and style notes.
Customer Support: Empathetic, concise, escalation-aware.
Legal Summariser: Plain-language contract and clause summarisation.

04 / Prompt Template Library

200+ battle-tested system prompts.

Choose from 200+ community-contributed and AIRMY-verified system prompt templates covering data analysis, code review, customer support, legal summarisation, and more. Every template is fully customisable: use it as a starting point or deploy it as is.

05 / Evaluation Metrics Dashboard

Measure what matters.

Track accuracy, consistency, latency, and token usage across agent versions in a unified dashboard. AIRMY's built-in eval harness runs automated tests on every save, so you always have a fresh signal before pushing to production.

Eval Metrics — v1.4 vs v1.5

Metric          v1.4     v1.5
Accuracy        91.2%    94.7%
P50 latency     88ms     61ms
Tokens / call   1,420    1,180
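
A publish threshold over metrics like these can be expressed as a simple gate. The Python sketch below is one possible rule, not AIRMY's actual logic: the `EvalResult` type, the `ready_to_publish` function, and the 93% accuracy floor are all assumptions. The figures for v1.4 and v1.5 are the ones shown in the Eval Metrics panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float         # fraction between 0 and 1
    p50_latency_ms: float
    tokens_per_call: float


def ready_to_publish(candidate: EvalResult, baseline: EvalResult,
                     min_accuracy: float = 0.93) -> bool:
    """Hypothetical gate: clear an accuracy floor and do not get worse
    than the baseline on accuracy, latency, or token usage."""
    return (
        candidate.accuracy >= min_accuracy
        and candidate.accuracy >= baseline.accuracy
        and candidate.p50_latency_ms <= baseline.p50_latency_ms
        and candidate.tokens_per_call <= baseline.tokens_per_call
    )


# Figures from the Eval Metrics panel.
v1_4 = EvalResult(accuracy=0.912, p50_latency_ms=88, tokens_per_call=1420)
v1_5 = EvalResult(accuracy=0.947, p50_latency_ms=61, tokens_per_call=1180)

print(ready_to_publish(v1_5, baseline=v1_4))
```

Under this rule v1.5 clears the gate against v1.4, because it improves on all three metrics and exceeds the assumed floor.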

Review Queue

Increased context window to 128k
Priya Nair · awaiting approval

Updated system prompt: tone adjustment
James D. · approved 1h ago

06 / Collaboration & Review

Ship together, safely.

Invite team members to collaborate on agent configuration. Review proposed changes in a Git-style diff view. Require approval from a designated reviewer before any change reaches production, so your critical agents stay stable.
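
To make the review flow concrete, here is a minimal Python sketch of a Git-style diff plus an approval check. It uses the standard-library `difflib` module; the file labels, the config keys, the previous value of 32k, the user names, and the `may_deploy` rule are all made up for illustration and do not describe how Agent Studio implements reviews.

```python
import difflib
from typing import Iterable, Set


def config_diff(old: str, new: str) -> str:
    """Return a unified (Git-style) diff between two config texts."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="agent.yaml (production)",   # hypothetical labels
        tofile="agent.yaml (proposed)",
    ))


def may_deploy(author: str, approvals: Iterable[str],
               reviewers: Set[str]) -> bool:
    """Hypothetical rule: at least one designated reviewer other than
    the author must have approved the change."""
    return any(a in reviewers and a != author for a in approvals)


# Invented before/after configs; the old value is an assumption.
production = "context_window: 32k\ntemperature: 0.7\n"
proposed = "context_window: 128k\ntemperature: 0.7\n"

print(config_diff(production, proposed), end="")
print(may_deploy("priya", approvals=[], reviewers={"james"}))
print(may_deploy("priya", approvals=["james"], reviewers={"james"}))
```

The diff marks the removed line with `-` and the added line with `+`, and the change stays blocked until a designated reviewer who is not the author approves it.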

Your agents deserve a proper workbench.

Open Agent Studio and start building in minutes.
