
Roadmap preview

Design agents.
Test them. Ship them.

Agent Studio is planned as the visual workspace for configuring, fine-tuning, and evaluating agents before they go to production. Today, teams deploy catalog agents from the dashboard and API.

Open Marketplace
Open Playground

01 / Visual Configuration Editor

Configure without code.

Set system prompts, bind tools, adjust model parameters, and configure context windows, all from a clean visual interface. Changes preview in real time, so you can iterate quickly without redeploying.

Drag-and-drop tool binding
Real-time preview pane
Export to YAML or JSON

Configuration

System prompt: 2,048 tokens
Context window: 128k
Temperature: 0.7
Tools bound: sql, python, file-io

02 / Interactive Test Harness

Catch regressions before production.

Run test suites against your agent directly in the browser and compare outputs across versions side by side. Your suite runs automatically on every save, so regressions are caught before they reach production users.

Test Harness

test_revenue_query: 48 ms
test_date_parsing: 31 ms
test_null_handling: failed

03 / Fine-tuning Pipeline

Teach your agents new behaviours.

Provide example input/output pairs, and AIRMY handles the fine-tuning job end to end. Track training loss live, preview model outputs at each checkpoint, and publish to production once the eval metrics meet your threshold.

Upload CSV or JSONL training data
Live training loss chart
Checkpoint comparison

Training Job: data-engineer-v2

Status: Training
Step: 842 / 2,000
Training loss: 0.1847
Validation loss: 0.2103

04 / Prompt Template Library

200+ battle-tested system prompts.

Choose from 200+ community-contributed and AIRMY-verified system prompt templates covering data analysis, code review, customer support, legal summarisation, and more. Every template is fully customisable: use it as a starting point or deploy it as-is.

Data Analyst: skilled at SQL queries and data summarisation.
Code Reviewer: detailed PR review with security and style notes.
Customer Support: empathetic, concise, escalation-aware.
Legal Summariser: plain-language contract and clause summarisation.

05 / Evaluation Metrics Dashboard

Measure what matters.

Track accuracy, consistency, latency, and token usage across agent versions in a unified dashboard. AIRMY's built-in eval harness runs automated tests on every save, so you always have a fresh signal before pushing to production.

Eval Metrics: v1.4 vs v1.5

Accuracy: 91.2% (v1.4), 94.7% (v1.5)
P50 latency: 88 ms (v1.4), 61 ms (v1.5)
Tokens per call: 1,420 (v1.4), 1,180 (v1.5)

06 / Collaboration & Review

Ship together, safely.

Invite team members to collaborate on agent configuration and review proposed changes in a Git-style diff view. Require approval from a designated reviewer before any change reaches production, so your critical agents stay stable.

Review Queue

Increased context window to 128k (Priya Nair · awaiting approval)
Updated system prompt: tone adjustment (James D. · approved 1h ago)

Your agents deserve a proper workbench.

Use the marketplace and dashboard today while Agent Studio moves through the roadmap.

