QA built for conversational AI
Now in beta

Test your AI agents like you ship real software.

Obodek is the human-in-the-loop QA platform for teams shipping chatbots, voice bots, and AI assistants. Run structured tests, gate releases, and catch what automation misses — before your users do.

obodek.app / acme-support-bot / sprint-24 / regression

Sprint 24 Regression · 84 cases

Run grid
| Test case                        | Reviewer | Turns | Status |
|----------------------------------|----------|-------|--------|
| Refund > 30 days, gift purchase  | jordan   | 7     | Pass   |
| Address change mid-checkout      | priya    | 4     | Pass   |
| Escalate to human, after hours   | mateo    | 3     | Review |
| PII redaction, voice transcript  | jordan   | 11    | Fail   |
| Multi-language switch (es → en)  | —        | —     | Queued |
| Politely refuse out-of-scope ask | —        | —     | Queued |

Trusted by QA teams testing AI agents at

Syntara · Voxlab · Meridian AI · Crestline · Northpeak

The Problem

Conversational AI broke your QA stack.

The tools and rituals built for deterministic software don't survive contact with non-deterministic agents.

Generic tools weren't built for conversation

Test-case managers assume one input, one output. Real agent QA spans multi-turn dialogue, tone, memory, and edge-case recovery.

Automation misses what humans catch

LLM-as-judge catches the obvious. Humans catch the subtle — passive-aggressive tone, brittle escalation, the answer that's technically right but wrong for your brand.

Spreadsheets don't enforce QA gates

Tabs full of test results can't block a bad release. You need environment gates, audit trails, and sign-off — not another Google Sheet.

Platform

Everything your QA org needs to ship agents.

From structured test grids to release gates, Obodek replaces six tools with one system of record for agent quality.

🧪

Test Grids

Structured multi-turn cases with reviewer assignment, rubrics, and pass/fail criteria — versioned per agent.

🚦

Environment Gating

Block promotion from staging to production unless required test grids pass at your defined threshold.

🐞

Bug Reporting

Reviewers file bugs from inside a test run — full transcript, trace, and reproduction context attached.

📚

Prep Library

Reusable personas, intents, and seed conversations so new test grids start at 80%, not zero.

📝

Change Log

Every prompt, tool, model, and dataset change is timestamped — so you know exactly what broke when.

🔒

Audit Trail

Immutable record of who reviewed what, when, and what verdict they gave. SOC 2-ready out of the box.

How it works

Three steps from prompt to production.

Wire Obodek into your agent pipeline once. Then every release flows through the same gate.

1

Set up your agent

Connect via API or SDK. Define environments, reviewers, and the rubric you care about.

2

Run structured tests

Build grids of multi-turn cases. Mix human review with automated checks for breadth and depth.

3

Gate and promote

Releases are blocked until quality bars are met. Ship to production with the receipts.
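The gating step boils down to simple threshold logic. The sketch below is illustrative only — the `RunResult` type and `gatePasses` function are hypothetical names, not Obodek's actual SDK:

```typescript
// Hypothetical sketch of threshold-based release gating.
// `RunResult` and `gatePasses` are illustrative, not Obodek's real API.
type Verdict = "pass" | "fail" | "review" | "queued";

interface RunResult {
  testCase: string;
  verdict: Verdict;
}

// A release is promotable only when every case has completed and the
// share of passing cases meets the configured threshold.
function gatePasses(results: RunResult[], threshold: number): boolean {
  if (results.length === 0) return false;
  if (results.some((r) => r.verdict === "queued")) return false;
  const passed = results.filter((r) => r.verdict === "pass").length;
  return passed / results.length >= threshold;
}

// Example: 2 of 3 cases passed, so a 0.9 threshold blocks promotion.
const sprint24: RunResult[] = [
  { testCase: "Refund > 30 days, gift purchase", verdict: "pass" },
  { testCase: "Address change mid-checkout", verdict: "pass" },
  { testCase: "PII redaction, voice transcript", verdict: "fail" },
];
console.log(gatePasses(sprint24, 0.9)); // false — pass rate 0.67 < 0.9
```

In a real pipeline this decision would come back from the platform, not be computed client-side; the point is that the gate is a deterministic check your CI can trust.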

Pricing

Start free. Pay when you ship.

Simple, predictable plans. No per-seat surprises, no per-evaluation gotchas.

Free

$0/forever

For solo builders and weekend agents.

  • 1 agent workspace
  • 50 test runs per month
  • 2 reviewers
  • 7-day history
  • Community support
Get started
Most Popular

Pro

$49/mo per agent

For QA teams shipping agents to real users.

  • Unlimited test grids & runs
  • Environment gating & release blocks
  • Up to 20 reviewers
  • Bug reporting + change log
  • 90-day history & exports
  • Priority email support
Start Pro trial

Enterprise

Custom

For regulated industries and large agent fleets.

  • SSO, SCIM, role-based access
  • Immutable audit trail
  • SOC 2 Type II & data residency
  • Custom rubrics & integrations
  • Dedicated CSM & SLAs
Contact us

14-day free trial on Pro. No credit card required.

FAQ

Questions, answered.

How is Obodek different from LLM eval frameworks?

Eval frameworks score outputs. Obodek is a QA platform — it manages reviewers, enforces release gates, tracks bugs against test cases, and gives you an audit trail. We integrate with eval frameworks; we don't replace the scoring, we replace the workflow around it.

Do reviewers need engineering skills?

No. The reviewer experience is built for QA analysts, support leads, and domain experts. Engineers wire up the agent and the gates; everyone else runs grids and files bugs through the UI.

What does Obodek connect to?

Any agent reachable via HTTPS — OpenAI, Anthropic, custom orchestrators, voice stacks like Vapi or Retell. We ship SDKs for TypeScript and Python, plus webhooks for CI integration.
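To make "any agent reachable via HTTPS" concrete, here is a sketch of how a multi-turn test case might be defined and serialized into a request body. The `TestCase` shape, `toRequestBody` helper, and `/runs` endpoint are assumptions for illustration, not Obodek's documented API:

```typescript
// Hypothetical shape of a multi-turn test case; field names are
// illustrative, not Obodek's real SDK types.
interface Turn {
  role: "user" | "agent";
  text: string;
}

interface TestCase {
  name: string;
  reviewer?: string;
  turns: Turn[];
  rubric: string[];
}

// Build the JSON body a client might POST to a (hypothetical) /runs
// endpoint for any HTTPS-reachable agent.
function toRequestBody(agent: string, testCase: TestCase): string {
  return JSON.stringify({ agent, case: testCase });
}

const refundCase: TestCase = {
  name: "Refund > 30 days, gift purchase",
  reviewer: "jordan",
  turns: [
    { role: "user", text: "I got this as a gift six weeks ago. Can I return it?" },
    { role: "agent", text: "" }, // agent reply is captured at run time
  ],
  rubric: ["Cites the gift-return policy", "Offers escalation if blocked"],
};

const body = toRequestBody("acme-support-bot", refundCase);
console.log(JSON.parse(body).case.name);
```

The same payload shape works whether the agent sits behind OpenAI, Anthropic, a custom orchestrator, or a voice stack — the platform only needs an HTTPS endpoint to drive the conversation.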

How do you handle voice agents?

Voice runs are captured with synchronized transcript, audio, and tool-call trace. Reviewers can scrub the audio inline while marking turns pass or fail — latency and tone both count.

Is our data used to train models?

Never. Customer conversations, test grids, and reviewer notes are tenant-isolated and never used for model training. Enterprise plans add data residency and customer-managed encryption keys.

Ship agents your team can stand behind.

Set up your first test grid in under 10 minutes.