Free plan — first 10 evals each month

Welcome to Kiln

Test whether coding agents can successfully integrate your API. Run real agents, grade the results, and see exactly where they fail.

1. Define a task

Describe a realistic integration task, for example "Build a checkout flow using our Payments SDK."

2. Add your context

Link your docs or SDK repo, or upload example files. This is what the agent sees.

3. Set pass/fail tests

Define assertions: HTTP checks, file existence, shell commands, or LLM-judged criteria.
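
To make the assertion types concrete, here is a minimal Python sketch of how three of them (file existence, shell commands, HTTP checks) could be modeled and graded. It is not Kiln's actual configuration format or API: the names `Assertion`, `file_exists`, `shell_succeeds`, `http_status`, and `grade` are hypothetical, and LLM-judged criteria are left out because they depend on a model call.

```python
import subprocess
import urllib.error
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Assertion:
    """Hypothetical pass/fail check: a label plus a zero-argument predicate."""
    name: str
    check: Callable[[], bool]


def file_exists(path: str) -> Assertion:
    # Passes when the agent produced a regular file at the given path.
    return Assertion(f"file exists: {path}", lambda: Path(path).is_file())


def shell_succeeds(argv: List[str]) -> Assertion:
    # Passes when the command exits with status 0.
    def check() -> bool:
        try:
            return subprocess.run(argv, capture_output=True).returncode == 0
        except OSError:
            # The executable could not be started at all.
            return False
    return Assertion(f"command succeeds: {' '.join(argv)}", check)


def http_status(url: str, expected: int = 200) -> Assertion:
    # Passes when a GET request to the URL returns the expected status code.
    def check() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == expected
        except urllib.error.HTTPError as err:
            # urlopen raises on 4xx/5xx, so compare the error's status code.
            return err.code == expected
        except OSError:
            # Connection refused, DNS failure, timeout, and similar.
            return False
    return Assertion(f"GET {url} returns {expected}", check)


def grade(assertions: List[Assertion]) -> Dict[str, bool]:
    # Run every assertion and record its result by name.
    return {a.name: a.check() for a in assertions}
```

In this sketch an eval passes only when every value returned by `grade` is true, for example `all(grade([file_exists("checkout.py"), http_status("http://localhost:8000/health")]).values())`, where the file name and URL are placeholders.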

4. Get your report

We run the agent in an isolated sandbox, grade the output, and give you a shareable report URL showing exactly where the agent succeeded or failed.

Create Your First Eval →

Free plan — 10 evals/month, no credit card required