Welcome to Kiln

Test whether coding agents can successfully integrate your API. Run real agents, grade the results, see exactly where they fail.

Describe a realistic integration task — "Build a checkout flow using our Payments SDK."

Link your docs, SDK repo, or upload example files. This is what the agent sees.

Define assertions: HTTP checks, file existence, shell commands, or LLM-judged criteria.

We run the agent in an isolated sandbox, grade the output, and give you a shareable report URL showing exactly where the agent succeeded or failed.

Free plan — 10 evals/month, no credit card required