How Amazon’s Q CLI and Kiro can turn specs into automated QA

AI is moving beyond just writing code. The next frontier is ensuring that the code does what it’s supposed to by validating its own work against defined requirements.

Paul Sawers

11 Sept 2025 · 5 min read

AI in software development is moving beyond just writing code. The next frontier is making sure that code actually does what it’s supposed to, including enabling AI to validate its own work against clearly defined requirements.

With that in mind, AI Native Dev last week reported on Amazon’s move to bring custom agents into its Q Developer CLI, allowing developers to spin up purpose-built versions of Q Developer (Amazon’s AI coding assistant) scoped to particular tasks. What makes those agents especially powerful is how they can plug directly into Kiro, the agentic IDE AWS debuted in July. Kiro turns natural-language prompts into structured specifications, such as requirements, design notes, and implementation guides, and then uses them to steer the development process (listen to the AI Native Dev podcast episode about Kiro).

Now, the folks at AWS are showcasing how Q CLI agents can be guided by those very specifications, using Kiro’s structured docs as both blueprint and test plan. Instead of a developer telling an agent what to check, the spec itself defines what should be validated.
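To make the "spec as test plan" idea concrete, here is a minimal Python sketch that turns a requirements file into a checklist of acceptance criteria. It assumes the file uses numbered acceptance criteria written in the EARS-style "WHEN ... THEN the system SHALL ..." form that Kiro's generated requirements commonly follow; the sample text, the `extract_criteria` function, and the regular expressions are hypothetical illustrations, not Kiro's own tooling.

```python
import re
from dataclasses import dataclass

# Hypothetical excerpt of a Kiro-style requirements.md (an assumption for
# illustration; real files may be laid out differently).
SAMPLE_REQUIREMENTS = """\
### Requirement 1

**User Story:** As a voter, I want to pick an option, so that my vote is counted.

#### Acceptance Criteria

1. WHEN a user opens the home page THEN the system SHALL list all voting options
2. WHEN a user submits a vote THEN the system SHALL increment that option's count
"""


@dataclass
class Criterion:
    requirement: str  # heading of the requirement the criterion belongs to
    number: int       # position within that requirement's numbered list
    text: str         # the criterion itself, e.g. "WHEN ... THEN ... SHALL ..."


# Assumption: numbered list items containing "SHALL" are acceptance criteria,
# and "## Requirement N" or "### Requirement N" headings group them.
CRITERION_RE = re.compile(r"^\s*(\d+)\.\s+(.*\bSHALL\b.*)$")
HEADING_RE = re.compile(r"^#{2,3}\s+(Requirement\s+\d+.*)$")


def extract_criteria(markdown: str) -> list[Criterion]:
    """Walk the file line by line, tracking the current requirement heading."""
    criteria: list[Criterion] = []
    current = "unknown"
    for line in markdown.splitlines():
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(1).strip()
            continue
        item = CRITERION_RE.match(line)
        if item:
            criteria.append(Criterion(current, int(item.group(1)), item.group(2).strip()))
    return criteria


if __name__ == "__main__":
    # Print an unchecked checklist an agent (or a person) could work through.
    for c in extract_criteria(SAMPLE_REQUIREMENTS):
        print(f"[ ] {c.requirement}.{c.number}: {c.text}")
```

Each extracted criterion becomes one item to verify, which is the sense in which the spec, rather than the developer, defines what gets checked.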

AI goes from code generation to self-validation

Massimo Re Ferrè, a product management director at AWS, introduced a hands-on experiment that did just that.

“I often spend time to check manually if the result of my prompt leads to good results… why should I (or someone in QA) follow that checklist, instead of having ‘an AI’ go through it, and report back with its findings?” Re Ferrè wrote. “Enter Amazon Q CLI.”

Instead of relying on manual QA, Re Ferrè built a custom Q CLI agent equipped with Playwright (for UI automation) and Fetch (for API testing), both exposed as MCP servers. He defined the agent in a simple JSON config that gave it access to those tools, pointed it at Kiro’s requirements.md, and restricted its permissions so it could read files and run checks but not alter the codebase. The agent’s job was then to navigate the generated application and validate it against each acceptance criterion in the spec.

```json
{
  "$schema": "https://raw.githubusercontent.com/aws/amazon-q-developer-cli/refs/heads/main/schemas/agent-v1.json",
  "name": "kiroqa",
  "description": "An agent to QA Kiro specs requirements",
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  },
  "allowedTools": ["fs_read", "execute_bash", "@fetch", "@playwright"],
  "resources": ["file://requirements.md"]
}
```
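One way to double-check the "read but don't alter" constraint before a run is to lint the agent file itself. The Python sketch below loads a trimmed copy of the kiroqa config and runs two checks: whether any pre-approved tool is on a deny list of write-capable tools, and whether every `@server` entry in `allowedTools` names a declared MCP server. The deny list (`fs_write` only), the `@server/tool` form, and the function names are assumptions for illustration; the Q CLI documentation is the authority on tool names and on how `allowedTools` is interpreted.

```python
import json

# A trimmed copy of the kiroqa agent definition from the article ($schema and
# description omitted). In practice you would read the agent's JSON file.
AGENT_JSON = """
{
  "name": "kiroqa",
  "mcpServers": {
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
  },
  "allowedTools": ["fs_read", "execute_bash", "@fetch", "@playwright"],
  "resources": ["file://requirements.md"]
}
"""

# Assumption: built-in tool names we treat as able to modify the workspace.
WRITE_CAPABLE = frozenset({"fs_write"})


def write_capable_tools(config: dict, deny: frozenset[str] = WRITE_CAPABLE) -> list[str]:
    """Return the pre-approved tools that appear in the deny set."""
    return [tool for tool in config.get("allowedTools", []) if tool in deny]


def mcp_refs_resolve(config: dict) -> bool:
    """Check that every '@server' entry in allowedTools names a declared MCP server.

    Assumption: an entry may also take the form '@server/tool', so only the
    part before the first '/' is compared against the mcpServers keys.
    """
    servers = set(config.get("mcpServers", {}))
    refs = {t[1:].split("/")[0] for t in config.get("allowedTools", []) if t.startswith("@")}
    return refs <= servers


if __name__ == "__main__":
    cfg = json.loads(AGENT_JSON)
    print("write-capable tools pre-approved:", write_capable_tools(cfg) or "none")
    print("MCP references resolve:", mcp_refs_resolve(cfg))
```

The deny set is deliberately a parameter: shell access through `execute_bash` can also change files, so a team wanting a stricter guard could add it to the set.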

Next, Re Ferrè extended the setup with a companion markdown file, test_requirements_prompt.md, which laid out what the agent should do during its QA run. Rather than hand-craft it, he prompted Q CLI to generate the file, then iterated between manual tweaks and further AI refinements. With that in place, he ran his go-to “litmus test” (a spec for a small Flask voting app) and let the agent check each acceptance criterion automatically, outputting a structured report of passes, failures, and partial matches.
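The article doesn't spell out the report's format beyond passes, failures, and partial matches, so the following Python sketch simply assumes one result per acceptance criterion carrying one of those three statuses, and shows how such results might be rolled up into a summary a human reviewer can scan. The `Status` enum, the `CheckResult` fields, and the sample findings are all hypothetical; they are not output from Re Ferrè's run.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"


@dataclass
class CheckResult:
    criterion_id: str  # hypothetical ID pointing back to a criterion in requirements.md
    status: Status
    evidence: str      # what the agent observed, e.g. page text or an HTTP response


def summarize(results: list[CheckResult]) -> dict[str, int]:
    """Count results per status, including statuses with zero results."""
    counts = Counter(r.status for r in results)
    return {s.value: counts[s] for s in Status}


def render_report(results: list[CheckResult]) -> str:
    """Render a plain-text report: one line per criterion, then a summary line."""
    lines = [f"{r.criterion_id:<6} {r.status.value.upper():<8} {r.evidence}" for r in results]
    totals = summarize(results)
    lines.append(", ".join(f"{name}: {count}" for name, count in totals.items()))
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented findings, used only to exercise the functions.
    sample = [
        CheckResult("1.1", Status.PASS, "expected element found"),
        CheckResult("1.2", Status.PARTIAL, "behaviour only partly matches the criterion"),
        CheckResult("2.1", Status.FAIL, "expected element missing"),
    ]
    print(render_report(sample))
```

Keeping "partial" as its own status, rather than folding it into pass or fail, leaves the judgment call on ambiguous criteria with the human reviewer.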

A proof-of-concept with promise

Re Ferrè is careful to frame the exercise as a proof-of-concept rather than a production-ready setup. The experiment shows how agents can automatically validate an application against Kiro’s specs, but he also emphasizes the importance of keeping them constrained and under human control. And while his conclusions are worth noting, they come with the caveat that Re Ferrè is an AWS employee, so he is incentivized to present the company’s tools favorably. Of course, anyone is free to try the same approach and judge the results for themselves.

Nonetheless, the experiment demonstrates a practical loop where AI not only builds software but also validates it against the specifications that defined it. It’s a glimpse at a future where QA is continuous, automated, and spec-driven by design.

Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source
