
How Amazon’s Q CLI and Kiro can turn specs into automated QA

AI is moving beyond just writing code. The next frontier is ensuring that the code does what it’s supposed to by validating its own work against defined requirements.

Paul Sawers

11 Sept 2025 · 5 min read

AI in software development is moving beyond just writing code. The next frontier is making sure that code actually does what it’s supposed to, including enabling AI to validate its own work against clearly defined requirements.

With that in mind, AI Native Dev reported last week on Amazon’s move to bring custom agents into its Q Developer CLI, allowing developers to spin up purpose-built versions of Q Developer (Amazon’s AI coding assistant) scoped to particular tasks. What makes those agents especially powerful is that they can plug directly into Kiro, the agentic IDE AWS debuted in July. Kiro turns natural-language prompts into structured specifications, such as requirements, design notes, and implementation guides, and then uses them to steer the development process (listen to the AI Native Dev podcast on Kiro).

Now, the folks at AWS are showcasing how Q CLI agents can be guided by those very specifications, using Kiro’s structured docs as both blueprint and test plan. Instead of a developer telling an agent what to check, the spec itself defines what should be validated.
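To make the idea of a spec doubling as a test plan concrete, here is a minimal Python sketch that flattens a requirements document into a checklist of acceptance criteria, the unit an agent would then verify one by one. It assumes a Kiro-style requirements.md in which each requirement sits under a heading beginning “Requirement” and is followed by a numbered “Acceptance Criteria” list written in WHEN/THEN/SHALL form; the layout Kiro actually emits may differ. The sample spec, the `Criterion` type, and `parse_criteria` are hypothetical illustrations, not part of Kiro or Q CLI.

```python
import re
from dataclasses import dataclass


@dataclass
class Criterion:
    requirement: str  # requirement heading, e.g. "Requirement 1: Cast a vote"
    number: int       # position within that requirement's criteria list
    text: str         # the criterion itself, e.g. "WHEN ... THEN ... SHALL ..."


# Assumed layout (hypothetical): "### Requirement ..." headings, each followed
# by an "Acceptance Criteria" heading and a numbered list.
REQ_HEADING = re.compile(r"^#{2,4}\s+(Requirement\b.*)$")
CRITERION = re.compile(r"^\s*(\d+)\.\s+(.+)$")


def parse_criteria(markdown: str) -> list[Criterion]:
    """Collect every numbered acceptance criterion, tagged with its requirement."""
    criteria: list[Criterion] = []
    current_req = None
    in_criteria = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            heading = REQ_HEADING.match(stripped)
            if heading:
                # A new requirement starts; wait for its criteria heading.
                current_req = heading.group(1).strip()
                in_criteria = False
            else:
                # Any other heading switches criteria collection on or off.
                in_criteria = "acceptance criteria" in stripped.lower()
            continue
        item = CRITERION.match(line)
        if in_criteria and current_req is not None and item:
            criteria.append(Criterion(current_req, int(item.group(1)), item.group(2).strip()))
    return criteria


# Hypothetical sample spec, loosely modeled on a small voting app.
sample = """\
# Requirements Document

### Requirement 1: Cast a vote

**User Story:** As a visitor, I want to vote for an option, so that my preference is counted.

#### Acceptance Criteria

1. WHEN a visitor clicks a vote button THEN the system SHALL increment that option's count
2. WHEN the page reloads THEN the system SHALL display the updated totals

### Requirement 2: View results

#### Acceptance Criteria

1. WHEN a visitor opens /results THEN the system SHALL list every option with its vote count
"""

for c in parse_criteria(sample):
    print(f"{c.requirement} #{c.number}: {c.text}")
```

Each printed line is one item an agent could be asked to confirm, which is the sense in which the spec, rather than the developer, defines what gets validated.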

AI goes from code generation to self-validation

Massimo Re Ferrè, a product management director at AWS, introduced a hands-on experiment that did just that.

“I often spend time to check manually if the result of my prompt leads to good results… why should I (or someone in QA) follow that checklist, instead of having ‘an AI’ go through it, and report back with its findings?,” Re Ferrè wrote. “Enter Amazon Q CLI.”

Instead of doing the QA by hand, Re Ferrè built a custom Q CLI agent equipped with two tools exposed as MCP servers: Playwright, for UI automation, and Fetch, for API testing. He defined it in a simple JSON config, giving the agent access to those tools, pointing it at Kiro’s requirements.md, and restricting its permissions so it could read files and run checks but not alter the codebase.

{
  "$schema": "https://raw.githubusercontent.com/aws/amazon-q-developer-cli/refs/heads/main/schemas/agent-v1.json",
  "name": "kiroqa",
  "description": "An agent to QA Kiro specs requirements",
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  },
  "allowedTools": ["fs_read", "execute_bash", "@fetch", "@playwright"],
  "resources": ["file://requirements.md"]
}
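Because the agent’s safety rests on this config, one option is to check it before each run, for example in CI. The sketch below is a hypothetical pre-flight check, not a Q CLI feature. It assumes that `allowedTools` lists the tools the agent may use without asking for confirmation, that `fs_write` is the built-in tool for editing files, and that an `@server` entry pre-approves every tool from that MCP server. It flags any pre-approved write tool, MCP servers missing from `allowedTools`, and an empty `resources` list, and it notes that `execute_bash`, although used here to run checks, can still change files through shell commands.

```python
import json

# The "kiroqa" agent config from the article, inlined so the sketch runs as-is.
# In practice you would json.load() the agent's config file instead.
KIROQA_CONFIG = """
{
  "$schema": "https://raw.githubusercontent.com/aws/amazon-q-developer-cli/refs/heads/main/schemas/agent-v1.json",
  "name": "kiroqa",
  "description": "An agent to QA Kiro specs requirements",
  "mcpServers": {
    "fetch": {"command": "uvx", "args": ["mcp-server-fetch"]},
    "playwright": {"command": "npx", "args": ["@playwright/mcp@latest"]}
  },
  "allowedTools": ["fs_read", "execute_bash", "@fetch", "@playwright"],
  "resources": ["file://requirements.md"]
}
"""

# Assumption: fs_write is Q CLI's file-editing built-in. Add any other
# tools you treat as write-capable.
WRITE_TOOLS = {"fs_write"}


def check_agent(config: dict) -> list[str]:
    """Return human-readable findings about an agent config (hypothetical check)."""
    findings = []
    allowed = set(config.get("allowedTools", []))

    # A QA agent should never be pre-approved to edit files.
    for tool in sorted(allowed & WRITE_TOOLS):
        findings.append(f"error: write tool pre-approved: {tool}")

    # Shell access is needed to run checks, but shell commands can modify files too.
    if "execute_bash" in allowed:
        findings.append("note: execute_bash is pre-approved; shell commands can still change files")

    # Assumption: "@name" in allowedTools pre-approves every tool from MCP server "name".
    for server in config.get("mcpServers", {}):
        if f"@{server}" not in allowed:
            findings.append(f"warning: MCP server '{server}' configured but not pre-approved")

    # Without a spec resource, the agent has nothing to test against.
    if not config.get("resources"):
        findings.append("error: no resources listed, so no spec to test against")

    return findings


config = json.loads(KIROQA_CONFIG)
for finding in check_agent(config):
    print(finding)
```

Run against the article’s config, the only finding is the note about `execute_bash`; adding `fs_write` to `allowedTools` would produce an error instead.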

Next, Re Ferrè extended the setup with a companion markdown file, test_requirements_prompt.md, that laid out what the agent should do during its QA run. Rather than hand-craft it, he prompted Q CLI to generate the prompt, then alternated between manual tweaks and further AI refinements. With that in place, he ran his go-to “litmus test” (the spec for a small Flask voting app) and let the agent navigate the generated application, check each acceptance criterion automatically, and output a structured report of passes, failures, and partial matches.
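The article doesn’t show the report format, so the following is only a hypothetical Python sketch of how per-criterion results (pass, fail, or partial) might be recorded, rendered as a Markdown table, and reduced to a single exit code that a CI job could act on. The `Status`, `Finding`, `render_report`, and `exit_code` names and the sample findings are illustrative, not output from Re Ferrè’s agent.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"


@dataclass
class Finding:
    criterion: str  # which acceptance criterion was checked, e.g. "Requirement 1 #2"
    status: Status
    evidence: str   # what the agent observed (page text, HTTP status, ...)


def render_report(findings: list[Finding]) -> str:
    """Render findings as a Markdown table followed by a one-line summary."""
    counts = Counter(f.status for f in findings)  # missing statuses count as 0
    lines = ["| Criterion | Result | Evidence |", "|---|---|---|"]
    for f in findings:
        lines.append(f"| {f.criterion} | {f.status.value.upper()} | {f.evidence} |")
    lines.append("")
    lines.append(
        f"{counts[Status.PASS]} passed, {counts[Status.FAIL]} failed, "
        f"{counts[Status.PARTIAL]} partial"
    )
    return "\n".join(lines)


def exit_code(findings: list[Finding]) -> int:
    """0 only if every criterion passed; partial matches count as failures."""
    return 0 if all(f.status is Status.PASS for f in findings) else 1


# Hypothetical findings for a small voting app.
findings = [
    Finding("Requirement 1 #1", Status.PASS, "vote count rose from 3 to 4 after clicking"),
    Finding("Requirement 1 #2", Status.PARTIAL, "totals updated only after a hard refresh"),
    Finding("Requirement 2 #1", Status.FAIL, "/results returned HTTP 404"),
]
print(render_report(findings))
print("exit code:", exit_code(findings))
```

Treating a partial match as a failure in `exit_code` is a deliberately strict choice; a team might instead let partials through with a warning while a spec is still settling.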

A proof-of-concept with promise

Re Ferrè is careful to frame the exercise as a proof-of-concept rather than a production-ready setup. The experiment shows how agents can automatically check an application against Kiro’s specs, but he also stresses the importance of keeping those agents constrained and under human control. His conclusions also come with a caveat: Re Ferrè is an AWS employee, so he has an incentive to present the company’s tools favorably. That said, anyone is free to try the same approach and judge the results for themselves.

Nonetheless, the experiment demonstrates a practical loop in which AI not only builds software but also validates it against the specifications that defined it. It’s a glimpse of a future where QA is continuous, automated, and spec-driven by design.

Paul Sawers

Freelance tech writer at Tessl, former TechCrunch senior writer covering startups and open source
