Audit a documentation site for agent-friendliness: discovery, markdown delivery, crawlability, semantic structure, machine-readable surfaces, and content legibility. Use when asked to assess docs.docker.com or any docs site for AI/agent readiness, produce a scored report, compare with external scanners, or generate a remediation list. Triggers on: "audit docs for agent readiness", "how agent-friendly is docs.docker.com", "score our docs for AI agents", "review llms.txt / markdown / crawlability", "create an agent-readiness remediation plan".
Audit the live site, not the source tree alone. Prefer the same fetch path an external agent would use in the wild: direct HTTP requests, sitemap sampling, and page-level inspection.
Do not reduce the result to a homepage-only scan or a binary checklist.
Use $ARGUMENTS as the base URL when provided. Otherwise infer the base URL from context and state the assumption.
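A minimal sketch of that fallback logic, treating $ARGUMENTS as an environment variable (an assumption; the skill runner normally substitutes it in place):

```bash
# Use the caller-supplied base URL when present; otherwise fall back to a
# known docs host and state the assumption in the report.
BASE_URL="${ARGUMENTS:-https://docs.docker.com}"   # fallback is an assumption
echo "Auditing base URL: $BASE_URL"
```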
Decide whether the host being audited is a docs-only host, an app/tool host, or a mixed host.
This matters for optional checks such as MCP, plugin manifests, or other tool discovery files. Do not penalize a docs-only host for missing tooling manifests that belong on a separate service.
For docs.docker.com, treat the public docs host as docs-only. Docker's MCP server is published separately, so missing MCP files on the docs host should be reported as N/A, not as a failure.
Always check these resources first:
- `/llms.txt`
- `/llms-full.txt`
- `/robots.txt`
- `/sitemap.xml`

Only check host-level tool manifests when the host is an app/tool host, mixed host, or explicitly advertises them:

- `/.well-known/ai-plugin.json`
- `/.well-known/agent.json`
- `/.well-known/agents.json`

Use the bundled script for a baseline:
```bash
bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"
```

The script produces baseline evidence only. You still need to interpret what matters for a docs property and score it with the rubric.
For docs-only hosts, you may skip tool-manifest probes to reduce noise:
```bash
CHECK_TOOL_MANIFESTS=0 \
  bash .agents/skills/agent-readiness-audit/scripts/baseline-probes.sh \
  "$ARGUMENTS"
```

Use the sitemap when available. Do not rely on the homepage alone.
If llms.txt exists, sample some URLs from it as well. This helps catch stale or misleading discovery surfaces that a sitemap-only sample would miss.
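One way to draw the sample, assuming a flat (non-index) sitemap; sitemap indexes need one extra hop:

```bash
base="https://docs.example.com"   # placeholder base URL
# Pull candidate URLs from the sitemap and take a random sample of 12.
curl -sL "$base/sitemap.xml" \
  | grep -oE '<loc>[^<]+</loc>' | sed -E 's#</?loc>##g' \
  | shuf -n 12
# Cross-check a few llms.txt links to catch stale discovery surfaces.
curl -sL "$base/llms.txt" | grep -oE 'https?://[^ )"]+' | shuf -n 4
```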
Sample at least 12 pages when the site is large enough, and cover multiple page types: for example, guides, generated API reference, and JavaScript-rendered interactive pages.
If the sitemap is missing or unusable, discover pages through internal links and note the lower confidence.
If the site has distinct delivery patterns, sample each one: for example, plain HTML pages, pages with markdown mirrors, and client-rendered reference pages.
For each sampled page, verify:
- `Accept: text/markdown` behavior
- a markdown mirror at `<page>.md` or another stable path

Do not assume a `.md` mirror exists just because another site uses one. Verify the actual markdown path the site exposes.
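A per-page check might look like the following sketch; the page URL and the `.md` pattern are placeholders to adapt per site:

```bash
page="https://docs.example.com/engine/install/"   # placeholder sampled page
# 1. Content negotiation: does the HTML route honor Accept: text/markdown?
curl -sIL -H 'Accept: text/markdown' "$page" | grep -i '^content-type'
# 2. Stable mirror: does an adjacent .md route respond? (Pattern varies per site.)
curl -s -o /dev/null -w 'md route: %{http_code}\n' "${page%/}.md"
```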
Treat these as separate signals: whether a page advertises a markdown alternate, and whether a working markdown route actually exists.
If the page advertises dead markdown alternates but a working markdown route exists, do not fail markdown delivery outright. Score it as a discoverability and consistency problem instead.
For API or generated reference pages, also verify whether a machine-readable asset such as OpenAPI YAML is directly linked and fetchable.
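A rough probe for a linked spec, assuming the link is absolute (relative links need resolving against the page URL first):

```bash
ref="https://docs.example.com/reference/api/"   # placeholder reference page
# Find the first linked YAML/JSON asset and check that it is fetchable.
spec=$(curl -sL "$ref" \
  | grep -oE 'href="[^"]+\.(ya?ml|json)"' | head -n 1 | cut -d'"' -f2)
if [ -n "$spec" ]; then
  curl -s -o /dev/null -w "spec $spec -> %{http_code}\n" "$spec"
else
  echo "no machine-readable spec link found"
fi
```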
Measure structural signals:
- `h1` usage
- `main` and `article` presence where appropriate

Also make a qualitative judgment about agent legibility.
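A crude probe for the structural counts above, using regex matching only (a real pass would parse the DOM):

```bash
page="https://docs.example.com/guide/"   # placeholder sampled page
html=$(curl -sL "$page")
# Count occurrences of each landmark tag in the raw HTML.
for tag in h1 main article; do
  count=$(grep -o "<$tag[ >]" <<<"$html" | wc -l)
  printf '%-8s %s\n' "$tag:" "$count"
done
```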
Measure code block labeling explicitly when code samples are common. A page type with many untagged fenced blocks should lose points even if the prose is otherwise clean.
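Against a page's markdown mirror, untagged fences can be counted mechanically. A sketch, assuming well-formed fences with no trailing whitespace; the URL is a placeholder:

```bash
md_url="https://docs.example.com/guide.md"   # placeholder markdown route
fence=$'\x60\x60\x60'   # three literal backticks, built indirectly
curl -sL "$md_url" | awk -v f="$fence" '
  index($0, f) == 1 {
    n++
    if (n % 2 == 1) {                           # odd-numbered fences open blocks
      total++
      if (length($0) == length(f)) untagged++   # opener with no language tag
    }
  }
  END { printf "code blocks: %d, untagged: %d\n", total, untagged }'
```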
For page types that intentionally render interactive UIs with JavaScript, judge them separately from normal docs pages. If the HTML shell is thin, check whether the page still provides an agent-usable alternative, such as a markdown route or a linked machine-readable asset.
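One rough thin-shell heuristic, with a placeholder URL: measure how much visible text the raw HTML carries before any JavaScript runs:

```bash
page="https://docs.example.com/reference/api/"   # placeholder JS-heavy page
# Strip tags and count remaining visible text; a few hundred bytes usually
# indicates a client-rendered shell (crude: multi-line tags are not handled).
curl -sL "$page" | sed -e 's/<[^>]*>//g' | tr -s '[:space:]' ' ' | wc -c
```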
Use references/rubric.md.
Rules:
- Mark checks that do not apply to the host type as N/A rather than failing them.
- Apply the foundational caps from the rubric. A site with broken discovery or broken markdown delivery should not earn a high grade because it has clean metadata.
Do not average away a weak page type. If one major page type, such as API reference, is materially worse than the rest of the corpus, call it out as the weakest segment and reflect it in the category notes.
If external scanner results are available, compare them to your live findings. Treat them as secondary evidence.
If a scanner and the live fetch disagree, prefer the live fetch and record the discrepancy in the report.
Turn findings into a short backlog:
- P0: fetchability or discovery blockers
- P1: recurring structural or parity issues
- P2: polish, optional manifests, or low-impact enhancements

For each remediation, include the supporting evidence and the concrete change required.
Use references/report-template.md.
Always include the category scores, the weakest page-type callout, and the prioritized remediation backlog.