flows-design-review

Semi-automated design quality review for Flows apps. Runs concrete repo probes (grep, lint, build) to propose a draft 1–5 score for each of the official 10 quality-guidelines questions from docs.cognite.com/cdf/flows/guides/quality-guidelines, then asks the user to confirm or override each score. Still requires the user to walk their tasks end-to-end in the running app (Step 2) since navigation and clickability feel cannot be measured statically. Writes reviews/design-review/feedback-round-<N>/design-review-report.md with an overall average and prioritized fix lists. Use when the user asks to run a Flows design review, run the design quality assessment, or run flows-design-review. Must be run AFTER flows-code-review reaches 0 Must Fix and BEFORE flows-external-app-submit.
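The description says the skill averages the ten confirmed scores, produces prioritized fix lists, and writes a per-round report. Below is a minimal Python sketch of that bookkeeping. It assumes one integer score per question. The `QuestionScore` type, the helper names, and the "below 3, lowest first" prioritization rule are hypothetical and are not taken from the skill.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class QuestionScore:
    question: int  # 1..10, one per quality-guidelines question
    draft: int     # score proposed from repo probes (1-5)
    final: int     # score after the user confirms or overrides (1-5)
    fix: str       # hypothetical one-line description of the main fix


def report_path(round_n: int) -> PurePosixPath:
    # Output location from the skill description; <N> is the feedback round.
    return (PurePosixPath("reviews/design-review")
            / f"feedback-round-{round_n}" / "design-review-report.md")


def overall_average(scores: list[QuestionScore]) -> float:
    # Every question must have exactly one confirmed 1-5 score before averaging.
    if len(scores) != 10:
        raise ValueError("expected one score per quality-guidelines question (10)")
    for s in scores:
        if not 1 <= s.final <= 5:
            raise ValueError(f"question {s.question}: score {s.final} is outside 1-5")
    return sum(s.final for s in scores) / len(scores)


def prioritized_fixes(scores: list[QuestionScore], threshold: int = 3) -> list[str]:
    # Hypothetical rule: keep questions scored below the threshold, lowest score first.
    low = [s for s in scores if s.final < threshold]
    return [s.fix for s in sorted(low, key=lambda s: (s.final, s.question))]
```

With final scores of 5, 4, 3, 2, 4, 5, 3, 1, 4, 5, the average is 3.6, and the fixes for questions 8 and 4 come first.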

Quality

—

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Passed

No known issues

Quality

Content

77%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The content is highly actionable and has a clear, well-validated workflow. However, it is verbose and keeps the large per-question rubric anchors inline even though they could be moved into bundle files (the skill currently has none). Tightening the repeated prose and externalizing the rubric definitions would lift the weaker dimensions (Conciseness and Progressive Disclosure).

Suggestions

Move the ten per-question 5/4/3/2/1 rubric anchor blocks into a references/ file (e.g. quality-rubric.md) and link to it, keeping only the probe list and translation rule for each question inline. This would shorten SKILL.md and improve the Progressive Disclosure score.

Trim non-essential rationale such as "This dramatically reduces the manual burden" and "not to grade from scratch" and de-duplicate the repeated "Translate to draft score" framing to improve conciseness.

Where probes repeat across questions (e.g. focus-style, onClick grep patterns), consolidate shared probes into a single reference section instead of restating them per question.
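The consolidation suggested above can be pictured as a single probe registry that questions reference by ID. Below is a minimal Python sketch. The probe IDs, grep patterns, and question numbers are illustrative placeholders and are not the skill's actual probes or guideline mapping.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    probe_id: str
    argv: tuple[str, ...]  # command run once; its output is shared by every question that needs it


# Hypothetical shared probe registry; the patterns are illustrative only.
SHARED_PROBES: dict[str, Probe] = {
    "focus-style": Probe("focus-style", ("grep", "-rnE", r"focus(-visible)?:", "src")),
    "onclick": Probe("onclick", ("grep", "-rn", "onClick", "src")),
}

# Each question lists probe IDs instead of restating the grep command.
# Question numbers are placeholders, not the real guideline mapping.
QUESTION_PROBES: dict[int, list[str]] = {
    3: ["onclick"],
    6: ["focus-style", "onclick"],
    8: ["focus-style"],
}


def probes_to_run(question_probes: dict[int, list[str]]) -> list[Probe]:
    """Return each referenced probe once, in first-seen order."""
    seen: list[str] = []
    for ids in question_probes.values():
        for pid in ids:
            if pid not in seen:
                seen.append(pid)
    return [SHARED_PROBES[pid] for pid in seen]


def questions_using(probe_id: str, question_probes: dict[int, list[str]]) -> list[int]:
    """List the questions whose draft score depends on a given probe."""
    return sorted(q for q, ids in question_probes.items() if probe_id in ids)
```

Each shared pattern is defined once, so changing it updates every question that relies on it.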

Dimension	Reasoning	Score
Conciseness	The body is mostly efficient and actionable, but it contains non-essential rationale prose ("This dramatically reduces the manual burden", "The user's job is to confirm... not to grade from scratch") and repeats the "Translate to draft score"/heuristic framing across all ten question blocks, so it falls short of the fully lean level 3.	2 / 3
Actionability	Provides exact executable commands per question (grep/eslint/npx), concrete draft-score translation rules, a defined AskQuestion option structure, and a copy-paste-ready machine-readable report template, matching the 'fully executable, copy-paste ready' anchor.	3 / 3
Workflow Clarity	Sequenced Steps 0–6 with explicit validation checkpoints ("Do NOT proceed to scoring until the user confirms"; a stub-report-and-exit refusal path; a fix-and-re-run feedback loop), matching the clear-sequence-with-explicit-validation anchor rather than the implicit-checkpoint level 2.	3 / 3
Progressive Disclosure	No bundle files exist and the skill is a single ~360-line SKILL.md with all ten per-question 5/4/3/2/1 rubric anchors inline — reference-grade content that could be split out — so it matches 'content that should be separate is inline' rather than the well-split level 3.	2 / 3
	Total	10 / 12 Passed
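The table credits the skill with concrete draft-score translation rules and an explicit confirm-or-override step. Below is a sketch of how such a rule could work. It assumes that each question's probes reduce to a single violation count. The thresholds and function names are invented for illustration and are not the skill's actual heuristic.

```python
from __future__ import annotations


def draft_score(hits: int, limits: tuple[int, int, int, int] = (0, 2, 5, 10)) -> int:
    """Map a probe violation count to a draft 1-5 score (hypothetical thresholds)."""
    if hits < 0:
        raise ValueError("hit count cannot be negative")
    # With the default limits: 0 hits -> 5, 1-2 -> 4, 3-5 -> 3, 6-10 -> 2, more -> 1.
    for score, limit in zip((5, 4, 3, 2), limits):
        if hits <= limit:
            return score
    return 1


def confirm_or_override(draft: int, override: int | None = None) -> int:
    """Keep the draft when the user confirms (None); otherwise use their 1-5 override."""
    if override is None:
        return draft
    if not 1 <= override <= 5:
        raise ValueError("override must be between 1 and 5")
    return override
```

The user still has the final say. The probe-based draft only saves them from grading each question from scratch.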

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description is specific, well-triggered, complete, and distinct: it states concrete actions, includes natural "Use when" phrasing, answers both what and when, and is tightly scoped to the Flows certification flow. No improvements needed.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — "Runs concrete repo probes (grep, lint, build) to propose a draft 1–5 score", "asks the user to confirm or override each score", and "Writes reviews/design-review/.../design-review-report.md" — matching the 'lists multiple specific concrete actions' anchor.	3 / 3
Completeness	Clearly answers both what (probes, draft scores, report output) and when (explicit "Use when..." triggers), so it is not the level-2 case where 'when' is only implied.	3 / 3
Trigger Term Quality	"Use when the user asks to run a Flows design review, run the design quality assessment, or run flows-design-review" covers natural phrasings a user would actually say, matching the good-coverage anchor rather than the partial-coverage level 2.	3 / 3
Distinctiveness / Conflict Risk	Its tightly scoped niche (Flows app design review, gated "AFTER flows-code-review... and BEFORE flows-external-app-submit") gives it distinct triggers that are unlikely to conflict with other skills.	3 / 3
	Total	12 / 12 Passed

Validation

93%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 15 / 16 Passed

Validation for skill structure

Criteria	Description	Result
allowed_tools_field	'allowed-tools' contains unusual tool name(s)	Warning

	Total	15 / 16 Passed

Repository: cognitedata/builder-skills
Commit: ab7b5f8

Reviewed: 1 day ago
