Semi-automated design quality review for Flows apps. Runs concrete repo probes (grep, lint, build) to propose a draft 1–5 score for each of the official 10 quality-guidelines questions from docs.cognite.com/cdf/flows/guides/quality-guidelines, then asks the user to confirm or override each score. Still requires the user to walk their tasks end-to-end in the running app (Step 2) since navigation and clickability feel cannot be measured statically. Writes reviews/design-review/feedback-round-<N>/design-review-report.md with an overall average and prioritized fix lists. Use when the user asks to run a Flows design review, run the design quality assessment, or run flows-design-review. Must be run AFTER flows-code-review reaches 0 Must Fix and BEFORE flows-external-app-submit.
This is step 3 of the Flows app certification flow:
flows-app-brief → build → flows-code-review → flows-design-review (this skill) → flows-external-app-submit

This is the manual design quality assessment described in docs.cognite.com/cdf/flows/guides/quality-guidelines. Target overall average: 3.8 or higher to be launch-ready.
- Use `AskQuestion` for every score so answers are structured. For each question present three options: (a) accept the draft score, (b) override with a specific score, (c) override + add a note.
- Pre-fill from `App-Brief.md` frontmatter when present.

Always pre-scan before asking the user anything. Read these sources silently and surface what you found as evidence, never as scores and never auto-saved:
| Source | Use it for |
|---|---|
| `App-Brief.md` frontmatter | Pre-fill primary user (`userRole`), tasks (`oneSentenceStory`), success criteria |
| `package.json` | Confirm `@cognite/aura` is installed and surface its version (informs Q1) |
| Latest `reviews/code-review/feedback-round-<N>/code-review-report.md` | Pull design-adjacent findings (accessibility, error handling, UX copy) and present them as evidence under Q4/Q10 |
| `src/**/*.{ts,tsx,css}` | Q1 probe: grep for hard-coded hex/rgb colors and raw px/rem values outside Aura tokens |
| `src/**/*.{ts,tsx}` | Q5 probe: `onClick` on non-button elements without `role`/`tabIndex` |
| `src/**/*.{ts,tsx}` | Q10 probe: icon buttons missing `aria-label`, `<img>` without `alt`, missing focus styles |
Show the user the pre-scan results in your opening message before any scoring. They are starting points, not verdicts. The manual task walkthrough (Step 2) and user-assigned scores remain authoritative.
Look at reviews/design-review/. If it doesn't exist, this is round 1. Otherwise increment to the next missing feedback-round-<N>/ directory.
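The round lookup can be sketched as below. This is a minimal Python sketch, assuming "next missing" means the lowest N for which `feedback-round-<N>/` does not yet exist; the helper name `next_round` is hypothetical and not part of the skill.

```python
import re
from pathlib import Path

# Directory names that count as an existing round, e.g. "feedback-round-2".
ROUND_DIR = re.compile(r"^feedback-round-(\d+)$")


def next_round(review_root: Path) -> int:
    """Return the lowest round number whose feedback-round-<N>/ directory is missing."""
    if not review_root.is_dir():
        return 1  # no reviews/design-review/ yet: this is round 1
    taken = set()
    for child in review_root.iterdir():
        match = ROUND_DIR.match(child.name)
        if child.is_dir() and match:
            taken.add(int(match.group(1)))
    n = 1
    while n in taken:
        n += 1
    return n
```

Calling `next_round(Path("reviews/design-review"))` from the repo root gives the N to use in the report path.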
Per the docs, "the quality assessment is only as useful as the clarity of the user and tasks it's based on."
If App-Brief.md exists, parse userRole, oneSentenceStory, and successCriteria from its frontmatter and propose them as the primary user and tasks. Ask the user to confirm or extend.
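A minimal sketch of the frontmatter pre-fill, assuming `App-Brief.md` opens with a `---`-delimited block of simple `key: value` lines and optional `- item` lists. The actual brief format is not shown here, so the parser, the name `parse_frontmatter`, and the sample values are illustrative only; a real YAML parser would be the safer choice if the frontmatter is richer than this.

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a simple '---' delimited frontmatter block into a dict.

    Assumption: only top-level 'key: value' scalars and '- item' lists are used.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter: nothing to pre-fill
    data = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the frontmatter block
        if stripped.startswith("- ") and current_key is not None:
            # List item belonging to the most recent key.
            if not isinstance(data[current_key], list):
                data[current_key] = []
            data[current_key].append(stripped[2:].strip().strip("\"'"))
        elif ":" in line and not line.startswith((" ", "\t")):
            key, _, value = line.partition(":")
            current_key = key.strip()
            data[current_key] = value.strip().strip("\"'")
    return data


# Hypothetical brief, used only to show the shape of the result.
sample = """---
userRole: Maintenance planner
oneSentenceStory: "Find overdue work orders"
successCriteria:
  - Sees overdue orders quickly
  - Can filter by site
---
# App brief body
"""
brief = parse_frontmatter(sample)
```

The parsed `userRole`, `oneSentenceStory`, and `successCriteria` values are proposals only; the user still confirms or extends them.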
Capture, via AskQuestion:
Instruct the user to:
For each task, prompt the user to paste back: what happened, where they got stuck, and any screenshots / notes. Capture these as taskWalkthroughs[] for the report.
Do NOT proceed to scoring until the user confirms they walked every task. If they refuse, write a stub report that records "task walkthrough skipped" and exits — do not score.
For every question Q1–Q10, follow the same loop:
AskQuestion with three options: (a) accept the proposed score N, (b) override with a specific score, (c) override + add a note.

These thresholds are starting points; adjust based on the specific evidence and the rubric language. The user always has the final say.
| Signal | Drift toward |
|---|---|
| 0 anti-pattern matches, lint clean for the relevant rule | 5 |
| ≤ 3 small matches, mostly in one file | 4 |
| 5–15 matches across several files, or 1 systemic issue | 3 |
| 15+ matches, or pervasive anti-pattern | 2 |
| Anti-pattern is the default style | 1 |
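The table above can be read as a small decision function. The sketch below is one possible reading, not the skill's actual logic: the function name and flags are hypothetical, and because the table leaves 4 matches unassigned and lists 15 in two rows, it assumes that 4 to 15 matches drift toward 3 and only more than 15 drift toward 2.

```python
def draft_score(matches: int, systemic: bool = False, default_style: bool = False) -> int:
    """Map probe results to a draft 1-5 score (a starting point, never a verdict).

    matches: number of anti-pattern hits, including relevant lint warnings
    systemic: one issue that repeats by design across the app
    default_style: the anti-pattern is how the app is written everywhere
    """
    if default_style:
        return 1
    if matches > 15:
        return 2
    if systemic or matches >= 4:  # assumption: 4 matches rounds down to 3
        return 3
    if matches >= 1:
        return 4
    return 5
```

The result is only the proposed score shown to the user, who can still accept or override it.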
Each question's probe list is the first thing the agent should run before asking the user anything about that question. Always state which probes were run and what they returned.
Q1 — Aura design system consistency. Are you using Aura tokens, layouts, components and patterns correctly?
Probes (automatable):
- `grep -c '@cognite/aura' package.json`: confirm Aura is a dependency
- `grep -rlE "from '@cognite/aura'" --include='*.ts' --include='*.tsx' src | wc -l`: count files importing Aura
- `grep -rlE '#[0-9a-fA-F]{3,8}' --include='*.css' --include='*.tsx' --include='*.ts' src`: files with hard-coded hex colors
- `grep -rlE '\b(rgb|rgba|hsl|hsla)\(' --include='*.tsx' --include='*.css' src`: files with raw rgb/hsl values
- `npx eslint . --ext .ts,.tsx --rule '{"aura/no-overriding-styles":"error"}' --no-eslintrc --quiet 2>&1 | tail -5` or read the existing lint output for `aura/no-overriding-styles` warning counts

Translate to draft score: 0 hard-coded colors + 0 `aura/no-overriding-styles` warnings → 5. Few warnings (1–5) → 4. Many warnings (>15) or no Aura imports → 2–3.
Q2 — Navigation, layout and hierarchy. Can users tell where they are and navigate easily?
Probes (partially automatable — relies on Step 2 walkthrough):
- `grep -rcE '<Route\b' --include='*.tsx' src`: count routes (informs navigation surface)
- `grep -rlE 'Breadcrumb' --include='*.tsx' src`: files using breadcrumb components (location cues)
- `grep -rlE 'NavLink|Link to=|useLocation' --include='*.tsx' src`: navigation primitives in use
- `grep -rlE '<Topbar|<Sidebar|<Header' --include='*.tsx' src`: top-level chrome
- Review the route files (`src/routes/`) and ask: does each non-trivial page show its own title and a way back?

Translate to draft score: Default to the walkthrough finding since navigation feel is hard to measure statically. Use probes to flag risks (e.g. routes without breadcrumbs).
Q3 — Clear labels and language. Are buttons, inputs, and actions labeled clearly?
Probes (automatable):
- `grep -rcE ">(Submit|OK|Click here|Go|Yes|No)<" --include='*.tsx' src`: count vague button labels
- `grep -rcE '<Button[^>]*>[[:space:]]*</Button>' --include='*.tsx' src`: empty buttons (icon-only without label needs `aria-label`, handled in Q10)
- `grep -rlE '<Label\b' --include='*.tsx' src` and `grep -rlE '<input\b' --include='*.tsx' src`: input elements vs labels; mismatch suggests unlabeled inputs
- `grep -rcE 'placeholder=' --include='*.tsx' src`: placeholder-as-label is an anti-pattern; high count without matching `<Label>` is a smell

Translate to draft score: 0 vague labels + every input has a matching label → 5. Few placeholder-only inputs → 4. Vague labels in several places → 3.
Q4 — System feedback and validation. Do users know what's happening? Are forms easy to use?
Probes (automatable):
- `grep -rlE 'isLoading|isPending|<Skeleton|<Loader|<Spinner' --include='*.tsx' src`: files with loading affordances
- `grep -rlE 'isError|onError|<Alert|toast\.' --include='*.tsx' src`: files with error/success affordances
- `grep -rlE 'useMutation' --include='*.tsx' src`: mutation sites; cross-check that each has `onSuccess`/`onError` handlers
- `grep -rlE 'ErrorBoundary' --include='*.tsx' src`: error boundaries (also cross-checked in code review)

Translate to draft score: Loading and error states present on every fetch/mutation → 5. A few mutations without explicit error handling → 4. Mixed coverage → 3.
Q5 — Clickability and interactions. Is it obvious what's clickable?
Probes (automatable):
- `grep -rcE '<div[^>]*onClick' --include='*.tsx' src`: `onClick` on `<div>` (non-semantic, often missing keyboard support)
- `grep -rcE '<span[^>]*onClick' --include='*.tsx' src`: same for `<span>`
- `grep -rcE 'role="button"' --include='*.tsx' src`: explicit role assignments (good if `<div onClick>` is unavoidable)
- `grep -rcE 'hover:|focus:' --include='*.tsx' src`: Tailwind hover/focus utility usage (high = good)
- `grep -rcE 'cursor-pointer' --include='*.tsx' src`: explicit pointer cursor

Translate to draft score: 0 `<div onClick>` without role + many hover/focus utilities → 5. 1–3 violations → 4. Many `onClick` on non-button elements → 2–3.
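The grep probes count `onClick` and `role="button"` separately, so they cannot tell whether the two sit on the same element. The sketch below is one way to pair them up, under stated assumptions: the function names are hypothetical, an element is flagged when it has `onClick` but lacks either `role` or `tabIndex`, and the tag scanner is a heuristic (it skips `>` inside `{...}` and quotes) rather than a real JSX parser.

```python
import re

OPEN_TAG = re.compile(r"<(div|span)\b")


def opening_tag(source: str, start: int) -> str:
    """Return the opening tag starting at `start`, ignoring '>' inside {...} or quotes."""
    depth = 0
    quote = None
    for i in range(start, len(source)):
        ch = source[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'`":
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == ">" and depth == 0:
            return source[start:i + 1]
    return source[start:]  # unterminated tag: return the remainder


def unsafe_click_targets(source: str) -> list:
    """List <div>/<span> opening tags that have onClick but lack role or tabIndex."""
    flagged = []
    for match in OPEN_TAG.finditer(source):
        tag = opening_tag(source, match.start())
        has_click = re.search(r"\bonClick\s*=", tag)
        has_role = re.search(r"\brole\s*=", tag)
        has_tab = re.search(r"\btabIndex\s*=", tag)
        if has_click and not (has_role and has_tab):
            flagged.append(tag)
    return flagged
```

Each flagged tag is evidence to show the user under Q5, not an automatic deduction.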
Q6 — Error prevention and recovery. Can users undo or cancel destructive actions?
Probes (partially automatable):
- `grep -rilE 'delete|remove|archive|reset' --include='*.tsx' src | head -20`: files with potentially destructive actions
- `grep -rlE 'AlertDialog|ConfirmDialog|window\.confirm' --include='*.tsx' src`: confirm-dialog usage
- `grep -rcE 'variant="destructive"|destructive' --include='*.tsx' src`: destructive button styling
- For each destructive action, look for an `AlertDialog`/`ConfirmDialog` invocation in the same file or its imports

N/A guidance: Read-only viewer apps (the common case for Flows demos) have no destructive actions and should score 5 by default with a "no destructive actions" rationale. Do not penalize an app for not having confirmations it does not need.
Q7 — Responsive design and multi-device support. Does it work on different screen sizes?
Probes (automatable):
- `grep -rcE '\b(sm|md|lg|xl|2xl):' --include='*.tsx' src`: Tailwind responsive utility usage (high = good)
- `grep -E '<meta name="viewport"' index.html`: viewport meta tag present
- `grep -rcE 'overflow-x-auto|overflow-x-scroll' --include='*.tsx' src`: horizontal scroll containers (often a smell)
- `grep -rcE '\bw-\[[0-9]+px\]|\bh-\[[0-9]+px\]' --include='*.tsx' src`: fixed-px sizing (usually breaks small screens)
- Check `App-Brief.md` `userRole`: if it says "desktop or laptop in control room" the app may be intentionally desktop-only; this is acceptable per the rubric ("Hidden or limited on mobile if not intended for mobile")

Translate to draft score: If app is desktop-only by design (per App-Brief) and renders cleanly on laptop down to 13" → 5. Mixed responsive utility usage → 4. Many fixed-px sizes → 3.
Q8 — Empty states and first-time experience. When there's no data, is it clear what to do next?
Probes (automatable):
- `grep -rilE 'empty|no\s+(data|results|items|files|matches)' --include='*.tsx' src`: files with empty-state copy
- `grep -rlE '<EmptyState|EmptyPlaceholder' --include='*.tsx' src`: explicit empty-state components
- For each panel that calls `.list(` or `.items.map()`, check there is at least one branch handling `items.length === 0` with user-visible copy. List the panels that DO and DO NOT.
- `grep -rcE 'items\.length === 0|items\.length > 0' --include='*.tsx' src`: explicit empty checks

Translate to draft score: Every data-fetching panel has an empty-state branch with copy → 5. One or two missing → 4. Many panels missing → 2–3.
Q9 — Performance and efficiency. Does the app load quickly?
Probes (automatable):
First, check whether a recent build already exists; this avoids a slow rebuild when `dist/` is fresh:

`find dist -maxdepth 1 -newer package.json -name '*.js' 2>/dev/null | wc -l`

`du -sh dist/ 2>/dev/null`

If the count is 0 (no recent build), fall back to:

`npm run build 2>&1 | tail -20`

Then gather the remaining metrics:
- `grep -rcE 'React\.lazy|lazy\(' --include='*.tsx' src`: code-split routes (good)
- `grep -rcE 'useMemo|useCallback' --include='*.tsx' src`: memoization usage (informs render efficiency)
- `grep -rlE 'useVirtual|react-window|react-virtual' --include='*.tsx' src`: list virtualization (good for big lists)
- `grep -rlE '\.list\([^)]*\)' --include='*.ts' --include='*.tsx' src | xargs -I{} grep -l 'limit:' {} 2>/dev/null | wc -l` vs total list call sites: pagination coverage
- Cross-check the `code-review-report.md` criterion 2.3 (Limits & pages) score

Translate to draft score: Build under 1 MB gzipped + every list has a limit + react-query in use → 5. Bundle 1–2 MB or some lists missing limits → 4. Bundle > 2 MB or systemic unbounded fetches → 2–3.
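The thresholds above are stated in gzipped size, while `du -sh dist/` reports the uncompressed size on disk. A minimal sketch of measuring the gzipped figure follows; it assumes the threshold refers to the JavaScript under `dist/` (the source does not say which assets count), and the function names are hypothetical.

```python
import gzip
from pathlib import Path

MB = 1024 * 1024


def gzipped_js_bytes(dist: Path) -> int:
    """Sum the gzip-compressed size of every .js file under dist/, at any depth."""
    total = 0
    for path in dist.rglob("*.js"):
        total += len(gzip.compress(path.read_bytes()))
    return total


def bundle_band(total_bytes: int) -> str:
    """Place a gzipped bundle size into the bands used by the Q9 draft score."""
    if total_bytes < 1 * MB:
        return "under 1 MB"
    if total_bytes <= 2 * MB:
        return "1-2 MB"
    return "over 2 MB"
```

`bundle_band(gzipped_js_bytes(Path("dist")))` gives the band to quote as evidence next to the list-limit findings.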
Q10 — Accessibility (WCAG AA 2.1). Can people use it with assistive tech?
Probes (automatable):
- Count `<img>` tags and `<img>` tags with `alt` attributes separately to identify missing alt text:

  `grep -rcE '<img\b' --include='*.tsx' src`

  `grep -rcE '<img[^>]*\balt=' --include='*.tsx' src`

  The difference between the two counts is the number of `<img>` tags missing `alt`.
- `grep -rcE '<button[^>]*>[[:space:]]*<(svg|Icon)' --include='*.tsx' src`: icon-only buttons (need `aria-label`)
- `grep -rcE 'aria-label=' --include='*.tsx' src`: ARIA label usage
- `grep -rcE 'focus-visible:|focus:' --include='*.tsx' src`: focus styles
- `grep -rcE 'tabIndex=\{-1\}|tabIndex="?-1' --include='*.tsx' src`: elements removed from tab order (sometimes intentional, sometimes a bug)
- If `eslint-plugin-jsx-a11y` is installed: `npx eslint . --ext .ts,.tsx --no-eslintrc --rule '{"jsx-a11y/alt-text":"error","jsx-a11y/anchor-is-valid":"error","jsx-a11y/click-events-have-key-events":"error"}' 2>&1 | tail -10`
- If `axe-core` is available: suggest the user run an axe scan in the running app and paste results; automation can flag candidates, not enforce contrast

Translate to draft score: 0 missing alts + 0 icon-only buttons without `aria-label` + focus styles everywhere → 5. A few violations → 4. Systemic gaps → 2–3.
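Because `grep -c` counts matching lines, an `<img>` whose `alt` sits on a later line of a multi-line JSX tag looks like a missing `alt`. The sketch below counts per tag instead. It is a heuristic under stated assumptions: the function name is hypothetical, and the regex stops at the first `>`, so a tag with an arrow function in an attribute would be cut short.

```python
import re

# [^>]* also matches newlines, so multi-line tags are captured whole.
IMG_TAG = re.compile(r"<img\b[^>]*>")
ALT_ATTR = re.compile(r"\balt\s*=")


def count_img_alt(source: str) -> tuple:
    """Return (total <img> tags, <img> tags without an alt attribute)."""
    tags = IMG_TAG.findall(source)
    missing = sum(1 for tag in tags if not ALT_ATTR.search(tag))
    return len(tags), missing
```

An empty `alt=""` counts as present here, which matches how decorative images are usually marked.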
Average = sum of all 10 scores ÷ 10.
Map to the quality level table from the docs:
| Average | Quality level | Recommendation |
|---|---|---|
| 4.5 – 5.0 | Excellent — ready to launch | Minor improvements over time |
| 3.8 – 4.4 | Good — launch with minor fixes | Address lower-scoring areas |
| 3.0 – 3.7 | Average — needs improvement | Fix major problems before launching |
| Below 3.0 | Needs significant work | Substantial improvements required |
flows-external-app-submit gates on average ≥ 3.8.
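The averaging and mapping steps can be sketched as below. This is an illustrative Python sketch with hypothetical function names, assuming all ten scores are whole numbers from 1 to 5, so the average always lands on one decimal place and the bands in the table leave no gaps.

```python
def overall_average(scores: list) -> float:
    """Average of the ten question scores, rounded to one decimal place."""
    if len(scores) != 10:
        raise ValueError("expected exactly 10 scores (Q1-Q10)")
    return round(sum(scores) / 10, 1)


def quality_level(average: float) -> str:
    """Map an average to the quality level from the table above."""
    if average >= 4.5:
        return "Excellent"
    if average >= 3.8:
        return "Good"
    if average >= 3.0:
        return "Average"
    return "Needs significant work"


def passes_submit_gate(average: float) -> bool:
    """True when the average meets the flows-external-app-submit threshold."""
    return average >= 3.8
```

For example, eight scores of 4 and two of 3 sum to 38, giving 3.8: "Good", and just enough to pass the gate.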
Create reviews/design-review/feedback-round-<N>/design-review-report.md with this structure:
# Design Review — <appName> — round <N>
## User and tasks
- **Primary user:** ...
- **Tasks evaluated:**
1. ...
2. ...
3. ...
- **Context:** ...
## Task walkthrough findings
- **Task 1 — ...** ...
- **Task 2 — ...** ...
- **Task 3 — ...** ...
## Scores
| Question | Score | Rationale | Improvement note |
| --- | --- | --- | --- |
| Q1 Aura consistency | n | ... | ... |
| Q2 Navigation & hierarchy | n | ... | ... |
| Q3 Labels & language | n | ... | ... |
| Q4 Feedback & validation | n | ... | ... |
| Q5 Clickability | n | ... | ... |
| Q6 Error prevention | n | ... | ... |
| Q7 Responsive | n | ... | ... |
| Q8 Empty states | n | ... | ... |
| Q9 Performance | n | ... | ... |
| Q10 Accessibility | n | ... | ... |
## Summary
- Average score: <X.X>
- Quality level: <Excellent | Good | Average | Needs significant work>
## Must Fix (any score < 3)
- ...
## Should Fix (any score 3 – 3.7)
- ...
## Nice to Fix (any score 3.8 – 4.4)
- ...

The `Average score:` line must be machine-readable in exactly that format, because flows-external-app-submit parses it.
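A sketch of writing and reading that summary line, plus the fix-list bucketing from the template headings. The parser inside flows-external-app-submit is not shown in this skill, so the regex and all function names here are assumptions about an equivalent, not its actual implementation.

```python
import re
from typing import Optional

# Matches the template line "- Average score: <X.X>" with exactly one decimal.
AVERAGE_LINE = re.compile(r"^- Average score: (\d\.\d)$", re.MULTILINE)


def average_line(average: float) -> str:
    """Render the summary line in the fixed one-decimal format."""
    return f"- Average score: {average:.1f}"


def parse_average(report: str) -> Optional[float]:
    """Read the average back from a report, or None if the line is malformed."""
    match = AVERAGE_LINE.search(report)
    return float(match.group(1)) if match else None


def fix_bucket(score: float) -> Optional[str]:
    """Assign a question score to the fix list named in the template headings."""
    if score < 3:
        return "Must Fix"
    if score <= 3.7:
        return "Should Fix"
    if score <= 4.4:
        return "Nice to Fix"
    return None  # 4.5 and above needs no fix entry
```

Round-tripping the line through `parse_average` before saving the report is a cheap check that the format has not drifted.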
After writing, print to the terminal:
- Whether the average clears the flows-external-app-submit gate (≥ 3.8)