Name: sharaf/professional-design-system-architect
Rating: 100 (1 review)
Author: sharaf


Design, build, or audit professional UI design systems across strategy, product language, foundations, tokens, components, patterns, accessibility, content, Figma/code libraries, documentation, QA, governance, adoption, measurement, theming, releases, and migration. Use when the user wants to create a design-system blueprint, review an existing design system, fix design-system drift, plan Figma/code parity, define token or component architecture, evaluate accessibility and governance maturity, or sequence design-system adoption and migration work.

Score: 100
Quality: 100% (Does it follow best practices?)
Impact: 100% (1.75x), average score across 3 eval scenarios
Security: Passed (no known issues)

{
  "context": "Tests whether the agent follows the audit workflow correctly: building an evidence inventory before judging, scoring maturity per domain on a 0-4 scale without averaging into a single score, writing findings that include all six required fields, avoiding vague findings, grouping the roadmap into Stabilize/Standardize/Scale, labeling evidence gaps, not conflating dark mode with high contrast, and not treating component count as quality.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Evidence inventory present",
      "description": "The report includes an evidence inventory table (or equivalent structured section) that lists audit areas and marks each as Reviewed, Partial, Not Provided, or Not Applicable — BEFORE any scores or findings are presented.",
      "max_score": 10
    },
    {
      "name": "Mode stated as audit",
      "description": "The report explicitly identifies the mode as 'audit' (not 'blueprint') and includes a completed First Actions checklist or equivalent preamble covering mode, inputs checked, and assumptions/evidence gaps.",
      "max_score": 5
    },
    {
      "name": "Per-domain maturity scores (0–4)",
      "description": "The maturity scorecard assigns a score from 0 to 4 to each assessed domain, using the defined scale (0=absent, 1=ad hoc, 2=defined but incomplete, 3=operational, 4=measured/governed/scalable).",
      "max_score": 10
    },
    {
      "name": "Scores NOT averaged into single number",
      "description": "The report does NOT compute a single composite health score or average the domain scores into one headline number as the primary maturity summary.",
      "max_score": 8
    },
    {
      "name": "Findings contain all 6 required fields",
      "description": "Each individual finding block includes all six required fields: Evidence, Why it matters, Affected surfaces, Recommended fix, Owner or function, Sequencing dependency.",
      "max_score": 15
    },
    {
      "name": "Findings avoid vague language",
      "description": "Findings do NOT use generic phrases like 'needs better documentation' without specifying which surface is missing and what the user impact is.",
      "max_score": 8
    },
    {
      "name": "Findings ordered by severity",
      "description": "Findings are organized by severity in the order: Critical, High, Medium, Low — not sorted by domain, alphabetically, or arbitrarily.",
      "max_score": 8
    },
    {
      "name": "Roadmap grouped Stabilize/Standardize/Scale",
      "description": "The remediation roadmap explicitly groups recommendations into the three tracks: Stabilize, Standardize, and Scale.",
      "max_score": 10
    },
    {
      "name": "Evidence gaps labeled",
      "description": "The report clearly labels what could not be assessed because artifacts were not provided (e.g., no analytics data, no WCAG audit results, no Figma access), rather than implying those areas were reviewed or skipping them silently.",
      "max_score": 8
    },
    {
      "name": "Dark mode NOT equated with high contrast",
      "description": "The report treats dark mode and high contrast as distinct requirements — it does NOT count the dark-mode theme as a high-contrast accessibility mode or adequate accessibility coverage.",
      "max_score": 10
    },
    {
      "name": "Component count not equated with quality",
      "description": "The report does NOT treat the presence of 47 components as evidence of component quality — findings reference missing states, accessibility gaps, or missing documentation as quality deficits.",
      "max_score": 8
    }
  ]
}
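The checklist above is a weighted rubric: the eleven `max_score` values sum to 100. How the eval harness actually awards points is not shown here, so the following is only a rough, hypothetical sketch. It mirrors the weights in Python, clamps each awarded score to its item's maximum, and adds two mechanical checks taken from the rubric text: the six required finding fields and the Critical, High, Medium, Low ordering. The function names, the clamping rule, and the per-finding `"Severity"` key are assumptions and are not part of the skill.

```python
"""Hypothetical helpers mirroring the scenario-2 weighted checklist.

Not the eval harness: function names, the "Severity" key, and the
clamping rule are illustrative assumptions.
"""

# Weights copied from the "checklist" array (name -> max_score).
CHECKLIST_WEIGHTS: dict[str, int] = {
    "Evidence inventory present": 10,
    "Mode stated as audit": 5,
    "Per-domain maturity scores (0–4)": 10,
    "Scores NOT averaged into single number": 8,
    "Findings contain all 6 required fields": 15,
    "Findings avoid vague language": 8,
    "Findings ordered by severity": 8,
    "Roadmap grouped Stabilize/Standardize/Scale": 10,
    "Evidence gaps labeled": 8,
    "Dark mode NOT equated with high contrast": 10,
    "Component count not equated with quality": 8,
}

# The six fields every finding block must contain, per the checklist text.
REQUIRED_FINDING_FIELDS = (
    "Evidence",
    "Why it matters",
    "Affected surfaces",
    "Recommended fix",
    "Owner or function",
    "Sequencing dependency",
)

# Required severity order, most severe first.
SEVERITY_ORDER = ("Critical", "High", "Medium", "Low")


def weighted_total(awarded: dict[str, float]) -> float:
    """Sum awarded points, clamping each item to [0, max_score].

    Items absent from `awarded` count as 0; an unknown name raises KeyError.
    """
    total = 0.0
    for name, points in awarded.items():
        cap = CHECKLIST_WEIGHTS[name]
        total += min(max(points, 0.0), cap)
    return total


def missing_fields(finding: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank in one finding."""
    return [f for f in REQUIRED_FINDING_FIELDS if not finding.get(f, "").strip()]


def ordered_by_severity(findings: list[dict[str, str]]) -> bool:
    """True if findings run Critical -> High -> Medium -> Low (ties allowed).

    Assumes a hypothetical "Severity" key per finding; an unknown severity
    raises ValueError from tuple.index.
    """
    ranks = [SEVERITY_ORDER.index(f["Severity"]) for f in findings]
    return ranks == sorted(ranks)


if __name__ == "__main__":
    print(sum(CHECKLIST_WEIGHTS.values()))  # 100
    print(weighted_total({"Mode stated as audit": 7, "Evidence gaps labeled": 8}))  # 13.0
```

Clamping keeps an over-generous item score (7 on a 5-point item) from inflating the total. Note that `weighted_total` only sums rubric points; it says nothing about the audited design system itself, whose per-domain 0 to 4 maturity scores the rubric requires to stay unaveraged.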

Files:
evals/
  scenario-1/
  scenario-2/
    criteria.json
    task.md
  scenario-3/
references/
README.md
SKILL.md
tile.json

File shown above: evals/scenario-2/criteria.json