Use when the user wants to design, critique, or refine a website theme, visual style, look and feel, branding, CSS theme, design system, tokens.css file, or CSS token system. Produces theme directions, typography/color/composition/motion systems, implementation-ready CSS custom properties, component motif rules, and light/dark/density variant strategy.
99
100%
Does it follow best practices?
Impact
99%
1.45x
Average score across 7 eval scenarios
Passed
No known issues
{
"context": "Tests whether the agent uses the full design-mode workflow for a theme direction, not just competent visual-design prose. The strongest answers explore routes with tradeoffs, commit to one clear direction contract, avoid architecture-category defaults, and turn the concept into a system that survives dense portfolio pages, mobile, accessibility, and implementation.",
"type": "weighted_checklist",
"checklist": [
{
"name": "Routes and recommendation have tradeoffs",
"description": "The output presents 2-3 distinct named routes before the full direction, names a concrete tradeoff or risk for every route including the recommended one, and states why one route should lead.",
"max_score": 10
},
{
"name": "Direction contract is explicit",
"description": "The recommended direction has one consolidated contract: 3-5 listed brand adjectives, exactly one primary visual archetype, and one explicit anti-target that is used to reject weaker choices.",
"max_score": 12
},
{
"name": "Category-default avoidance is specific",
"description": "The theme explicitly avoids default architecture-portfolio sameness such as minimal white boxes, oversized generic sans typography, and black-and-white photo grids, while translating adaptive reuse, material layering, time, and civic trust into interface decisions rather than cliches.",
"max_score": 12
},
{
"name": "Typography supports institution-scale content",
"description": "Typography recommendations include role-level font choices, concrete size/leading values for multiple roles, and rationale for long case studies, captions, drawings, team bios, and institutional decision-maker scanning. The solution should not default to Helvetica-style neutrality.",
"max_score": 10
},
{
"name": "Color roles and mode behavior are usable",
"description": "The color system names at least five semantic roles such as background, surface, text, accent, border, muted, focus, or glow; states light/dark or dual-mode behavior; and discusses contrast or legibility so the atmosphere remains rigorous and warm without becoming a beige or monochrome cliche.",
"max_score": 10
},
{
"name": "Composition rhythm governs multiple page types",
"description": "The composition section names one clear primary rhythm from modular, editorial, asymmetrical, dashboard, or chaptered, then defines rules for homepage, project index, case study pages, dense captions, drawings, team pages, and contact rather than only describing a hero layout. Secondary layout modes are acceptable when they are subordinate to the primary rhythm. Award full credit when the site-wide primary rhythm is clear from the composition section and page rules, even if one page type receives the most explicit label.",
"max_score": 12
},
{
"name": "Theme identity survives inner pages and mobile",
"description": "The answer states what remains invariant across desktop, mobile, and dense inner pages, plus what can recompose. The personality should not depend on one desktop hero or collapse into generic stacked cards on mobile.",
"max_score": 10
},
{
"name": "Surface and motifs repeat across components",
"description": "The direction defines 2-4 recurring motifs or surface rules and maps them to concrete behavior across project cards, navigation, image/drawing pairings, captions, CTAs, filters, and case-study modules.",
"max_score": 10
},
{
"name": "Motion and accessibility are designed together",
"description": "The motion section defines a small intentional motion vocabulary, states exactly what happens under prefers-reduced-motion, and names accessibility or performance constraints that prevent expressive choices from hurting usability.",
"max_score": 7
},
{
"name": "Implementation token mapping is buildable",
"description": "The implementation section maps repeated decisions to CSS custom properties or named design tokens, including at least color, type, spacing, motif, state, and variant or density strategy tokens.",
"max_score": 7
}
]
}
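
The rubric above is typed as a `weighted_checklist`, and its ten `max_score` values sum to 100. The page does not say how a grader combines per-item points, so the following is only a minimal sketch under one plausible assumption: each item is awarded between 0 and its `max_score`, and the result is the earned share of the available total. The function name `score_checklist` and the `awarded` mapping are hypothetical, introduced for illustration only.

```python
from typing import Dict


def score_checklist(rubric: dict, awarded: Dict[str, float]) -> float:
    """Return the percentage of available points earned (assumed aggregation)."""
    items = rubric["checklist"]
    total = sum(item["max_score"] for item in items)
    if total == 0:
        return 0.0
    earned = 0.0
    for item in items:
        # Items missing from `awarded` count as zero points.
        points = awarded.get(item["name"], 0.0)
        # Clamp so that no item can contribute more than its weight.
        earned += min(max(points, 0.0), item["max_score"])
    return 100.0 * earned / total


# Two items copied from the rubric, trimmed to the fields the sketch needs.
rubric = {
    "type": "weighted_checklist",
    "checklist": [
        {"name": "Direction contract is explicit", "max_score": 12},
        {"name": "Motion and accessibility are designed together", "max_score": 7},
    ],
}

# Hypothetical grader output: half credit on the first item, full on the second.
awarded = {
    "Direction contract is explicit": 6,
    "Motion and accessibility are designed together": 7,
}

print(round(score_checklist(rubric, awarded), 1))
```

Clamping each item to its `max_score` keeps one over-scored criterion from masking a missed one, which is the main reason to weight a checklist rather than sum raw points.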