Optimize your skills and tiles: review SKILL.md quality, generate eval scenarios, run evals, compare across models, diagnose gaps, and re-run until scores improve.
88
94%
Does it follow best practices?
Impact
88%
1.07xAverage score across 24 eval scenarios
Passed
No known issues
You have just downloaded three eval scenarios generated from recent commits. Before running evals, you need to review each scenario for quality issues that would make the eval results unreliable or misleading.
Common anti-patterns in generated scenarios include:
Review each of the three scenarios below and identify any quality issues.
Write a scenario-review.md file that:
=============== FILE: evals/api-rate-limiter/task.md ===============
Your team's API has been experiencing abuse from automated clients sending thousands of requests per second. The existing Express.js application has no rate limiting. You need to add rate limiting middleware that protects the API without impacting legitimate users.
The application uses Express 4.x with an existing middleware chain: CORS, body parser, auth, then route handlers. The rate limiter should be added to this chain.
Add rate limiting to the Express application:
src/middleware/rate-limiter.ts=============== FILE: inputs/src/app.ts =============== import express from 'express'; import cors from 'cors'; import { authMiddleware } from './middleware/auth'; import { apiRouter } from './routes/api';
const app = express();
app.use(cors()); app.use(express.json()); app.use(authMiddleware); app.use('/api', apiRouter);
export default app; =============== END FILE ===============
=============== FILE: evals/api-rate-limiter/criteria.json =============== { "type": "weighted_checklist", "checklist": [ { "name": "Rate limiter middleware created", "max_score": 15, "description": "A new rate limiting middleware file exists at src/middleware/rate-limiter.ts" }, { "name": "Sliding window or token bucket algorithm", "max_score": 20, "description": "The rate limiter uses a sliding window or token bucket algorithm rather than a simple counter reset" }, { "name": "429 status code returned", "max_score": 15, "description": "Rate-limited requests receive HTTP 429 Too Many Requests response" }, { "name": "Middleware ordering correct", "max_score": 20, "description": "Rate limiter is placed after CORS and body parser but before auth middleware in the chain" }, { "name": "Retry-After header included", "max_score": 15, "description": "Rate-limited responses include a Retry-After header indicating when the client can retry" }, { "name": "Configurable limits", "max_score": 15, "description": "Rate limits (requests per window, window duration) are configurable rather than hardcoded" } ] } =============== END FILE ===============
=============== FILE: evals/config-migration/task.md ===============
Your team is standardizing on TOML for all configuration files. You need to migrate the application's main configuration from YAML to TOML format.
The current YAML config is at config/app.yaml. The migrated TOML config should be written to config/app.toml. You also need to update the config loader in src/config/loader.ts to read TOML instead of YAML.
The TOML output should:
@iarna/toml package (already in package.json) with TOML.parse() for reading[database] table[cache] table[logging] tableformat_version = 2config/app.toml with the migrated configurationsrc/config/loader.ts to read from the TOML file=============== FILE: inputs/config/app.yaml =============== app: name: platform-api port: 3000 environment: production
database: host: db.internal.acme.com port: 5432 name: platform pool_size: 20
cache: host: redis.internal.acme.com port: 6379 ttl: 3600
logging: level: info format: json output: stdout =============== END FILE ===============
=============== FILE: inputs/src/config/loader.ts =============== import * as yaml from 'js-yaml'; import * as fs from 'fs';
export function loadConfig() { const raw = fs.readFileSync('config/app.yaml', 'utf-8'); return yaml.load(raw) as Record<string, unknown>; } =============== END FILE ===============
=============== FILE: evals/config-migration/criteria.json =============== { "type": "weighted_checklist", "checklist": [ { "name": "TOML file created with correct structure", "max_score": 20, "description": "config/app.toml exists with [database], [cache], and [logging] tables containing the migrated values" }, { "name": "Uses @iarna/toml package", "max_score": 15, "description": "The updated loader imports and uses @iarna/toml with TOML.parse() to read the config" }, { "name": "Format version key present", "max_score": 15, "description": "The TOML file includes format_version = 2 as a top-level key" }, { "name": "Database section uses TOML table", "max_score": 15, "description": "Database settings are under a [database] TOML table with host, port, name, and pool_size keys" }, { "name": "Loader reads correct file path", "max_score": 15, "description": "The config loader reads from 'config/app.toml' instead of 'config/app.yaml'" }, { "name": "Old YAML file handled", "max_score": 20, "description": "The old config/app.yaml is either deleted or renamed to indicate it's deprecated" } ] } =============== END FILE ===============
=============== FILE: evals/error-boundary-component/task.md ===============
Your React application currently has no error boundaries. When a component throws during rendering, the entire app crashes to a white screen. You need to add error boundary components that catch rendering errors and display fallback UIs.
The application uses React 18 with TypeScript. There are two main areas that need protection: the top-level app shell and individual route-level page components.
ErrorBoundary component in src/components/ErrorBoundary.tsxsrc/components/ErrorFallback.tsx=============== FILE: inputs/src/App.tsx =============== import { BrowserRouter, Routes, Route } from 'react-router-dom'; import { Header } from './components/Header'; import { Dashboard } from './pages/Dashboard'; import { Settings } from './pages/Settings'; import { Profile } from './pages/Profile';
export function App() { return ( <BrowserRouter> <Header /> <main> <Routes> <Route path="/" element={<Dashboard />} /> <Route path="/settings" element={<Settings />} /> <Route path="/profile" element={<Profile />} /> </Routes> </main> </BrowserRouter> ); } =============== END FILE ===============
=============== FILE: evals/error-boundary-component/criteria.json =============== { "type": "weighted_checklist", "checklist": [ { "name": "ErrorBoundary uses componentDidCatch", "max_score": 15, "description": "The ErrorBoundary is a class component that implements componentDidCatch lifecycle method for catching render errors" }, { "name": "ErrorBoundary uses getDerivedStateFromError", "max_score": 15, "description": "The ErrorBoundary implements static getDerivedStateFromError to update state and trigger fallback rendering" }, { "name": "Fallback UI is user-friendly", "max_score": 10, "description": "The ErrorFallback component displays a helpful message and offers a way to recover (e.g., retry button or link to home)" }, { "name": "Error boundary wraps routes", "max_score": 15, "description": "The error boundary wraps the Routes component or route outlet in App.tsx so route-level crashes are caught" }, { "name": "Error reporting to external service", "max_score": 15, "description": "Caught errors are reported to an external error tracking service (e.g., Sentry, Datadog, or a custom endpoint)" }, { "name": "No unrelated changes", "max_score": 5, "description": "The solution does not modify files or components unrelated to error boundary functionality" }, { "name": "Reusable component design", "max_score": 10, "description": "The ErrorBoundary accepts a fallback prop or render prop, making it reusable across different parts of the app" }, { "name": "Error boundary catches rendering errors", "max_score": 15, "description": "The ErrorBoundary component correctly catches errors thrown during rendering of its children" } ] } =============== END FILE ===============
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10
scenario-11
scenario-12
scenario-13
scenario-14
scenario-15
scenario-16
scenario-17
scenario-18
scenario-19
scenario-20
scenario-21
scenario-22
scenario-23
scenario-24
skills
compare-skill-model-performance
optimize-skill-instructions
references
optimize-skill-performance
optimize-skill-performance-and-instructions