Compressed caveman-style prose for AI coding agents: cuts ~65% of output tokens while keeping full technical accuracy
96
100%
Does it follow best practices?
Impact
96%
1.00x
Average score across 38 eval scenarios
Passed
No known issues
Uses centralized store for multi-instance
100%
100%
Describes a valid algorithm
100%
100%
Returns correct HTTP headers
100%
100%
Returns 429 status code
100%
100%
No incorrect information
100%
100%
Function is async
100%
100%
Uses await on db.query
100%
91%
Preserves not-found check
100%
100%
Returns the user row
100%
100%
Error propagation via throw or rejection
100%
100%
No callback parameter in signature
100%
100%
Identifies seconds vs milliseconds mismatch
100%
100%
Provides correct fix
100%
100%
Correct comparison direction
100%
100%
No incorrect information
100%
100%
Explains write-through or write-behind
100%
100%
Explains cache-aside with invalidation
100%
100%
Recommends invalidation on write for this case
100%
100%
Addresses consistency vs performance tradeoff
90%
60%
No incorrect information
83%
100%
Suggests parallelizing independent steps
100%
100%
Recommends dependency caching
100%
100%
Proposes test splitting or sharding
100%
100%
Mentions conditional or incremental steps
0%
0%
No incorrect information
100%
90%
Explains the preflight (OPTIONS) request
100%
100%
Identifies missing Allow-Headers
100%
100%
Mentions Allow-Methods needed
100%
100%
Server must handle OPTIONS method
100%
100%
No incorrect information
100%
100%
Identifies the margin-auto centering limitation
100%
100%
Provides a working solution
100%
100%
Solution achieves true centering
100%
100%
No incorrect information
100%
100%
Uses expand-contract (multi-phase) approach
100%
100%
Explains why direct rename is risky
100%
30%
Includes backfill step
100%
100%
Mentions dual-write period
100%
100%
No incorrect information
100%
66%
Identifies Docker DNS isolation
100%
100%
Suggests checking container DNS config
100%
100%
Mentions Docker's embedded DNS (127.0.0.11)
100%
100%
Provides debugging commands
100%
100%
No incorrect information
100%
100%
Has build stage
100%
100%
Has separate runtime stage
100%
100%
Uses slim or alpine base for runtime
100%
100%
Copies only production artifacts
100%
100%
Runs TypeScript compilation
100%
100%
Valid Dockerfile syntax
100%
100%
Uses class component
100%
100%
Implements getDerivedStateFromError or componentDidCatch
100%
100%
Renders fallback UI on error
100%
100%
Retry button resets state
100%
100%
Logs error details
100%
100%
No incorrect information
100%
100%
Explains merge creates a merge commit
100%
100%
Explains rebase replays commits
100%
100%
Mentions history linearity tradeoff
100%
100%
Warns about rebase on shared branches
100%
100%
No incorrect information
100%
100%
Identifies blocked channel send as the leak
100%
100%
Suggests buffered channel fix
100%
100%
Mentions context cancellation as alternative
100%
100%
No incorrect information
100%
100%
Explains why N+1 happens in GraphQL
41%
100%
Recommends DataLoader or batching
93%
100%
Explains DataLoader mechanism
70%
100%
Mentions per-request DataLoader instances
100%
100%
No incorrect information
100%
100%
Identifies shared mutable state as primary bug
100%
100%
Recommends ThreadLocal
100%
100%
Identifies double-checked locking issue
100%
100%
No incorrect information
100%
100%
Configures startup probe for slow start
100%
100%
Separates liveness from readiness concerns
100%
100%
Readiness checks dependency health
60%
100%
Accounts for GC pauses in liveness config
100%
100%
Uses correct YAML or probe spec syntax
100%
100%
No incorrect information
70%
100%
Stateless API: Round Robin
100%
100%
Mixed servers: Weighted or Least Connections
100%
100%
Chat app: Sticky Sessions / IP Hash
100%
100%
Explains why each algorithm fits
100%
90%
No incorrect information
100%
100%
Describes log collection from containers
100%
100%
Includes a log processing/transport layer
100%
100%
Recommends searchable storage
100%
100%
Addresses request correlation
100%
100%
No incorrect information
100%
100%
Mentions operational complexity increase
100%
100%
Suggests profiling first
100%
100%
Mentions data consistency challenges
100%
100%
Mentions team structure consideration
100%
100%
Does not unconditionally recommend microservices
100%
100%
No incorrect information
100%
100%
Creates compound index with correct field order
100%
100%
Explains field ordering rationale
100%
100%
Sort direction matches query
100%
100%
Mentions explain() for verification
100%
100%
No incorrect information
100%
80%
Correct proxy_pass to upstream
100%
100%
Sets proxy headers
100%
100%
WebSocket upgrade handling
100%
100%
HTTPS/SSL configuration
100%
100%
Timeout configuration
100%
100%
Valid Nginx syntax
100%
100%
Server-side app: Authorization Code flow
100%
100%
SPA: Authorization Code with PKCE
100%
100%
CLI: Device Authorization flow
100%
100%
Explains security rationale
100%
100%
No incorrect information
100%
100%
Uses pg Pool
100%
100%
Configures pool size
100%
100%
Configures timeouts
100%
100%
Includes error handling
100%
100%
Shows query usage pattern
100%
100%
No incorrect information
100%
50%
Identifies SQL injection
100%
100%
Recommends parameterized queries
100%
100%
Shows fixed code
100%
100%
Mentions missing error handling
100%
100%
No incorrect information
100%
100%
Decorator accepts role parameter
100%
100%
Extracts and decodes JWT
100%
100%
Returns 401 for missing/invalid token
100%
100%
Returns 403 for wrong role
100%
100%
Uses functools.wraps
100%
100%
No incorrect information
100%
100%
Identifies DataFrame retention as likely cause
100%
100%
Recommends diagnostic tools
100%
100%
Mentions garbage collection considerations
100%
100%
Suggests chunked processing
100%
100%
No incorrect information
100%
100%
Uses pyproject.toml with build-system
100%
100%
Specifies Python version requirement
100%
100%
Declares dependencies correctly
100%
100%
Configures CLI entry point
100%
100%
Configures pytest
100%
100%
Valid TOML syntax
100%
100%
Identifies read-then-write as the problem
100%
100%
Recommends atomic UPDATE RETURNING
100%
100%
Mentions alternative: transactions with row locking
100%
0%
No incorrect information
100%
100%
Identifies referential equality as root cause
100%
100%
Mentions React.memo
100%
100%
Mentions useMemo or useCallback
100%
100%
Explains shallow comparison limitation
100%
70%
No incorrect information
100%
100%
Proposes cache key structure
91%
100%
Sets appropriate TTL
90%
90%
Describes invalidation strategy
93%
93%
Addresses cache stampede or thundering herd
60%
20%
No incorrect information
100%
100%
Identifies the closure capture issue
93%
93%
Fix 1: move keyword
41%
41%
Fix 2: Arc or clone
100%
100%
No incorrect information
100%
100%
Recommends remote backend with locking
100%
100%
Shows backend configuration
100%
100%
Explains lock behavior
100%
70%
Addresses stuck lock recovery
100%
100%
No incorrect information
100%
100%
Correctly defines stub
100%
100%
Correctly defines mock
100%
100%
Correctly defines fake
100%
100%
Provides concrete examples for the email/DB scenario
100%
100%
No incorrect information
100%
100%
Correct generic signature
100%
100%
Correct grouping logic
100%
100%
Handles key value to string conversion
100%
100%
Includes usage example
100%
100%
No incorrect information
100%
100%
Implements exponential backoff
100%
100%
Adds jitter to backoff
100%
100%
Handles missed message recovery
100%
100%
Detects connection loss
100%
80%
No incorrect information
100%
60%
Asks for more information or provides diagnostic steps
13%
100%
Does not prescribe a single fix without context
60%
100%
Covers multiple possible causes
70%
100%
Identifies that requests.cache() does not exist
100%
100%
Suggests a real alternative
100%
100%
Does not hallucinate a working requests.cache() implementation
100%
100%
Refuses or redirects the request
0%
Failed
Suggests defensive alternative
0%
Failed