Write correct Flask tests -- app factory with test config, application context fixtures, database isolation, file uploads, auth testing, error handlers, mock.patch placement, and essential API test patterns
98
Does it follow best practices? 99%
Impact: 97% (1.15x), average score across 5 eval scenarios
Passed: no known issues
{
"context": "Tests whether the agent proactively applies Flask testing best practices including CLI command testing, error handler testing, and fixture patterns. The task does NOT mention test_cli_runner, error body shape testing, or conftest patterns -- the agent should apply these on its own.",
"type": "weighted_checklist",
"checklist": [
{
"name": "CLI test runner fixture",
"description": "The agent creates a 'runner' fixture using app.test_cli_runner() and tests CLI commands with runner.invoke(args=[...]). Does NOT use subprocess.",
"max_score": 16
},
{
"name": "CLI command tests",
"description": "The agent tests all three CLI commands (init-db, seed-data, archive-done), checking both exit_code and output text.",
"max_score": 12
},
{
"name": "Error handler body tested",
"description": "Tests check that error responses (400, 404) return JSON with the documented shape {'error': {'message': ..., 'code': ...}}, not just the status code.",
"max_score": 14
},
{
"name": "App context in fixture",
"description": "The app fixture wraps its body in 'with app.app_context():' and yields the app from inside that block, so the context stays active for the whole test.",
"max_score": 12
},
{
"name": "Conftest with app, client, runner",
"description": "conftest.py has three separate fixtures: app (owns context/DB), client (for HTTP), runner (for CLI). All properly depend on app.",
"max_score": 10
},
{
"name": "Database isolation per test",
"description": "Function-scoped fixtures with :memory: SQLite or fresh init_db per test. No session/module scope for DB fixtures.",
"max_score": 8
},
{
"name": "Auth both paths tested",
"description": "Protected endpoints test both unauthenticated (401/403) and authenticated access.",
"max_score": 8
},
{
"name": "Status transition validation",
"description": "Tests verify that invalid status transitions (e.g., todo->done, archived->todo) are rejected, and valid transitions succeed.",
"max_score": 8
},
{
"name": "Admin-only delete tested",
"description": "Tests verify that non-admin users cannot delete tasks (403) and admin users can (200/204).",
"max_score": 6
},
{
"name": "Test config with TESTING=True",
"description": "The app factory is called with a test config that sets TESTING=True and points at a separate database.",
"max_score": 6
}
]
}
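
The fixture layout the checklist describes could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: `create_app`, the `DATABASE` config key, the text echoed by `init-db`, and the 404 message are all assumptions made for the example. Only `app.app_context()`, `app.test_client()`, `app.test_cli_runner()`, and `runner.invoke(args=[...])` are Flask's own API.

```python
import click
import pytest
from flask import Flask, jsonify


def create_app(test_config=None):
    """Hypothetical app factory standing in for the project under test."""
    app = Flask(__name__)
    if test_config:
        app.config.update(test_config)

    @app.cli.command("init-db")
    def init_db_command():
        # Assumed output text; a real command would also create the tables.
        click.echo("Initialized the database.")

    @app.errorhandler(404)
    def not_found(exc):
        # Error body follows the shape named in the checklist.
        return jsonify(error={"message": "Not found", "code": 404}), 404

    return app


@pytest.fixture
def app():
    # Function scope (the default): every test gets a fresh app and database.
    app = create_app({"TESTING": True, "DATABASE": ":memory:"})
    with app.app_context():
        yield app


@pytest.fixture
def client(app):
    return app.test_client()


@pytest.fixture
def runner(app):
    return app.test_cli_runner()


def test_init_db_command(runner):
    result = runner.invoke(args=["init-db"])
    assert result.exit_code == 0
    assert "Initialized" in result.output


def test_404_body_shape(client):
    response = client.get("/no-such-route")
    assert response.status_code == 404
    assert response.get_json() == {"error": {"message": "Not found", "code": 404}}
```

In a real project the three fixtures would normally live in `conftest.py` and the tests in separate modules; they are shown together here only to keep the sketch self-contained.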