
tessl-labs/flask-testing

Write correct Flask tests: app factory with test config, application context fixtures, database isolation, file uploads, auth testing, error handlers, mock.patch placement, and essential API test patterns

Score: 98

Quality: 99% (Does it follow best practices?)

Impact: 97%, 1.15x (average score across 5 eval scenarios)

Security by Snyk: Passed (no known issues)


evals/scenario-5/criteria.json

{
  "context": "Tests whether the agent proactively applies Flask testing best practices including CLI command testing, error handler testing, and fixture patterns. The task does NOT mention test_cli_runner, error body shape testing, or conftest patterns; the agent should apply these on its own.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "CLI test runner fixture",
      "description": "The agent creates a 'runner' fixture using app.test_cli_runner() and tests CLI commands with runner.invoke(args=[...]). Does NOT use subprocess.",
      "max_score": 16
    },
    {
      "name": "CLI command tests",
      "description": "The agent tests all three CLI commands (init-db, seed-data, archive-done), checking both exit_code and output text.",
      "max_score": 12
    },
    {
      "name": "Error handler body tested",
      "description": "Tests check that error responses (400, 404) return JSON with the documented shape {'error': {'message': ..., 'code': ...}}, not just the status code.",
      "max_score": 14
    },
    {
      "name": "App context in fixture",
      "description": "The app fixture enters 'with app.app_context():' and yields the app from inside that block.",
      "max_score": 12
    },
    {
      "name": "Conftest with app, client, runner",
      "description": "conftest.py has three separate fixtures: app (owns context/DB), client (for HTTP), runner (for CLI). Both client and runner depend on the app fixture.",
      "max_score": 10
    },
    {
      "name": "Database isolation per test",
      "description": "Function-scoped fixtures with :memory: SQLite or fresh init_db per test. No session/module scope for DB fixtures.",
      "max_score": 8
    },
    {
      "name": "Auth both paths tested",
      "description": "Tests for protected endpoints cover both unauthenticated (401/403) and authenticated access.",
      "max_score": 8
    },
    {
      "name": "Status transition validation",
      "description": "Tests verify that invalid status transitions (e.g., todo->done, archived->todo) are rejected, and valid transitions succeed.",
      "max_score": 8
    },
    {
      "name": "Admin-only delete tested",
      "description": "Tests verify that non-admin users cannot delete tasks (403) and admin users can (200/204).",
      "max_score": 6
    },
    {
      "name": "Test config with TESTING=True",
      "description": "The app factory is called with a test config that sets TESTING=True and points to a separate database.",
      "max_score": 6
    }
  ]
}
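The file declares `"type": "weighted_checklist"`, and its ten `max_score` values sum to 100. How the evaluator aggregates item grades is not stated here; the sketch below assumes, purely for illustration, that each item is awarded between 0 and its `max_score` and that the scenario score is the sum of awards divided by the sum of maxima. `CHECKLIST` and `score` are hypothetical names.

```python
# Hypothetical scorer for the weighted checklist above.
# ASSUMPTION: each item earns 0..max_score points; the scenario score is
# earned points divided by the total of all max_score values.
CHECKLIST = {
    "CLI test runner fixture": 16,
    "CLI command tests": 12,
    "Error handler body tested": 14,
    "App context in fixture": 12,
    "Conftest with app, client, runner": 10,
    "Database isolation per test": 8,
    "Auth both paths tested": 8,
    "Status transition validation": 8,
    "Admin-only delete tested": 6,
    "Test config with TESTING=True": 6,
}


def score(awards: dict[str, float]) -> float:
    """Return the weighted score in [0, 1]; unknown item names raise KeyError."""
    total_max = sum(CHECKLIST.values())
    earned = 0.0
    for name, points in awards.items():
        cap = CHECKLIST[name]
        # Clamp each award into [0, max_score] for its item.
        earned += min(max(points, 0.0), cap)
    return earned / total_max
```

Under this assumption, full marks on the runner fixture (16) plus half marks on the error-body item (7 of 14) give 23 / 100 = 0.23.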
