coding-agent-helpers/skeptic-verifier

Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".

Quality: 94% (Does it follow best practices?)

Impact: 81%, 1.30x (average score across 8 eval scenarios)

Security: Passed, no known issues

Verify the Database Migration Script Safety Claim

Name: coding-agent-helpers/skeptic-verifier
Rating: 84.9 (1 review)
Author: coding-agent-helpers

Problem/Feature Description

The infrastructure team has written a migration script that claims to safely backfill a new user_tier column in the production PostgreSQL database without locking the table or causing downtime. The engineer who wrote it says the script uses batched UPDATEs with a sleep between batches to avoid overwhelming the database, and validates preconditions before running.

This script will be run against a 50 million row table during a maintenance window. The CTO wants a second opinion before the script is approved — specifically, they want to know whether the safety claims hold and whether there are scenarios where the script could still cause production issues.

Output Specification

Write your assessment to verification_report.md. Be explicit about what you were and were not able to verify, and provide a final verdict.
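One possible way to keep the report explicit about what was and was not verified is to build it from structured check records. The sketch below is illustrative only: `Check`, `render_report`, and the section titles are hypothetical names chosen here, not a format required by the task. The verdict values follow the PASS, PARTIAL, FAIL scheme from the skill description.

```python
from dataclasses import dataclass

# Verdict values taken from the skill description; everything else is assumed.
ALLOWED_VERDICTS = ("PASS", "PARTIAL", "FAIL")


@dataclass
class Check:
    claim: str      # the safety claim being tested
    verified: bool  # True only if evidence was actually gathered
    evidence: str   # what was run or read, or why it could not be checked


def render_report(checks, verdict):
    """Return Markdown text with verified and unverified claims kept apart."""
    if verdict not in ALLOWED_VERDICTS:
        raise ValueError(f"verdict must be one of {ALLOWED_VERDICTS}")
    lines = ["# Verification Report", ""]
    for title, wanted in (("Verified", True), ("Not verified", False)):
        lines += [f"## {title}", ""]
        matching = [c for c in checks if c.verified is wanted]
        # Print an explicit placeholder so an empty section is never ambiguous.
        lines += [f"- {c.claim}: {c.evidence}" for c in matching] or ["- (none)"]
        lines.append("")
    lines += ["## Verdict", "", verdict, ""]
    return "\n".join(lines)
```

The returned string could then be written out with `pathlib.Path("verification_report.md").write_text(...)`. Keeping unverified claims in their own section makes it harder to let an untested claim read as a confirmed one.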

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: scripts/backfill_user_tier.py ===============
"""
Backfill migration: add user_tier column to users table.

Safety claims:

- Processes rows in batches to avoid long table locks
- Sleeps between batches to reduce DB load
- Validates that the column exists before running (idempotent)
- Skips rows already backfilled
"""
import time
import os

DB_BATCH_SIZE = 1000
SLEEP_BETWEEN_BATCHES = 0.05  # seconds


def get_db_connection():
    import psycopg2
    return psycopg2.connect(os.environ["DATABASE_URL"])


def column_exists(conn, table, column):
    with conn.cursor() as cur:
        cur.execute("""
            SELECT 1 FROM information_schema.columns
            WHERE table_name = %s AND column_name = %s
        """, (table, column))
        return cur.fetchone() is not None


def run_backfill():
    conn = get_db_connection()
    try:
        if not column_exists(conn, 'users', 'user_tier'):
            raise RuntimeError("Column user_tier does not exist — run the DDL migration first")

        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM users WHERE user_tier IS NULL")
            total = cur.fetchone()[0]
            print(f"Rows to backfill: {total}")

        offset = 0
        batches_processed = 0
        while True:
            with conn.cursor() as cur:
                cur.execute("""
                    UPDATE users
                    SET user_tier = 'standard'
                    WHERE id IN (
                        SELECT id FROM users
                        WHERE user_tier IS NULL
                        ORDER BY id
                        LIMIT %s
                    )
                """, (DB_BATCH_SIZE,))
                rows_updated = cur.rowcount
            conn.commit()
            batches_processed += 1

            if rows_updated == 0:
                break

            offset += rows_updated
            print(f"Backfilled {offset} rows...")
            time.sleep(SLEEP_BETWEEN_BATCHES)

        print(f"Done. {offset} rows backfilled in {batches_processed} batches.")
    finally:
        conn.close()


if __name__ == "__main__":
    run_backfill()
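As a quick sanity check on whether the run fits the maintenance window, the script's own constants give a floor on wall-clock time. The sketch below assumes, for illustration only, that all 50 million rows still have a NULL `user_tier`; the helper name `sleep_floor` is hypothetical. It counts sleep time only, so actual runtime would be higher once query and commit time are included.

```python
import math

TOTAL_ROWS = 50_000_000        # table size from the problem description
DB_BATCH_SIZE = 1000           # from scripts/backfill_user_tier.py
SLEEP_BETWEEN_BATCHES = 0.05   # seconds, from scripts/backfill_user_tier.py


def sleep_floor(total_rows, batch_size, sleep_s):
    """Return (non-empty batch count, total seconds spent sleeping).

    The script sleeps once after every batch that updated at least one row,
    so the number of sleeps equals the number of non-empty batches.
    """
    batches = math.ceil(total_rows / batch_size)
    return batches, batches * sleep_s


batches, sleep_seconds = sleep_floor(TOTAL_ROWS, DB_BATCH_SIZE, SLEEP_BETWEEN_BATCHES)
# Under the all-rows-NULL assumption: 50,000 batches and about 41.7 minutes of sleep.
print(f"{batches} batches, at least {sleep_seconds / 60:.1f} minutes of sleep")
```

Because this is a lower bound, a reviewer would still need measured per-batch query times to judge whether the full run fits the window.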
