Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".
84
94%
Does it follow best practices?
Impact
81%
1.30x
Average score across 8 eval scenarios
Passed
No known issues
The infrastructure team has written a migration script that claims to safely backfill a new user_tier column in the production PostgreSQL database without locking the table or causing downtime. The engineer who wrote it says the script uses batched UPDATEs with a sleep between batches to avoid overwhelming the database, and validates preconditions before running.
This script will be run against a 50 million row table during a maintenance window. The CTO wants a second opinion before the script is approved — specifically, they want to know whether the safety claims hold and whether there are scenarios where the script could still cause production issues.
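For a sense of scale, the arithmetic implied by the table size and the script's constants can be sketched as follows. This is a rough lower bound, assuming every one of the 50 million rows still needs backfilling and ignoring query and commit time entirely; the helper name `backfill_floor` is hypothetical and not part of the provided script.

```python
import math

TOTAL_ROWS = 50_000_000        # table size stated in the task description
DB_BATCH_SIZE = 1000           # constant from scripts/backfill_user_tier.py
SLEEP_BETWEEN_BATCHES = 0.05   # seconds, constant from the same script


def backfill_floor(total_rows, batch_size, sleep_s):
    """Return (number of non-empty batches, total seconds spent sleeping).

    Assumption: every row needs backfilling (worst case). Time spent
    executing the UPDATE statements and commits is deliberately ignored.
    """
    batches = math.ceil(total_rows / batch_size)
    return batches, batches * sleep_s


batches, sleep_total = backfill_floor(TOTAL_ROWS, DB_BATCH_SIZE, SLEEP_BETWEEN_BATCHES)
print(f"{batches} batches, at least {sleep_total / 60:.1f} minutes of sleep")
```

Under these assumptions the run needs 50,000 batches and about 41.7 minutes of sleep alone, so the real duration depends almost entirely on how long each batch's UPDATE takes. That per-batch cost cannot be determined from the script text and would need to be measured.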
Write your assessment to verification_report.md. Be explicit about what you were and were not able to verify, and provide a final verdict.
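One possible way to keep the verified and unverified items separate in such a report is sketched below. The `Report` class, its field names, and the rule that maps findings to a verdict are all assumptions made for illustration; only the PASS, PARTIAL, and FAIL labels and the `verification_report.md` filename come from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    PARTIAL = "PARTIAL"
    FAIL = "FAIL"


@dataclass
class Report:
    """Hypothetical container for an assessment; not a prescribed format."""
    claim: str
    verified: list = field(default_factory=list)
    not_verified: list = field(default_factory=list)
    refuted: list = field(default_factory=list)

    def verdict(self):
        # Assumed rule: any refuted item fails the claim; anything left
        # unverified caps the result at PARTIAL; otherwise PASS.
        if self.refuted:
            return Verdict.FAIL
        if self.not_verified:
            return Verdict.PARTIAL
        return Verdict.PASS

    def to_markdown(self):
        lines = ["# Verification report", "", f"Claim: {self.claim}", ""]
        sections = (
            ("Verified", self.verified),
            ("Not verified", self.not_verified),
            ("Refuted", self.refuted),
        )
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items or ["(none)"])
            lines.append("")
        lines.append(f"Verdict: {self.verdict().value}")
        return "\n".join(lines)


# Placeholder entries only; a real report would list concrete evidence.
report = Report(claim="example claim under review")
report.verified.append("example item confirmed by reading the script")
report.not_verified.append("example item that needs a live database")
print(report.to_markdown())
```

Writing the result to disk would then be a single call such as `pathlib.Path("verification_report.md").write_text(report.to_markdown())`.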
The following files are provided as inputs. Extract them before beginning.
=============== FILE: scripts/backfill_user_tier.py ===============
"""
Backfill migration: add user_tier column to users table.

Safety claims:
"""
import os
import time

DB_BATCH_SIZE = 1000
SLEEP_BETWEEN_BATCHES = 0.05  # seconds


def get_db_connection():
    import psycopg2
    return psycopg2.connect(os.environ["DATABASE_URL"])


def column_exists(conn, table, column):
    with conn.cursor() as cur:
        cur.execute("""
            SELECT 1 FROM information_schema.columns
            WHERE table_name = %s AND column_name = %s
        """, (table, column))
        return cur.fetchone() is not None


def run_backfill():
    conn = get_db_connection()
    try:
        if not column_exists(conn, 'users', 'user_tier'):
            raise RuntimeError("Column user_tier does not exist — run the DDL migration first")

        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM users WHERE user_tier IS NULL")
            total = cur.fetchone()[0]
        print(f"Rows to backfill: {total}")

        offset = 0
        batches_processed = 0

        while True:
            with conn.cursor() as cur:
                cur.execute("""
                    UPDATE users
                    SET user_tier = 'standard'
                    WHERE id IN (
                        SELECT id FROM users
                        WHERE user_tier IS NULL
                        ORDER BY id
                        LIMIT %s
                    )
                """, (DB_BATCH_SIZE,))
                rows_updated = cur.rowcount
            conn.commit()
            batches_processed += 1

            if rows_updated == 0:
                break

            offset += rows_updated
            print(f"Backfilled {offset} rows...")
            time.sleep(SLEEP_BETWEEN_BATCHES)

        print(f"Done. {offset} rows backfilled in {batches_processed} batches.")
    finally:
        conn.close()


if __name__ == "__main__":
    run_backfill()
skills
skeptic-verifier