Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".
A developer on the customer data platform team claims to have fixed a bug in the phone number normalization function. Previously, phone numbers with country codes were being stored inconsistently — sometimes with a leading +, sometimes without, and sometimes with spaces or dashes. The fix is supposed to normalize all incoming phone numbers to E.164 format (+[country code][number], digits only after the +).
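The target format can be written as a predicate that the verifier applies to every value the normalizer returns. This is a minimal sketch and is not part of the provided files. `is_e164` is a hypothetical helper that checks only the shape E.164 allows: a `+`, a first digit from 1 to 9 (no country code starts with 0), and at most 15 digits in total. The two-digit lower bound is a common convention that the task does not state. The helper also does not check whether a country code is actually assigned.

```python
import re

# Shape of an E.164 number: '+', a first digit 1-9, then 1 to 14 more digits
# (15 digits at most). The two-digit minimum is a convention, not from the task.
# [0-9] is used instead of \d because in Python 3 \d also matches non-ASCII digits.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(value: str) -> bool:
    """Return True if value has the shape of an E.164 number.

    Only the shape is checked. Whether the country code is assigned is not.
    """
    # fullmatch requires the whole string to match, so trailing text such as
    # a newline is rejected.
    return E164_PATTERN.fullmatch(value) is not None
```

A normalizer that meets its contract should return only values for which `is_e164` is true, or raise `ValueError`. Any other outcome is a candidate counterexample.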
This function runs on every customer record import, which processes millions of records from various international sources. The data team wants you to try to break it — find any inputs that are not handled correctly or produce a wrong output — before it is deployed.
Write your findings to verification_report.md. If you find a counterexample, document the exact input and the actual vs expected output.
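A small harness keeps the input, actual, and expected columns of the report consistent. This is a sketch with hypothetical names (`find_counterexamples`, `to_markdown`). Each case pairs an input with either the exact string the function should return or the exception class it should raise. The task does not prescribe a report layout, so the Markdown table shown here is an assumption.

```python
from typing import Callable, List, Sequence, Tuple, Type, Union

# An expected outcome is either the exact string the function should return
# or the exception class it should raise.
Expected = Union[str, Type[Exception]]


def _describe(outcome: object) -> str:
    """Render a return value or a raised exception for the report."""
    if isinstance(outcome, Exception):
        return f"{type(outcome).__name__}: {outcome}"
    return repr(outcome)


def find_counterexamples(
    fn: Callable[[str], str], cases: Sequence[Tuple[str, Expected]]
) -> List[Tuple[str, str, str]]:
    """Run fn on each input and keep (input, actual, expected) rows that disagree."""
    rows = []
    for raw, expected in cases:
        try:
            actual: object = fn(raw)
        except Exception as exc:  # an exception is an outcome to record, not a harness crash
            actual = exc
        if isinstance(expected, type):
            ok = isinstance(actual, expected)
        else:
            ok = actual == expected
        if not ok:
            shown = expected.__name__ if isinstance(expected, type) else repr(expected)
            rows.append((repr(raw), _describe(actual), shown))
    return rows


def to_markdown(rows: Sequence[Tuple[str, str, str]]) -> str:
    """Format counterexample rows as a Markdown table for verification_report.md."""
    lines = ["| Input | Actual | Expected |", "| --- | --- | --- |"]
    for cells in rows:
        escaped = [cell.replace("|", "\\|") for cell in cells]  # keep table columns intact
        lines.append("| " + " | ".join(f"`{cell}`" for cell in escaped) + " |")
    return "\n".join(lines)
```

After extracting the files, run `find_counterexamples(normalize_phone, cases)` on a hand-written `cases` list to get the rows. Each row records the exception type and its message, so the evidence points to the exact failure.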
The following files are provided as inputs. Extract them before beginning.
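If the bundle is saved as a single text file, a short script can split it back into separate files. This sketch assumes each `=============== FILE: <path> ===============` header sits on its own line and that the file body occupies the lines after it. `split_bundle` and `write_bundle` are hypothetical helpers. `write_bundle` refuses any path that would resolve outside the target directory.

```python
import re
from pathlib import Path
from typing import Dict

# Assumed header layout, one per line:
# =============== FILE: src/phone_normalizer.py ===============
HEADER = re.compile(r"^=+ FILE: (?P<path>\S+) =+$", re.MULTILINE)


def split_bundle(text: str) -> Dict[str, str]:
    """Map each relative path in the bundle to the text that follows its header."""
    headers = list(HEADER.finditer(text))
    files: Dict[str, str] = {}
    for i, header in enumerate(headers):
        # A file body runs until the next header, or to the end of the text.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        files[header.group("path")] = text[header.end():end].strip("\n") + "\n"
    return files


def write_bundle(files: Dict[str, str], root: str) -> None:
    """Write the files under root, refusing paths that would escape it."""
    base = Path(root).resolve()
    for rel, body in files.items():
        target = (base / rel).resolve()
        if base not in target.parents:
            raise ValueError(f"refusing to write outside {base}: {rel}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(body, encoding="utf-8")
```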
=============== FILE: src/phone_normalizer.py ===============
import re


def normalize_phone(raw: str) -> str:
    """
    Normalize a phone number to E.164 format: +[digits only]

    Claimed to handle:
    - Numbers with country code prefix (e.g. +1, +44, +49)
    - Numbers with spaces, dashes, dots, parentheses
    - US numbers without country code (assumed +1)

    Returns normalized E.164 string or raises ValueError if unparseable.
    """
    if not raw or not isinstance(raw, str):
        raise ValueError(f"Invalid input: {raw!r}")

    # Strip whitespace
    cleaned = raw.strip()

    # Remove all formatting characters: spaces, dashes, dots, parens
    digits_and_plus = re.sub(r'[\s\-\.\(\)]', '', cleaned)

    # Validate remaining characters
    if not re.match(r'^\+?[0-9]+$', digits_and_plus):
        raise ValueError(f"Cannot parse phone number: {raw!r}")

    # If starts with +, treat as already having country code
    if digits_and_plus.startswith('+'):
        return digits_and_plus

    # If 10 digits, assume US number, prepend +1
    if len(digits_and_plus) == 10:
        return '+1' + digits_and_plus

    # If 11 digits starting with 1, assume US with country code
    if len(digits_and_plus) == 11 and digits_and_plus.startswith('1'):
        return '+' + digits_and_plus

    raise ValueError(f"Cannot determine country code for: {raw!r}")

=============== FILE: tests/test_phone_normalizer.py ===============
import sys
sys.path.insert(0, 'src')
from phone_normalizer import normalize_phone

passed = 0
failed = 0


def check(name, fn):
    global passed, failed
    try:
        fn()
        print(f"PASS: {name}")
        passed += 1
    except AssertionError as e:
        print(f"FAIL: {name} — {e}")
        failed += 1
    except Exception as e:
        print(f"ERROR: {name} — {e}")
        failed += 1


def test_us_with_plus():
    assert normalize_phone("+12125551234") == "+12125551234"


def test_us_10_digit():
    assert normalize_phone("2125551234") == "+12125551234"


def test_us_formatted():
    assert normalize_phone("(212) 555-1234") == "+12125551234"


def test_uk_number():
    assert normalize_phone("+44 20 7946 0958") == "+442079460958"


def test_rejects_letters():
    try:
        normalize_phone("abc")
        assert False
    except ValueError:
        pass


check("us_with_plus", test_us_with_plus)
check("us_10_digit", test_us_10_digit)
check("us_formatted", test_us_formatted)
check("uk_number", test_uk_number)
check("rejects_letters", test_rejects_letters)
print(f"\n{passed} passed, {failed} failed")
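The five fixed cases above leave most formatting combinations untested. A randomized metamorphic check can probe the claim that spaces, dashes, dots, and parentheses are handled: every reformatting of a known-good number should normalize back to that number. This is a sketch with hypothetical names. It takes the function under test as a parameter, so it can be pointed at `normalize_phone` once the files are extracted. It also uses a fixed seed, so any failing variant can be reproduced in the report.

```python
import random
from typing import Callable, List, Sequence

# Separators the normalize_phone docstring claims to handle. Pass a different
# list to probe characters it does not mention (for example a slash or an en dash).
CLAIMED_SEPARATORS = (" ", "-", ".", "(", ")")


def random_formatting(digits: str, separators: Sequence[str], rng: random.Random) -> str:
    """Insert a random separator after roughly 30% of the digits."""
    pieces = []
    for digit in digits:
        pieces.append(digit)
        if rng.random() < 0.3:
            pieces.append(rng.choice(separators))
    return "".join(pieces)


def format_invariance_failures(
    normalize: Callable[[str], str],
    canonical: str,
    separators: Sequence[str] = CLAIMED_SEPARATORS,
    trials: int = 200,
    seed: int = 0,
) -> List[str]:
    """Return formatted variants of `canonical` that do not normalize back to it.

    A variant that raises ValueError also counts as a failure, because every
    variant encodes the same number as `canonical`.
    """
    rng = random.Random(seed)  # fixed seed so any counterexample is reproducible
    prefix = "+" if canonical.startswith("+") else ""
    digits = canonical[len(prefix):]
    failures = []
    for _ in range(trials):
        variant = prefix + random_formatting(digits, separators, rng)
        try:
            result = normalize(variant)
        except ValueError:
            result = None
        if result != canonical:
            failures.append(variant)
    return failures
```

For example, `format_invariance_failures(normalize_phone, "+442079460958")` exercises only the claimed separators. Adding `separators=["/", "\u2013"]` to the call probes characters that the docstring does not mention.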