coding-agent-helpers/skeptic-verifier

Use when the user wants an adversarial double-check of a code or config change. Run the strongest checks available, try to break the claim, look for edge cases and hidden regressions, and return PASS, PARTIAL, or FAIL with evidence. Good triggers include "poke holes in this", "stress test this change", "double check this fix", and "try to break it".

Quality: 94% (Does it follow best practices?)
Impact: 81%, 1.30x (average score across 8 eval scenarios)
Security: Passed (no known issues)

Try to Break the Phone Number Normalization Fix

Name: coding-agent-helpers/skeptic-verifier
Rating: 84.9 (1 review)
Author: coding-agent-helpers

Problem/Feature Description

A developer on the customer data platform team claims to have fixed a bug in the phone number normalization function. Previously, phone numbers with country codes were being stored inconsistently — sometimes with a leading +, sometimes without, and sometimes with spaces or dashes. The fix is supposed to normalize all incoming phone numbers to E.164 format (+[country code][number], digits only after the +).
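
The target format described above can be turned into a checkable condition. The following is a minimal sketch, not part of the provided files: the name `is_e164_shaped` is hypothetical, and the rule it encodes (a leading `+`, a first digit of 1 to 9, at most 15 digits in total) is an assumption about what "E.164 format" should mean for this task.

```python
import re

# Assumed shape rule: "+", a first digit of 1-9, and no more than 15 digits in total.
# [0-9] is used instead of \d so that non-ASCII digits are not accepted.
_E164_SHAPE = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164_shaped(value: str) -> bool:
    """Return True if value looks like an E.164 number under the assumed rule."""
    return _E164_SHAPE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ["+442079460958", "442079460958", "+44 20 7946 0958", "+0123"]:
        print(f"{sample!r}: {is_e164_shaped(sample)}")
```

A shape check of this kind only says whether an output is plausible. It cannot tell whether the digits are the right ones for the input, so it complements explicit expected values rather than replacing them.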

This function runs on every customer record import, which processes millions of records from various international sources. The data team wants you to try to break it — find any inputs that are not handled correctly or produce a wrong output — before it is deployed.

Output Specification

Write your findings to verification_report.md. If you find a counterexample, document the exact input and the actual vs expected output.
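
One possible way to keep counterexamples in the requested shape (exact input, actual output, expected output) is sketched below. The `Counterexample` class, the `render_report` function, and the Markdown layout are illustrative assumptions; the task prescribes only the file name and the fields to document. The verdict values PASS, PARTIAL, and FAIL come from the skill description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Counterexample:
    raw_input: str   # exact input passed to the function
    actual: str      # what the function returned, or the exception name
    expected: str    # what a correct implementation should produce
    note: str = ""   # optional one-line explanation


def render_report(verdict: str, findings: List[Counterexample]) -> str:
    """Build the text of a Markdown report from a verdict and a list of findings."""
    lines = ["# Verification report", "", f"Verdict: {verdict}", ""]
    if not findings:
        lines.append("No counterexamples found.")
    for index, finding in enumerate(findings, start=1):
        lines.append(f"## Counterexample {index}")
        # repr() keeps whitespace and unusual characters in the input visible.
        lines.append(f"- Input: `{finding.raw_input!r}`")
        lines.append(f"- Actual: `{finding.actual!r}`")
        lines.append(f"- Expected: `{finding.expected!r}`")
        if finding.note:
            lines.append(f"- Note: {finding.note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_report("FAIL", [Counterexample("+1", "+1", "ValueError")]))
```

The returned string can then be written to verification_report.md, for example with `pathlib.Path.write_text`.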

Input Files

The following files are provided as inputs. Extract them before beginning.

=============== FILE: src/phone_normalizer.py ===============
import re

def normalize_phone(raw: str) -> str:
    """
    Normalize a phone number to E.164 format: +[digits only]

    Claimed to handle:
    - Numbers with country code prefix (e.g. +1, +44, +49)
    - Numbers with spaces, dashes, dots, parentheses
    - US numbers without country code (assumed +1)

    Returns normalized E.164 string or raises ValueError if unparseable.
    """
    if not raw or not isinstance(raw, str):
        raise ValueError(f"Invalid input: {raw!r}")

    # Strip whitespace
    cleaned = raw.strip()

    # Remove all formatting characters: spaces, dashes, dots, parens
    digits_and_plus = re.sub(r'[\s\-\.\(\)]', '', cleaned)

    # Validate remaining characters
    if not re.match(r'^\+?[0-9]+$', digits_and_plus):
        raise ValueError(f"Cannot parse phone number: {raw!r}")

    # If starts with +, treat as already having country code
    if digits_and_plus.startswith('+'):
        return digits_and_plus

    # If 10 digits, assume US number, prepend +1
    if len(digits_and_plus) == 10:
        return '+1' + digits_and_plus

    # If 11 digits starting with 1, assume US with country code
    if len(digits_and_plus) == 11 and digits_and_plus.startswith('1'):
        return '+' + digits_and_plus

    raise ValueError(f"Cannot determine country code for: {raw!r}")

=============== FILE: tests/test_phone_normalizer.py ===============
import sys
sys.path.insert(0, 'src')
from phone_normalizer import normalize_phone

passed = 0
failed = 0

def check(name, fn):
    global passed, failed
    try:
        fn()
        print(f"PASS: {name}")
        passed += 1
    except AssertionError as e:
        print(f"FAIL: {name} — {e}")
        failed += 1
    except Exception as e:
        print(f"ERROR: {name} — {e}")
        failed += 1

def test_us_with_plus():
    assert normalize_phone("+12125551234") == "+12125551234"

def test_us_10_digit():
    assert normalize_phone("2125551234") == "+12125551234"

def test_us_formatted():
    assert normalize_phone("(212) 555-1234") == "+12125551234"

def test_uk_number():
    assert normalize_phone("+44 20 7946 0958") == "+442079460958"

def test_rejects_letters():
    try:
        normalize_phone("abc")
        assert False
    except ValueError:
        pass

check("us_with_plus", test_us_with_plus)
check("us_10_digit", test_us_10_digit)
check("us_formatted", test_us_formatted)
check("uk_number", test_uk_number)
check("rejects_letters", test_rejects_letters)
print(f"\n{passed} passed, {failed} failed")
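
The provided tests cover only well-formed inputs. A hedged sketch of an adversarial table-driven probe follows. The body of `normalize_phone` is copied from src/phone_normalizer.py (docstring and comments trimmed) so the sketch runs on its own; `REJECT`, `CASES`, and `run_cases` are hypothetical names, and the expected values are this sketch's assumptions about correct E.164 behaviour, not requirements stated in the task.

```python
import re


# Copied from src/phone_normalizer.py so this sketch is self-contained.
def normalize_phone(raw: str) -> str:
    if not raw or not isinstance(raw, str):
        raise ValueError(f"Invalid input: {raw!r}")
    cleaned = raw.strip()
    digits_and_plus = re.sub(r'[\s\-\.\(\)]', '', cleaned)
    if not re.match(r'^\+?[0-9]+$', digits_and_plus):
        raise ValueError(f"Cannot parse phone number: {raw!r}")
    if digits_and_plus.startswith('+'):
        return digits_and_plus
    if len(digits_and_plus) == 10:
        return '+1' + digits_and_plus
    if len(digits_and_plus) == 11 and digits_and_plus.startswith('1'):
        return '+' + digits_and_plus
    raise ValueError(f"Cannot determine country code for: {raw!r}")


REJECT = "ValueError"  # marker meaning "this input should be refused"

# (input, expected) pairs; the expectations are assumptions made for illustration.
CASES = [
    ("(212) 555-1234", "+12125551234"),         # control case from the existing tests
    ("+44 (0) 20 7946 0958", "+442079460958"),  # UK trunk prefix "(0)" written inline
    ("+0123456789", REJECT),                    # country codes do not start with 0
    ("+1", REJECT),                             # country code with no subscriber number
]


def run_cases(cases):
    """Return (input, actual, expected) for every case whose outcome differs."""
    mismatches = []
    for raw, expected in cases:
        try:
            actual = normalize_phone(raw)
        except ValueError:
            actual = REJECT
        if actual != expected:
            mismatches.append((raw, actual, expected))
    return mismatches


if __name__ == "__main__":
    for raw, actual, expected in run_cases(CASES):
        print(f"input={raw!r} actual={actual!r} expected={expected!r}")
```

Under these assumptions the control case passes and the other three are reported as mismatches. All three take the branch that returns any `+`-prefixed digit string unchanged, which suggests that branch is the first place to look.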
