
python-test-updater

Updates broken pytest tests after intentional code changes — separating assertion failures, mock-coupling failures, and genuine regressions, and using Python's introspection to automate where safe. Use when a refactor or API change leaves a pile of failing tests and you need to decide update vs. fix vs. delete.

Install with Tessl CLI

npx tessl i github:santosomar/general-secure-coding-agent-skills --skill python-test-updater

Python Test Updater

Same triage discipline as java-test-updater. Python differences: no compile-time breaks (everything fails at runtime), more mocker.patch coupling, and snapshot libraries make some updates one command.

Python failure taxonomy

No compile step means everything surfaces at test runtime:

| Failure | Python-specific signal | Action |
| ------- | ---------------------- | ------ |
| `AttributeError: 'X' has no attribute 'foo'` | Renamed/removed method | Update call site |
| `TypeError: f() missing 1 positional argument` | Signature changed | Add arg or use default |
| `TypeError: f() got an unexpected keyword` | Kwarg renamed/removed | Update kwarg |
| `ImportError` / `ModuleNotFoundError` | Module moved | Update import |
| `AssertionError` with value diff | Behavior changed (intentional?) or regression | Triage |
| `AssertionError: Expected 'mock' to be called` | Over-mocked internal | Loosen/delete mock |
| `AssertionError` in snapshot compare | Snapshot stale | Review diff, `--snapshot-update` if intentional |
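The taxonomy above can be applied mechanically before any human triage. A hypothetical helper (the names and patterns here are illustrative, not part of the skill) that buckets raw failure messages so mechanical fixes can be batched away from judgment calls:

```python
import re

# Order matters: the bare AssertionError pattern must come last, since the
# mock-assertion message is also an AssertionError. Snapshot failures surface
# as AssertionError too and still need a human to read the diff.
PATTERNS = [
    (r"AttributeError: .* has no attribute", "rename"),
    (r"TypeError: \w+\(\) missing \d+ .*positional argument", "signature"),
    (r"TypeError: \w+\(\) got an unexpected keyword", "kwarg"),
    (r"(ImportError|ModuleNotFoundError)", "import"),
    (r"AssertionError: Expected .* to (be|have been) called", "mock"),
    (r"AssertionError", "assertion"),
]

def classify(message: str) -> str:
    """Bucket one pytest failure message into the taxonomy."""
    for pattern, bucket in PATTERNS:
        if re.search(pattern, message):
            return bucket
    return "unknown"
```

Feed it the error lines from `pytest -q` output and count per bucket; anything in `rename`, `signature`, `kwarg`, or `import` is a candidate for scripted fixing.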

Automating the mechanical fixes

Python's runtime errors point straight at the problem. For signature changes across many tests:

```python
# conftest.py — one-time shim during migration
# OLD: Order(items, region)   NEW: Order(items, region, currency) — currency now required
# Tests still call Order(items, region).  Temporary compat:
import pytest

from orders import Order  # adjust to wherever Order actually lives


@pytest.fixture(autouse=True)
def _order_compat(mocker):
    orig_init = Order.__init__

    def compat_init(self, items, region, currency="USD"):
        orig_init(self, items, region, currency)

    mocker.patch.object(Order, "__init__", compat_init)
```

This is a bridge, not a fix. All tests pass → remove the shim → fix tests one by one as you touch them. Don't leave compat shims in permanently.
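Which call sites actually need editing can also be detected up front. A sketch using `inspect.signature` (the `binds` helper and this `Order` class are stand-ins I'm introducing, not the project's real code): a call that binds cleanly against the new signature needs no edit, one that raises `TypeError` on binding is a mechanical fix.

```python
import inspect

def binds(func, *args, **kwargs) -> bool:
    """Return True if func(*args, **kwargs) would bind without a TypeError."""
    try:
        inspect.signature(func).bind(*args, **kwargs)
        return True
    except TypeError:
        return False

class Order:
    # New signature: currency is now required.
    def __init__(self, items, region, currency):
        self.items, self.region, self.currency = items, region, currency
```

Run this check over the argument shapes your tests use before editing; only the calls where `binds` returns `False` go on the mechanical-fix list.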

Mock coupling — the mocker.patch problem

```python
def test_process(mocker):
    mock_validate = mocker.patch("orders.service._validate_order")
    mock_save = mocker.patch("orders.service._save_order")
    process(order)
    mock_validate.assert_called_once_with(order)
    mock_save.assert_called_once()
```

`_validate_order` was inlined into `process`. Test fails: `Expected '_validate_order' to have been called once. Called 0 times.`

The behavior didn't change — the order is still validated — but the test was asserting structure. Delete the mock assertion. The test should assert the outcome of validation:

```python
def test_process_rejects_invalid():
    bad = Order(items=[])
    with pytest.raises(InvalidOrder, match="empty"):
        process(bad)
```

This survives refactors because it tests what validation does, not that a function named _validate_order was called.

Snapshot updates — review first

If using syrupy / pytest-snapshot:

```shell
pytest --snapshot-update
```

This updates all failing snapshots to current output. Dangerous if any failure is a regression. Workflow:

  1. Run without --snapshot-update. Read every diff.
  2. For each: intentional change? Or regression?
  3. Only after confirming all are intentional: --snapshot-update.
  4. Commit the snapshot changes with a message explaining why they changed.

Assertion triage — same as Java

```python
assert invoice.total == Decimal("27.80")   # fails: actual 27.55
```

`git log -p -- src/pricing.py` → "Fix: half-even rounding." Intentional. Update with reason:

```python
# abc123: half-even rounding fix. Was 27.80 (half-up bug).
assert invoice.total == Decimal("27.55")
```

Versus:

```python
assert len(results) == 5   # fails: actual 4
```

The change was to a completely different module. Why are there fewer results? Regression. Don't update. Investigate.
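The rounding-mode change behind the invoice example can be reproduced directly with `decimal` (the tie value below is illustrative, not the invoice's actual numbers): on an exact half-cent tie, half-up always rounds away from zero while half-even rounds toward the even neighbor, so totals shift.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

tie = Decimal("27.545")  # exact half-cent tie
half_up = tie.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)      # 27.55
half_even = tie.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)  # 27.54: 4 is even
```

Seeing the mechanism makes the git-log finding easy to confirm: if the new totals match half-even arithmetic, the change is intentional and the assertion gets updated with the commit reference.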

pytest.approx drift

```python
assert score == 0.8472819  # now fails: 0.8472820
```

Last digit changed — floating-point ops reordered. If precision to 7 decimals isn't spec'd, this was over-tight:

```python
assert score == pytest.approx(0.8473, rel=1e-4)
```

This isn't "loosening to make it pass." It's fixing an over-specific assertion that should never have been that tight.
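For intuition, the loosened check behaves roughly like stdlib `math.isclose` (a sketch, not an exact equivalence — `pytest.approx` also applies a small absolute tolerance by default):

```python
import math

score = 0.8472820  # reordered floating-point ops nudged the last digit

# rel_tol=1e-4 tolerates ~8.5e-5 here; the observed drift is ~1.8e-5.
assert math.isclose(score, 0.8473, rel_tol=1e-4)

# The old, over-tight expectation: a 1e-8 relative band rejects the drift.
assert not math.isclose(score, 0.8472819, rel_tol=1e-8)
```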

Do not

  • Do not blindly run --snapshot-update. Review diffs first. A regression in a snapshot looks identical to an intentional change.
  • Do not fix mocker.patch("module._private") failures by updating the patch path. You're chasing implementation. Replace with behavioral assertions.
  • Do not widen pytest.approx tolerance until the test passes. If rel=0.5 is what it takes, the test isn't testing anything.
  • Do not skip investigating assertion failures in "unrelated" tests. Unrelated is where regressions hide.
  • Do not leave compat shims in conftest.py after the migration. They mask further API drift.

Output format

## Failing tests
Total: <N>  Import/Attribute: <N>  Signature: <N>  Assertion: <N>  Mock: <N>  Snapshot: <N>

## Mechanical fixes
| Test | Error | Fix |
| ---- | ----- | --- |

## Mock decoupling
| Test | Over-coupled patch | Replacement assertion |
| ---- | ------------------ | --------------------- |

## Assertion triage
| Test | Old | New | Cause commit | Classification | Action |
| ---- | --- | --- | ------------ | -------------- | ------ |

## Snapshot review
| Snapshot | Diff summary | Intentional? |
| -------- | ------------ | ------------ |

## Regressions
<tests correctly failing — file bugs, don't update>

## After
Passing: <N>  Updated: <N>  Decoupled: <N>  Deleted: <N>  Bugs filed: <N>
Repository
santosomar/general-secure-coding-agent-skills