
behavior-preservation-checker

Compare runtime behavior between original and migrated repositories to detect behavioral differences, regressions, and semantic changes. Use when validating code migrations, refactorings, language ports, framework upgrades, or any transformation that should preserve behavior. Automatically compares test results, execution traces, API responses, and observable outputs between two repository versions. Provides actionable guidance for fixing deviations and ensuring behavioral equivalence.

Install with Tessl CLI

npx tessl i github:ArabelaTso/Skills-4-SE --skill behavior-preservation-checker

Behavior Preservation Checker

Overview

Validate that a migrated or refactored codebase preserves the original behavior by automatically comparing runtime behavior, test results, execution traces, and observable outputs between two repository versions.

Core Workflow

1. Setup Repositories

Prepare both repositories for comparison:

# Clone or locate repositories
ORIGINAL_REPO=/path/to/original
MIGRATED_REPO=/path/to/migrated

# Ensure both are at comparable states
cd "$ORIGINAL_REPO" && git checkout main
cd "$MIGRATED_REPO" && git checkout main

2. Run Behavior Comparison

Use the comparison script to analyze behavioral differences:

python scripts/behavior_checker.py \
    --original $ORIGINAL_REPO \
    --migrated $MIGRATED_REPO \
    --output behavior_report.json

3. Review Results

Examine the generated report for:

  • Test result differences
  • Execution trace divergences
  • Output mismatches
  • Performance regressions
  • API contract violations

4. Fix Deviations

Follow actionable guidance to resolve behavioral differences.

Comparison Methods

Method 1: Test-Based Comparison

Run the same test suite on both repositories and compare results:

Workflow:

  1. Identify common test suite (or create equivalent tests)
  2. Run tests on original repository
  3. Run tests on migrated repository
  4. Compare pass/fail status, assertions, and outputs

Example:

# Run on original
cd $ORIGINAL_REPO
pytest tests/ --json-report --json-report-file=original_results.json

# Run on migrated
cd $MIGRATED_REPO
pytest tests/ --json-report --json-report-file=migrated_results.json

# Compare
python scripts/compare_test_results.py \
    original_results.json \
    migrated_results.json

Method 2: Execution Trace Comparison

Capture and compare execution traces:

Workflow:

  1. Instrument code to capture function calls, arguments, and return values
  2. Run identical inputs through both versions
  3. Compare execution traces for divergences

Example:

# Trace original
python scripts/trace_execution.py \
    --repo $ORIGINAL_REPO \
    --input test_inputs.json \
    --output original_trace.json

# Trace migrated
python scripts/trace_execution.py \
    --repo $MIGRATED_REPO \
    --input test_inputs.json \
    --output migrated_trace.json

# Compare traces
python scripts/compare_traces.py \
    original_trace.json \
    migrated_trace.json
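Step 1's instrumentation can be sketched with Python's built-in sys.settrace; the repository's trace_execution.py is not reproduced on this page, so treat this as an illustrative approach rather than its actual implementation:

```python
import sys

def capture_trace(func, *args):
    """Record (function name, arguments, return value) for every call made by func."""
    trace = []

    def tracer(frame, event, arg):
        if event == "call":
            # Snapshot the arguments as the frame is entered
            trace.append({"fn": frame.f_code.co_name,
                          "args": dict(frame.f_locals)})
        elif event == "return":
            # Attach the return value to the most recent matching call
            for entry in reversed(trace):
                if entry["fn"] == frame.f_code.co_name and "ret" not in entry:
                    entry["ret"] = arg
                    break
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace
```

Running identical inputs through both versions and diffing the resulting trace lists surfaces divergences like the calculate(x=10) example below.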

Method 3: Observable Output Comparison

Compare program outputs for identical inputs:

Workflow:

  1. Define test inputs (API requests, CLI commands, function calls)
  2. Capture outputs from both versions (stdout, files, API responses)
  3. Compare outputs for differences

Example:

# Test API endpoints
python scripts/compare_api_outputs.py \
    --original-url http://localhost:8000 \
    --migrated-url http://localhost:8001 \
    --test-cases api_test_cases.json
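Besides the API script above, the CLI commands from step 1 can be compared the same way; a minimal stdout/stderr/exit-code diff, assuming the same command is runnable in both working trees:

```python
import subprocess

def run_and_capture(command, cwd=None):
    """Run a command and capture stdout, stderr, and exit code."""
    proc = subprocess.run(command, cwd=cwd, capture_output=True, text=True)
    return {"stdout": proc.stdout, "stderr": proc.stderr, "code": proc.returncode}

def compare_cli_outputs(command, original_dir, migrated_dir):
    """Run the same command in both working trees; return only the fields that differ."""
    original = run_and_capture(command, cwd=original_dir)
    migrated = run_and_capture(command, cwd=migrated_dir)
    return {key: (original[key], migrated[key])
            for key in original
            if original[key] != migrated[key]}
```

An empty dict means the observable outputs match for that command.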

Method 4: Property-Based Testing

Use property-based testing to find behavioral differences:

Workflow:

  1. Define behavioral properties (invariants, contracts)
  2. Generate random inputs
  3. Verify properties hold for both versions
  4. Report any property violations

Example:

# Property: both versions' sort must agree on every input
# (original_sort / migrated_sort stand in for the two implementations under test)
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_equivalence(input_list):
    original_result = original_sort(input_list)
    migrated_result = migrated_sort(input_list)
    assert original_result == migrated_result

Difference Detection

Test Result Differences

What to check:

  • Tests that pass in original but fail in migrated
  • Tests that fail in original but pass in migrated
  • New test failures
  • Changed assertion messages

Severity levels:

  • Critical: Core functionality tests fail
  • High: Integration tests fail
  • Medium: Edge case tests fail
  • Low: Flaky tests or timing-dependent failures
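compare_test_results.py itself is not reproduced on this page; a minimal sketch of the cross-comparison it performs, assuming pytest-json-report's layout of a top-level "tests" list with "nodeid" and "outcome" fields:

```python
def classify_test_differences(original_report, migrated_report):
    """Cross-tabulate test outcomes from two pytest-json-report dicts."""
    original = {t["nodeid"]: t["outcome"] for t in original_report["tests"]}
    migrated = {t["nodeid"]: t["outcome"] for t in migrated_report["tests"]}
    buckets = {"passed_both": [], "failed_both": [],
               "passed_original_failed_migrated": [],
               "failed_original_passed_migrated": [],
               "missing_in_migrated": []}
    for nodeid, outcome in original.items():
        if nodeid not in migrated:
            buckets["missing_in_migrated"].append(nodeid)
        elif outcome == "passed" and migrated[nodeid] == "passed":
            buckets["passed_both"].append(nodeid)
        elif outcome == "passed":
            buckets["passed_original_failed_migrated"].append(nodeid)
        elif migrated[nodeid] == "passed":
            buckets["failed_original_passed_migrated"].append(nodeid)
        else:
            buckets["failed_both"].append(nodeid)
    return buckets
```

The bucket names mirror the summary fields in the report format shown later.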

Execution Trace Differences

What to check:

  • Different function call sequences
  • Different argument values
  • Different return values
  • Missing or extra function calls

Example divergence:

Original trace:
  calculate(x=10) -> 20
  validate(20) -> True
  save(20) -> Success

Migrated trace:
  calculate(x=10) -> 21  # ← Difference!
  validate(21) -> True
  save(21) -> Success

Output Differences

What to check:

  • Different stdout/stderr
  • Different file contents
  • Different API response bodies
  • Different status codes
  • Different error messages

Tolerance levels:

# Exact match required
assert original_output == migrated_output

# Numerical tolerance
assert abs(original_value - migrated_value) < 0.001

# Structural equivalence (ignore formatting)
assert json.loads(original) == json.loads(migrated)
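The three tolerance levels can be combined into one recursive comparator; a sketch, with the 1e-3 default mirroring the numerical example above:

```python
import math

def outputs_equivalent(a, b, tol=1e-3):
    """Structural comparison with numeric tolerance: dicts and lists are
    compared element-wise, numbers within tol, everything else exactly."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            outputs_equivalent(a[k], b[k], tol) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            outputs_equivalent(x, y, tol) for x, y in zip(a, b))
    if isinstance(a, (int, float)) and isinstance(b, (int, float)) \
            and not isinstance(a, bool) and not isinstance(b, bool):
        return math.isclose(a, b, rel_tol=0, abs_tol=tol)
    return a == b
```

Feeding it json.loads output gives structural equivalence and numeric tolerance in one pass.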

Actionable Guidance

Pattern 1: Logic Error

Symptom: Different outputs for same inputs

Diagnosis:

python scripts/isolate_difference.py \
    --original $ORIGINAL_REPO \
    --migrated $MIGRATED_REPO \
    --failing-test test_calculation

Guidance:

  1. Identify the diverging function
  2. Compare implementations side-by-side
  3. Check for off-by-one errors, operator changes, or logic inversions
  4. Add unit test for the specific case
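Step 4's regression test can be as small as pinning the diverging input/output pair; `calculate` here is the hypothetical function from the trace example earlier:

```python
# `calculate` stands in for the diverging function found in step 1;
# the pinned pair (10 -> 20) is the original version's observed behavior.
def calculate(x):
    return x * 2

def test_calculate_matches_original():
    # Freeze the exact case where the migration diverged (21 instead of 20)
    assert calculate(10) == 20
```

Keeping the pinned case in the suite prevents the same divergence from reappearing later.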

Pattern 2: Missing Functionality

Symptom: Tests pass in original but fail in migrated with "not implemented" or "attribute error"

Diagnosis:

python scripts/find_missing_functions.py \
    --original $ORIGINAL_REPO \
    --migrated $MIGRATED_REPO

Guidance:

  1. List all missing functions/methods
  2. Implement missing functionality
  3. Verify with targeted tests

Pattern 3: API Contract Violation

Symptom: Different response structure or status codes

Diagnosis:

python scripts/compare_api_contracts.py \
    --original-spec openapi_original.yaml \
    --migrated-spec openapi_migrated.yaml

Guidance:

  1. Document API contract differences
  2. Update migrated API to match original contract
  3. Add contract tests to prevent future violations

Pattern 4: Performance Regression

Symptom: Migrated version is significantly slower

Diagnosis:

python scripts/benchmark_comparison.py \
    --original $ORIGINAL_REPO \
    --migrated $MIGRATED_REPO \
    --iterations 100

Guidance:

  1. Profile both versions to identify bottlenecks
  2. Check for algorithmic changes (O(n) → O(n²))
  3. Look for missing optimizations or caching
  4. Verify database query efficiency

Pattern 5: State Management Issues

Symptom: Tests fail intermittently or depend on execution order

Diagnosis:

python scripts/detect_state_issues.py \
    --repo $MIGRATED_REPO \
    --test-suite tests/

Guidance:

  1. Identify shared state between tests
  2. Add proper setup/teardown
  3. Ensure test isolation
  4. Check for global variable usage

Report Format

The behavior checker generates a comprehensive JSON report:

{
  "summary": {
    "total_tests": 150,
    "passed_both": 140,
    "failed_both": 2,
    "passed_original_failed_migrated": 5,
    "failed_original_passed_migrated": 3,
    "behavioral_equivalence": "94.7%"
  },
  "differences": [
    {
      "type": "test_failure",
      "test_name": "test_user_authentication",
      "severity": "critical",
      "original_result": "passed",
      "migrated_result": "failed",
      "error_message": "AssertionError: Expected 200, got 401",
      "guidance": "Check authentication logic in migrated version",
      "affected_files": ["auth/login.py"]
    }
  ],
  "recommendations": [
    "Fix 5 critical test failures before deployment",
    "Review 3 output differences for correctness"
  ]
}
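A CI gate can consume this report directly; a short sketch that triages the differences array by the severity levels defined earlier:

```python
import json

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(report):
    """Order reported differences with the most severe first."""
    return sorted(report["differences"],
                  key=lambda d: SEVERITY_ORDER.get(d.get("severity"), 99))

def blocks_deployment(report):
    """CI gate: fail the build on any critical difference."""
    return any(d.get("severity") == "critical" for d in report["differences"])

# Typical use in a pipeline step:
# if blocks_deployment(json.load(open("behavior_report.json"))):
#     raise SystemExit("critical behavioral differences found")
```

This matches the report's own recommendation to fix critical failures before deployment.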

Best Practices

  1. Start with tests: Ensure comprehensive test coverage before migration
  2. Incremental validation: Check behavior after each migration step
  3. Document intentional changes: Mark expected behavioral differences
  4. Use multiple comparison methods: Combine tests, traces, and outputs
  5. Automate the process: Integrate into CI/CD pipeline
  6. Set tolerance thresholds: Define acceptable differences (e.g., timing, formatting)

Resources

  • references/comparison_techniques.md: Detailed comparison methodologies
  • references/difference_patterns.md: Common behavioral difference patterns
  • scripts/behavior_checker.py: Main comparison orchestrator
  • scripts/compare_test_results.py: Test result comparison
  • scripts/trace_execution.py: Execution trace capture
  • scripts/compare_traces.py: Trace comparison and analysis
Repository
ArabelaTso/Skills-4-SE