Compare runtime behavior between original and migrated repositories to detect behavioral differences, regressions, and semantic changes. Use when validating code migrations, refactorings, language ports, framework upgrades, or any transformation that should preserve behavior. Automatically compares test results, execution traces, API responses, and observable outputs between two repository versions. Provides actionable guidance for fixing deviations and ensuring behavioral equivalence.
Install with Tessl CLI:

```
npx tessl i github:ArabelaTso/Skills-4-SE --skill behavior-preservation-checker83
```
Does it follow best practices?

If you maintain this skill, you can automatically optimize it using the Tessl CLI to improve its score:

```
npx tessl skill review --optimize ./path/to/skill
```

Validation for skill structure
Test-based comparison workflow

| Check | Without context | With context |
| --- | --- | --- |
| pytest json-report flag | 0% | 100% |
| json-report-file flag | 0% | 100% |
| compare_test_results.py usage | 0% | 50% |
| Report behavioral_equivalence | 50% | 100% |
| Report summary fields | 0% | 100% |
| Severity classification | 100% | 100% |
| Differences list | 0% | 100% |
| Recommendations list | 0% | 100% |
| Separate result files | 0% | 100% |
| Variance regression identified | 100% | 100% |
Without context: $0.4355 · 5m 54s · 22 turns · 27 in / 6,918 out tokens
With context: $1.1129 · 3m 43s · 39 turns · 4,129 in / 11,614 out tokens
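The test-based workflow above runs pytest with the `pytest-json-report` plugin's `--json-report --json-report-file=<path>` flags in each repository, then diffs the two reports. The helper below is a minimal sketch of what a `compare_test_results.py`-style comparison could look like; the field names (`behavioral_equivalence`, `differences`, `severity`) follow the checks listed in the table, but the skill's actual script may differ:

```python
import json

def load_outcomes(report_path):
    """Map test node IDs to outcomes from a pytest-json-report file."""
    with open(report_path) as f:
        report = json.load(f)
    return {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}

def compare_reports(original_path, migrated_path):
    """Diff two pytest JSON reports and compute a behavioral-equivalence score."""
    original = load_outcomes(original_path)
    migrated = load_outcomes(migrated_path)
    shared = set(original) & set(migrated)
    differences = [
        {"test": t, "original": original[t], "migrated": migrated[t],
         # a pass-to-fail change is a regression; fail-to-pass is only minor
         "severity": "critical" if original[t] == "passed" else "minor"}
        for t in sorted(shared) if original[t] != migrated[t]
    ]
    equivalence = 100.0 * (len(shared) - len(differences)) / len(shared) if shared else 0.0
    return {"behavioral_equivalence": equivalence, "differences": differences}
```

Keeping each run's report in a separate result file (as the "Separate result files" check requires) makes the comparison reproducible without re-running either test suite.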
behavior_checker.py orchestrator and report format

| Check | Without context | With context |
| --- | --- | --- |
| behavior_checker.py invocation | 41% | 66% |
| behavior_report.json created | 100% | 100% |
| Report summary section | 20% | 100% |
| passed_original_failed_migrated count | 0% | 100% |
| Differences array | 0% | 100% |
| Severity on differences | 0% | 100% |
| Critical severity regression | 0% | 100% |
| Recommendations array | 0% | 100% |
| Guidance on differences | 0% | 100% |
| behavioral_equivalence percentage | 0% | 100% |
| Workflow log | 100% | 100% |
Without context: $0.6131 · 4m 31s · 25 turns · 32 in / 9,137 out tokens
With context: $1.1942 · 4m 51s · 40 turns · 4,088 in / 15,387 out tokens
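Based on the checks above, a `behavior_report.json` needs a `summary` section (with a `passed_original_failed_migrated` count and a `behavioral_equivalence` percentage), a `differences` array with severities, and a `recommendations` array. The sketch below assembles such a report; the schema is an assumption inferred from those checks, not the skill's canonical format:

```python
import json

def build_behavior_report(original_results, migrated_results):
    """Assemble a behavior_report.json-style dict.

    original_results / migrated_results: dict mapping test name to
    "passed" or "failed".
    """
    regressions = [t for t in sorted(original_results)
                   if original_results[t] == "passed"
                   and migrated_results.get(t) == "failed"]
    differences = [
        {"test": t,
         "original": "passed",
         "migrated": "failed",
         "severity": "critical",  # pass-to-fail regressions are always critical
         "recommendation": f"Inspect the migrated code paths exercised by {t}"}
        for t in regressions
    ]
    total = len(original_results)
    matching = sum(1 for t in original_results
                   if migrated_results.get(t) == original_results[t])
    return {
        "summary": {
            "total_tests": total,
            "passed_original_failed_migrated": len(regressions),
            "behavioral_equivalence": round(100.0 * matching / total, 1) if total else 0.0,
        },
        "differences": differences,
        "recommendations": [d["recommendation"] for d in differences],
    }
```

The orchestrator would then write the dict with `json.dump(report, f, indent=2)` to `behavior_report.json` alongside its workflow log.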
Property-based testing and multi-method comparison

| Check | Without context | With context |
| --- | --- | --- |
| hypothesis import | 0% | 100% |
| @given decorator used | 0% | 100% |
| strategies used | 0% | 100% |
| Equivalence assertion in property tests | 50% | 100% |
| Multiple comparison methods | 100% | 100% |
| Tolerance threshold applied | 100% | 100% |
| behavior_report.json present | 100% | 100% |
| Report differences populated | 100% | 100% |
| Report recommendations | 100% | 100% |
| Severity levels present | 100% | 100% |
| comparison_notes.md present | 100% | 100% |
Without context: $1.2700 · 9m 27s · 38 turns · 42 in / 20,108 out tokens
With context: $2.6824 · 12m 26s · 58 turns · 2,199 in / 31,020 out tokens
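The property-based scenario checks for a `hypothesis` import, the `@given` decorator, strategies, an equivalence assertion, and a tolerance threshold. A minimal sketch of such a property test; the `discount_*` functions are hypothetical stand-ins for an original and a migrated implementation:

```python
import math
from hypothesis import given, strategies as st

# Hypothetical original and migrated implementations under comparison.
def discount_original(price, rate):
    return price * (1.0 - rate)

def discount_migrated(price, rate):
    # Algebraically identical, but may round differently in floating point.
    return price - price * rate

TOLERANCE = 1e-9  # threshold for acceptable numeric drift

@given(
    price=st.floats(min_value=0.0, max_value=1000.0, allow_nan=False),
    rate=st.floats(min_value=0.0, max_value=1.0, allow_nan=False),
)
def test_discount_equivalence(price, rate):
    # Equivalence assertion: outputs must agree within the tolerance.
    assert math.isclose(discount_original(price, rate),
                        discount_migrated(price, rate),
                        abs_tol=TOLERANCE)
```

Hypothesis generates hundreds of inputs per run and shrinks any failure to a minimal counterexample, which is why it complements the exact-match and tolerance-based comparison methods above for numeric code.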
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.