behavior-preservation-checker

Compare runtime behavior between original and migrated repositories to detect behavioral differences, regressions, and semantic changes. Use when validating code migrations, refactorings, language ports, framework upgrades, or any transformation that should preserve behavior. Automatically compares test results, execution traces, API responses, and observable outputs between two repository versions. Provides actionable guidance for fixing deviations and ensuring behavioral equivalence.

Quality

78%

Does it follow best practices?

Impact

97%

2.62x

Average score across 3 eval scenarios

Security

Passed

No known issues

Fix and improve this skill with Tessl

tessl review fix ./skills/behavior-preservation-checker/SKILL.md

Quality

Content

57%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This skill provides a comprehensive framework for behavior preservation checking with good structure and multiple comparison methods. However, it relies heavily on scripts that aren't provided, making the guidance illustrative rather than truly executable. The workflow could benefit from explicit validation checkpoints to ensure users verify results before acting on them.

Suggestions

Provide actual implementations for the referenced scripts (behavior_checker.py, compare_test_results.py, etc.) or use only standard tools that exist

Add explicit validation checkpoints in the core workflow, such as 'Verify report accuracy by manually checking 2-3 differences before bulk fixing'

Reduce verbosity in the 'What to check' sections by consolidating into concise checklists rather than separate bullet lists for each method
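As an illustration of the first suggestion, test-result comparison can be done with the Python standard library alone. The sketch below is not the skill's `compare_test_results.py`, whose implementation is not shown here. It assumes both repositories can emit JUnit-style XML reports (for example via pytest's `--junitxml` option); the function names `load_outcomes` and `diff_outcomes` and the sample reports are hypothetical.

```python
import xml.etree.ElementTree as ET


def load_outcomes(junit_xml: str) -> dict[str, str]:
    """Map 'classname::name' to 'passed', 'failed', or 'skipped'."""
    outcomes = {}
    for case in ET.fromstring(junit_xml).iter("testcase"):
        test_id = f"{case.get('classname', '')}::{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            outcomes[test_id] = "failed"
        elif case.find("skipped") is not None:
            outcomes[test_id] = "skipped"
        else:
            outcomes[test_id] = "passed"
    return outcomes


def diff_outcomes(original: dict[str, str], migrated: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between two test runs keyed by test id."""
    shared = original.keys() & migrated.keys()
    return {
        # Passed before the migration, no longer passing after it.
        "regressions": sorted(
            t for t in shared
            if original[t] == "passed" and migrated[t] != "passed"
        ),
        # Any change of outcome, including failed -> passed.
        "changed": sorted(t for t in shared if original[t] != migrated[t]),
        # Tests that exist on only one side.
        "missing": sorted(original.keys() - migrated.keys()),
        "added": sorted(migrated.keys() - original.keys()),
    }


# Hypothetical reports, inlined so the example is self-contained.
ORIGINAL = """<testsuite>
  <testcase classname="test_api" name="test_get"/>
  <testcase classname="test_api" name="test_post"/>
  <testcase classname="test_api" name="test_delete"/>
</testsuite>"""

MIGRATED = """<testsuite>
  <testcase classname="test_api" name="test_get"/>
  <testcase classname="test_api" name="test_post"><failure message="status 500"/></testcase>
  <testcase classname="test_api" name="test_put"/>
</testsuite>"""

report = diff_outcomes(load_outcomes(ORIGINAL), load_outcomes(MIGRATED))
for category, tests in report.items():
    print(category, tests)
```

Reporting missing and added tests separately from regressions matters for migrations: a test that was silently dropped during a port would otherwise look like a clean run.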

Dimension	Reasoning	Score
Conciseness	The skill is reasonably efficient but includes some unnecessary verbosity, such as explaining what to check in obvious categories and repeating similar patterns across methods. Some sections could be tightened without losing clarity.	2 / 3
Actionability	Provides concrete bash commands and code examples, but many reference scripts that don't exist (behavior_checker.py, compare_test_results.py, etc.) without providing their implementations. The examples are illustrative but not truly executable without the missing scripts.	2 / 3
Workflow Clarity	The core workflow has clear numbered steps, but lacks explicit validation checkpoints and feedback loops. For a process involving behavioral comparison (which can have destructive implications if wrong conclusions are drawn), there should be clearer verification steps before acting on results.	2 / 3
Progressive Disclosure	Well-structured with clear sections, appropriate use of headers, and references to external files (references/comparison_techniques.md, etc.) that are one level deep and clearly signaled. The overview leads naturally into detailed methods.	3 / 3
	Total	9 / 12 Passed
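
The Workflow Clarity row asks for verification before acting on results. One way such a checkpoint might look is sketched below, following the suggested practice of manually checking 2-3 differences before bulk fixing. The helper names `pick_for_manual_check` and `may_bulk_fix` and the sample difference strings are hypothetical, not part of the skill.

```python
import random


def pick_for_manual_check(differences, sample_size=3, seed=0):
    """Choose a small, reproducible sample of reported differences to verify by hand."""
    rng = random.Random(seed)
    return rng.sample(differences, min(sample_size, len(differences)))


def may_bulk_fix(sample, confirmed):
    """Allow bulk fixing only if every sampled difference was confirmed as real."""
    return bool(sample) and all(confirmed.get(item, False) for item in sample)


# Hypothetical differences reported by a comparison run.
differences = [
    "test_post now returns 500",
    "trace order differs in checkout",
    "rounding differs in invoice total",
    "response header missing",
]

sample = pick_for_manual_check(differences)

# A reviewer records a verdict for each sampled difference.
confirmed = {item: True for item in sample}
all_confirmed = may_bulk_fix(sample, confirmed)

# A single false positive in the sample blocks bulk fixing.
confirmed[sample[0]] = False
one_rejected = may_bulk_fix(sample, confirmed)

print(all_confirmed, one_rejected)
```

A fixed seed keeps the sample reproducible, so a second reviewer can check the same differences. An empty sample never passes the gate, which avoids approving a bulk fix when nothing was actually inspected.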

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted skill description that excels across all dimensions. It clearly articulates specific capabilities (comparing test results, execution traces, API responses), provides explicit trigger guidance with a 'Use when' clause covering multiple scenarios, and carves out a distinct niche around migration validation that minimizes conflict with other skills.

Dimension	Reasoning	Score
Specificity	Lists multiple specific concrete actions: 'compares test results, execution traces, API responses, and observable outputs', 'detect behavioral differences, regressions, and semantic changes', 'Provides actionable guidance for fixing deviations'.	3 / 3
Completeness	Clearly answers both what ('Compare runtime behavior...detect behavioral differences') AND when ('Use when validating code migrations, refactorings, language ports, framework upgrades, or any transformation that should preserve behavior'). Explicit 'Use when' clause present.	3 / 3
Trigger Term Quality	Includes natural keywords users would say: 'migrations', 'refactorings', 'language ports', 'framework upgrades', 'behavioral differences', 'regressions', 'test results', 'API responses'. Good coverage of terms developers naturally use.	3 / 3
Distinctiveness / Conflict Risk	Clear niche focused on comparing two repository versions for behavioral equivalence during migrations/transformations. Distinct from general testing, code review, or single-repo analysis skills. Specific triggers like 'migrated repositories', 'behavioral equivalence' reduce conflict risk.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 11 / 11 Passed

Validation for skill structure

No warnings or errors.

Repository: ArabelaTso/Skills-4-SE
Commit: 0f00a4f

Reviewed: 4 months ago
