
multi-version-behavior-comparator

Compare behavior across multiple versions of programs or repositories. Use when you need to analyze how functionality changes between versions, identify regressions, compare outputs and exceptions, or validate upgrades. The skill compares execution behavior, test results, outputs, exceptions, and observable states across versions, generating detailed reports showing behavioral divergences, potential regressions, added/removed functionality, and areas requiring validation. Supports multiple programming languages and can work with test suites or execution traces.

Score: 84 (1.58x)

Quality: 76%
Does it follow best practices?

Impact: 100% (1.58x)
Average score across 3 eval scenarios

Security by Snyk

Passed

No known issues

Optimize this skill with Tessl

npx tessl skill review --optimize ./skills/multi-version-behavior-comparator/SKILL.md

Quality

Discovery

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is a well-crafted skill description that excels across all dimensions. It provides specific concrete actions, includes natural trigger terms developers would use, explicitly states both what the skill does and when to use it with a clear 'Use when...' clause, and carves out a distinct niche around version comparison that minimizes conflict risk with other skills.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: 'analyze how functionality changes', 'identify regressions', 'compare outputs and exceptions', 'validate upgrades', 'compares execution behavior, test results, outputs, exceptions, and observable states', 'generating detailed reports showing behavioral divergences'. | 3 / 3 |
| Completeness | Clearly answers both what (compares behavior across versions, generates reports on divergences) AND when ('Use when you need to analyze how functionality changes between versions, identify regressions, compare outputs and exceptions, or validate upgrades'). | 3 / 3 |
| Trigger Term Quality | Includes natural keywords users would say: 'versions', 'regressions', 'upgrades', 'compare', 'test results', 'outputs', 'exceptions', 'repositories'. These are terms developers naturally use when discussing version comparison tasks. | 3 / 3 |
| Distinctiveness / Conflict Risk | Clear niche focused specifically on cross-version behavioral comparison with distinct triggers like 'versions', 'regressions', 'upgrades'. Unlikely to conflict with general code analysis or testing skills due to the specific version comparison focus. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Implementation

52%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill is concise and well-structured but lacks actionability, since it references a non-existent script without providing implementation details. The workflow guidance is particularly weak: for a tool that identifies regressions and validation areas, there is no guidance on how to actually validate findings or handle discovered issues.

Suggestions

- Add implementation code, or clarify that scripts/compare.py must be created, with guidance on what it should do.
- Include a workflow section with explicit steps: run comparison -> review regressions -> validate flagged areas -> document decisions -> proceed with upgrade.
- Add an example of the JSON report output so users know what to expect and how to interpret results.
- Include validation checkpoints such as 'If regressions found: investigate each before proceeding', with specific guidance on investigation steps.
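To make the first and third suggestions concrete, here is one possible shape for such a comparison script and its JSON report. This is only a sketch under assumptions: scripts/compare.py does not exist in the skill, and the function names, report fields, and paths below are invented for illustration, not part of the skill's actual interface.

```python
# Hypothetical sketch of a minimal scripts/compare.py along the lines the
# review suggests. Names, paths, and the report schema are illustrative.
import json
import subprocess
import sys


def run_version(workdir, command):
    """Run `command` inside `workdir` and capture its observable behavior."""
    result = subprocess.run(command, cwd=workdir, capture_output=True, text=True)
    return {
        "stdout": result.stdout,
        "stderr": result.stderr,
        "exit_code": result.returncode,
    }


def compare_versions(old_dir, new_dir, command):
    """Return a JSON-serializable report of divergences between two versions."""
    old = run_version(old_dir, command)
    new = run_version(new_dir, command)
    divergences = [
        field for field in ("stdout", "stderr", "exit_code")
        if old[field] != new[field]
    ]
    return {
        "command": command,
        "old": old,
        "new": new,
        "divergences": divergences,
        "potential_regression": bool(divergences),
    }


if __name__ == "__main__":
    # Demo: a directory compared against itself should show no divergences.
    report = compare_versions(".", ".", [sys.executable, "--version"])
    print(json.dumps(report, indent=2))
```

A real implementation would typically run each version's test suite (rather than a single command) and diff richer state, but even a report this small gives users the 'what to expect' example the review asks for.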

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The content is lean and efficient, with no unnecessary explanations of concepts Claude would already know. Every section serves a clear purpose without padding. | 3 / 3 |
| Actionability | Provides concrete CLI commands that are copy-paste ready, but the actual comparison script doesn't exist: it references a hypothetical tool. No executable code showing how to implement the comparison logic itself. | 2 / 3 |
| Workflow Clarity | No clear multi-step workflow with validation checkpoints. Missing critical guidance on what to do when regressions are found, how to interpret the report, or feedback loops for addressing issues. The 'Tips' section is vague rather than actionable. | 1 / 3 |
| Progressive Disclosure | Content is well-organized with clear sections, but there are no references to additional documentation for advanced usage, report interpretation, or implementation details that would benefit from separate files. | 2 / 3 |
| Total | | 8 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository: ArabelaTso/Skills-4-SE (Reviewed)

