
tdg-personal/skill-comply

Visualize whether skills, rules, and agent definitions are actually followed — auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines


Quality: 61% (Does it follow best practices?)

Impact: Pending (No eval scenarios have been run)

Security by Snyk: Passed (No known issues)


Quality

Discovery

67%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description excels at specificity and distinctiveness, clearly articulating a unique and detailed set of capabilities around agent compliance testing and visualization. However, it lacks an explicit 'Use when...' clause, which caps completeness, and the trigger terms lean technical rather than matching natural user language patterns. Adding explicit trigger guidance and more natural keywords would significantly improve skill selection accuracy.

Suggestions

- Add an explicit 'Use when...' clause, e.g., 'Use when the user wants to test whether their skills, rules, or agent prompts are being followed, or asks about compliance checking or behavioral evaluation.'
- Include more natural trigger terms users might say, such as 'test my skill', 'check rule compliance', 'evaluate agent behavior', 'prompt adherence', or 'are my instructions being followed'.
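Taken together, the two suggestions above might combine into frontmatter along these lines. The wording is purely illustrative, not the skill's actual metadata:

```yaml
# Illustrative only: a description that pairs an explicit trigger
# clause with natural-language keywords. All values are hypothetical.
name: skill-comply
description: >
  Visualize whether skills, rules, and agent definitions are actually
  followed. Use when the user wants to test whether their skills, rules,
  or agent prompts are being followed, or asks to "test my skill",
  "check rule compliance", or "evaluate agent behavior".
```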

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific, concrete actions: auto-generates scenarios at 3 prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines. These are detailed, concrete capabilities. | 3 / 3 |
| Completeness | The 'what' is well covered (auto-generates scenarios, runs agents, classifies behavioral sequences, reports compliance rates), but there is no explicit 'Use when...' clause or equivalent trigger guidance; the 'when' is only implied by the description of capabilities. | 2 / 3 |
| Trigger Term Quality | Contains some relevant terms such as 'skills', 'rules', 'agent definitions', 'compliance rates', and 'tool call timelines', but misses common natural-language triggers users might type, such as 'test my skill', 'check if rules are followed', 'evaluate agent behavior', or 'prompt testing'. The language is more technical and descriptive than what users would naturally say. | 2 / 3 |
| Distinctiveness / Conflict Risk | This is a highly specific niche: compliance visualization for skills, rules, and agent definitions, with scenario generation at multiple strictness levels. It is very unlikely to conflict with other skills given its unique combination of behavioral compliance testing and reporting. | 3 / 3 |
| Total | | 10 / 12 |

Passed

Implementation

37%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides a clear high-level overview of what skill-comply does and how to invoke it via CLI, but falls short on actionability and workflow clarity. The multi-step pipeline is described conceptually without validation checkpoints or error recovery guidance, and there are no concrete examples of specs, reports, or expected outputs that would help Claude understand what success looks like.

Suggestions

- Add a concrete example of what a generated spec and report look like (even abbreviated), so Claude knows what to expect and can verify correctness.
- Add validation checkpoints and error-handling guidance for the multi-step pipeline, e.g., what to do if spec generation produces incorrect steps, or if claude -p fails mid-run.
- Include guidance on interpreting and acting on compliance results, e.g., what compliance-rate thresholds mean and what remediation steps to take.
- Link to external reference files for the detailed report format, spec schema, and configuration options rather than leaving those details unaddressed.
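A minimal sketch of the threshold guidance suggested above. The cutoffs (90% pass, 70% review) and the function name are assumptions for illustration; skill-comply itself defines no such bands:

```python
# Hypothetical interpretation bands for a compliance rate; the 90% and
# 70% cutoffs are illustrative assumptions, not part of skill-comply.
def interpret_compliance(rate: float) -> str:
    """Map a compliance rate in [0.0, 1.0] to a suggested next step."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    if rate >= 0.90:
        return "pass: behavior matches the spec; no action needed"
    if rate >= 0.70:
        return "review: inspect failing scenarios and tighten ambiguous steps"
    return "fail: rewrite trigger and workflow guidance, then re-run"
```

Whatever the exact bands, documenting them in the skill would turn a raw percentage into an actionable verdict.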

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | Mostly efficient, but includes some unnecessary explanation. The 'Key Concept: Prompt Independence' section is a one-liner that could be folded into the intro, the 'Supported Targets' section explains things Claude could infer from examples, and the 'When to Activate' section is useful but slightly verbose. | 2 / 3 |
| Actionability | Provides concrete CLI commands for running the tool, which is good, but lacks executable examples of what a spec looks like, what report output looks like, or how to interpret results. The numbered pipeline steps (1-6) describe rather than instruct: there is no guidance on what to do when compliance is low or how to act on the report. | 2 / 3 |
| Workflow Clarity | The skill describes a multi-step pipeline (generate specs → generate scenarios → run agents → classify → check ordering → report) but provides no validation checkpoints, no error-handling guidance, and no feedback loops. There is no guidance on what to do if spec generation is wrong, if claude -p fails, or if classification seems off. For a complex multi-step automated process, this is insufficient. | 1 / 3 |
| Progressive Disclosure | The content is reasonably structured with clear sections, and the 'Advanced' subsection is a nice touch for optional content. However, there are no references to external files for detailed information (e.g., report format details, spec format, configuration options), and the report-contents section could link to an example report file. | 2 / 3 |
| Total | | 7 / 12 |

Passed
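The error-recovery gap flagged under Workflow Clarity could be closed with a step wrapper along these lines. This is a sketch, not part of skill-comply; the command shown in the usage comment assumes the standard claude -p invocation mentioned in the review:

```python
import subprocess
import time

def retry_step(cmd: list[str], attempts: int = 3, delay: float = 2.0) -> bool:
    """Run one pipeline step and retry on a nonzero exit status,
    returning False only if every attempt fails."""
    for attempt in range(1, attempts + 1):
        if subprocess.run(cmd).returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)  # brief backoff before the next attempt

    return False

# Usage (hypothetical prompt): retry_step(["claude", "-p", "Run scenario 1"])
```

A fuller checkpoint would also validate each step's output, e.g. that a generated spec parses, before moving to the next stage.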

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 10 / 11 Passed

Validation for skill structure

| Criteria | Description | Result |
| --- | --- | --- |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
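The warning above suggests moving unrecognized keys under metadata. A hypothetical before/after (the offending key name, owner, is invented for illustration):

```yaml
# Before: "owner" is not a recognized top-level frontmatter key
# (the key itself is a hypothetical example).
name: skill-comply
owner: tdg-personal
---
# After: the unrecognized key is nested under metadata, clearing the warning.
name: skill-comply
metadata:
  owner: tdg-personal
```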

Total: 10 / 11

Passed

Reviewed
