Automated code review for pull requests using multiple specialized agents
69
55%
Does it follow best practices?
Impact
99%
1.50x Average score across 3 eval scenarios
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./.claude/skills/code-review/SKILL.md
Quality
Discovery
32% Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies the domain (code review for PRs) and hints at a distinctive mechanism (multiple specialized agents), but it lacks concrete action details and completely omits trigger guidance. It reads more like a tagline than a functional description that would help Claude reliably select this skill from a large pool.
Suggestions
Add a 'Use when...' clause with explicit triggers, e.g., 'Use when the user asks to review a pull request, PR, merge request, or wants automated code feedback on a diff.'
List specific concrete actions the skill performs, e.g., 'Analyzes pull request diffs for bugs, style issues, security vulnerabilities, and performance concerns. Posts inline comments and summary reviews.'
Include common keyword variations users might say: 'PR', 'merge request', 'review my code', 'code feedback', 'diff review'.
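The three suggestions above can be turned into a rough self-check. The sketch below is illustrative only: `check_description`, the `TRIGGER_TERMS` list, and the `improved` description are assumptions assembled from the suggested wording, not Tessl's rubric and not text from the skill itself.

```python
import re

# Keyword variations taken from the suggestions above (illustrative, not Tessl's rubric).
TRIGGER_TERMS = ["pr", "pull request", "merge request", "review my code", "code feedback", "diff"]


def check_description(description: str) -> dict:
    """Report whether a description has a 'Use when' clause and which trigger terms it contains."""
    text = description.lower()
    # Match whole words, allowing a simple plural ("pull requests", "diffs").
    found = [t for t in TRIGGER_TERMS if re.search(rf"\b{re.escape(t)}s?\b", text)]
    return {"has_use_when": "use when" in text, "trigger_terms": found}


# The description as it appears at the top of this review.
current = "Automated code review for pull requests using multiple specialized agents"

# Hypothetical rewrite built from the suggested wording above.
improved = (
    "Analyzes pull request diffs for bugs, style issues, security vulnerabilities, "
    "and performance concerns. Posts inline comments and summary reviews. "
    "Use when the user asks to review a pull request, PR, merge request, "
    "or wants automated code feedback on a diff."
)

current_report = check_description(current)
improved_report = check_description(improved)
```

Under these assumptions, `current_report` shows no "Use when" clause and a single matching trigger term, while `improved_report` shows the clause plus several of the suggested variations.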
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (code review, pull requests) and mentions 'multiple specialized agents' as a mechanism, but doesn't list specific concrete actions like commenting on PRs, checking style, identifying bugs, suggesting fixes, etc. | 2 / 3 |
| Completeness | Describes what it does at a high level (automated code review) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, missing 'Use when' caps completeness at 2, and the 'what' is also weak, so this scores a 1. | 1 / 3 |
| Trigger Term Quality | Includes relevant terms like 'code review' and 'pull requests' which users would naturally say, but misses common variations like 'PR', 'review my code', 'PR feedback', 'merge request', or file type triggers. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of 'pull requests' and 'multiple specialized agents' provides some distinctiveness, but 'code review' is broad enough to overlap with general code analysis or linting skills. The 'agents' aspect helps differentiate somewhat. | 2 / 3 |
| Total | 7 / 12 Passed | |
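The total in the table is the plain sum of the four dimension scores, each out of 3. A minimal sketch of that arithmetic follows; the variable names are illustrative. Note that 7 / 12 is about 58%, so the 32% headline figure for Discovery is evidently not this simple ratio, and the page does not say how it is derived.

```python
# Dimension scores copied from the Discovery table; each is out of 3.
discovery_scores = {
    "Specificity": 2,
    "Completeness": 1,
    "Trigger Term Quality": 2,
    "Distinctiveness Conflict Risk": 2,
}
MAX_PER_DIMENSION = 3

total = sum(discovery_scores.values())
maximum = MAX_PER_DIMENSION * len(discovery_scores)
print(f"{total} / {maximum}")  # prints "7 / 12"
```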
Implementation
77% Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, highly actionable skill that provides a clear multi-agent code review workflow with excellent validation checkpoints and explicit criteria for issue flagging. Its main weakness is moderate verbosity: some content is repeated (no-issues format, false positive guidance), and the monolithic structure could benefit from splitting detailed reference material into separate files. The workflow clarity is exemplary, with a flag-validate-filter pattern that prevents false positive comments.
Suggestions
Extract the false positive list and link formatting rules into separate referenced files (e.g., FALSE_POSITIVES.md, FORMATTING.md) to reduce the main skill's length and improve progressive disclosure.
Remove the duplicate 'no issues found' format specification—it appears both in step 7 and in the Notes section.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly detailed and necessarily so given the complexity of the multi-agent workflow, but there is some redundancy (e.g., the false positive list is mentioned twice, the 'no issues found' format is specified twice, and some instructions could be tightened). It doesn't over-explain concepts Claude knows, but it's not maximally lean. | 2 / 3 |
| Actionability | The skill provides highly specific, concrete instructions: exact agent types to use (haiku/sonnet/opus), specific CLI commands (gh pr view, gh pr comment, gh pr review), precise formatting for GitHub links, explicit criteria for what to flag and what not to flag, and clear decision logic at each step. This is copy-paste-ready operational guidance. | 3 / 3 |
| Workflow Clarity | The 9-step workflow is clearly sequenced with explicit validation checkpoints (step 1 gates the entire process, step 5 validates issues from step 4, step 6 filters, step 7 branches based on results). The feedback loop of flag → validate → filter is a strong pattern for avoiding false positives in a destructive operation (posting public review comments). | 3 / 3 |
| Progressive Disclosure | The content is a single monolithic file with no references to supporting documents. For a skill of this complexity (~80+ lines with detailed agent specifications, false positive lists, and formatting rules), some content could be split into referenced files (e.g., a separate file for the false positive criteria, link formatting rules, or agent specifications). However, the sections are well-organized with clear headers and formatting. | 2 / 3 |
| Total | 10 / 12 Passed | |
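The flag → validate → filter loop praised under Workflow Clarity can be sketched in a few lines. This is a toy illustration of the pattern, not the skill's implementation: the `Issue` type, the string-matching rules standing in for the review agents, and the "No issues found." placeholder are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    file: str
    message: str
    validated: bool = False


def flag(diff_lines):
    """Flag candidate issues (toy stand-in for the review agents)."""
    return [Issue("app.py", line) for line in diff_lines if "TODO" in line or "eval(" in line]


def validate(issues):
    """Re-check each candidate independently (toy stand-in for the validation step)."""
    for issue in issues:
        issue.validated = "eval(" in issue.message
    return issues


def filter_validated(issues):
    """Keep only candidates that survived validation."""
    return [issue for issue in issues if issue.validated]


def review(diff_lines):
    """Run flag -> validate -> filter, then branch on whether anything survived."""
    confirmed = filter_validated(validate(flag(diff_lines)))
    if not confirmed:
        return "No issues found."  # hypothetical placeholder, not the skill's actual format
    return "\n".join(f"{issue.file}: {issue.message}" for issue in confirmed)


report = review(["x = eval(user_input)", "# TODO: tidy up"])
```

Here the `TODO` line is flagged but dropped at validation, so only the `eval(` line would reach a public comment. Separating the cheap flagging pass from the stricter validation pass is what keeps false positives out of the posted review.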
Validation
81% Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 9 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| allowed_tools_field | 'allowed-tools' contains unusual tool name(s) | Warning |
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 9 / 11 Passed | |
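A check like `frontmatter_unknown_keys` can be approximated locally before resubmitting. The sketch below is a guess at the idea, not Tessl's validator: the `KNOWN_KEYS` set is an assumed allow-list, the parser only handles simple top-level `key: value` lines, and the `owner` key in the sample is hypothetical (the review does not say which keys triggered the warning).

```python
# Assumed allow-list of frontmatter keys; the real spec may differ.
KNOWN_KEYS = {"name", "description", "license", "metadata", "allowed-tools"}


def frontmatter_keys(skill_md: str) -> list[str]:
    """Return top-level keys from a leading '---' delimited frontmatter block."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Top-level keys start in column 0 and contain a colon.
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys


def unknown_keys(skill_md: str) -> list[str]:
    """List frontmatter keys that are not in the assumed allow-list."""
    return [key for key in frontmatter_keys(skill_md) if key not in KNOWN_KEYS]


# Hypothetical SKILL.md; 'owner' is an invented key used only for illustration.
sample = """---
name: code-review
description: Automated code review for pull requests using multiple specialized agents
owner: platform-team
---
# Body
"""

flagged = unknown_keys(sample)
```

Following the validator's own advice, a key flagged this way would either be removed or moved under `metadata`.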
ea296ba