Review Sentry Python and Django changes for bug patterns drawn from real production issues. Use when reviewing a backend diff or PR, checking Warden findings, auditing the current branch, reviewing production-error patterns, or looking for common regressions in `src/` and `tests/`.
87
83%
Does it follow best practices?
Impact
94%
0.97x
Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
89%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong description with excellent completeness and distinctiveness, clearly scoped to Sentry Python/Django code review with explicit trigger scenarios. Its main weakness is that the 'what' portion could be more specific about the concrete bug patterns or actions performed, rather than staying at the level of 'review for bug patterns'. The trigger terms are well-chosen and cover natural user language.
Suggestions
Add specific examples of the bug patterns detected (e.g., 'Detects N+1 queries, missing error handlers, unsafe migrations, race conditions') to improve specificity from general 'bug patterns' to concrete actions.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | It names the domain (Sentry Python/Django) and the general action (review for bug patterns from production issues), but doesn't list specific concrete actions like 'detect N+1 queries, flag missing error handling, check for race conditions'. The actions remain at a high level ('review', 'check', 'audit'). | 2 / 3 |
| Completeness | Clearly answers both 'what' (review Sentry Python and Django changes for bug patterns from real production issues) and 'when' (explicit 'Use when' clause listing five specific trigger scenarios: reviewing diffs/PRs, checking Warden findings, auditing branches, reviewing production-error patterns, looking for regressions). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'diff', 'PR', 'Warden findings', 'production-error patterns', 'regressions', 'src/', 'tests/', 'backend', 'Sentry', 'Django'. These cover multiple natural ways a user might phrase their request. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive due to the specific niche of Sentry Python/Django bug pattern review, references to Warden findings, and specific paths like 'src/' and 'tests/'. Unlikely to conflict with generic code review or other language-specific skills. | 3 / 3 |
| Total | Passed | 11 / 12 |
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong, well-structured skill that encodes specific production knowledge into an actionable review workflow. Its greatest strength is the concrete, pattern-based checks with real code examples for both red flags and safe patterns, plus explicit guidance on what NOT to flag (reducing false positives). The main weakness is that it's somewhat long for a SKILL.md — the detailed checks could potentially live in the referenced files — and the referenced bundle files are not provided, making it impossible to verify the progressive disclosure structure works end-to-end.
Suggestions
Consider moving the detailed check descriptions (red flags, safe patterns) into the referenced files and keeping only a summary table with one-line descriptions in SKILL.md, since the references are already defined in Step 1.
Trim the preamble statistics ('638 real production issues', '27 million error events') — Claude doesn't need persuasion to follow instructions, and this adds tokens without changing behavior.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~300 lines) but most content earns its place: the pattern checks encode specific, non-obvious production knowledge Claude wouldn't have. However, the preamble about '638 real production issues' and event counts, while lending credibility, is context Claude doesn't need to act on. Some checks repeat similar safe patterns (e.g., try/except DoesNotExist appears in multiple checks). The 'Not a bug' callouts are valuable and prevent false positives, which justifies their inclusion. | 2 / 3 |
| Actionability | Each check provides specific red flags with concrete code patterns (e.g., `Model.objects.get(id=some_id)` without try/except), concrete safe patterns with actual code snippets, and clear guidance on what to report vs. skip. The confidence table gives precise criteria for action. Fix suggestions are required to include actual code. The instruction to trace data flow using Read and Grep is concrete and executable. | 3 / 3 |
| Workflow Clarity | The three-step workflow (Classify → Check Patterns → Report) is clearly sequenced with explicit decision points. Step 1 has a classification table mapping code types to references. Step 2 is ordered by impact. The confidence table provides clear validation criteria (HIGH/MEDIUM/LOW with specific actions). The instruction to stop and report zero findings when nothing matches is an important validation checkpoint that prevents false positives. | 3 / 3 |
| Progressive Disclosure | The skill references 8 external reference files (e.g., `references/missing-records.md`, `references/null-and-type-errors.md`), which is good progressive disclosure design. However, no bundle files were provided, so we cannot verify these references exist or are well-structured. The main SKILL.md itself is quite long; some of the detailed check patterns could potentially live in the referenced files rather than being duplicated in the main body, though having them inline does make the skill self-contained for quick scanning. | 2 / 3 |
| Total | Passed | 10 / 12 |
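For readers unfamiliar with the red flag cited under Actionability (`Model.objects.get(id=some_id)` without try/except) and the "try/except DoesNotExist" safe pattern cited under Conciseness, here is a minimal sketch of the two shapes. It is not taken from the skill itself. `FakeManager`, `load_unsafe`, and `load_safe` are hypothetical names, and the `DoesNotExist` class is an in-memory stand-in so the example runs without Django; in real Django code the exception would be the model's own `DoesNotExist` attribute.

```python
class DoesNotExist(Exception):
    """Stand-in for a Django model's DoesNotExist exception (assumption)."""


class FakeManager:
    """Hypothetical in-memory stand-in for a Django model manager."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get(self, id):
        # Like a Django manager's get(), raise when no row matches.
        try:
            return self._rows[id]
        except KeyError:
            raise DoesNotExist(f"no row with id={id}") from None


def load_unsafe(manager, some_id):
    # Red-flag shape: a missing row propagates as an unhandled exception.
    return manager.get(id=some_id)


def load_safe(manager, some_id):
    # Safe shape: the missing-row case is handled explicitly.
    try:
        return manager.get(id=some_id)
    except DoesNotExist:
        return None


manager = FakeManager({1: "project-a"})
print(load_safe(manager, 1))
print(load_safe(manager, 2))
```

Whether returning `None` is the right fallback depends on the caller; the point of the sketch is only that the missing-record case is handled deliberately rather than left to surface as a production error.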
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
552fb5c