Investigate query evaluation failures in the Knowledge Graph synthetic data pipeline. Use when queries fail or return unexpected results after running the evaluate binary.
81
76%
Does it follow best practices?
Impact
84%
1.10x
Average score across 3 eval scenarios
Passed
No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./.claude/skills/debug-clickhouse-queries/SKILL.md
Quality
Discovery
75%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description is well-structured with a clear 'Use when' clause that explicitly defines trigger conditions, and it targets a very specific domain that minimizes conflict risk. However, it could benefit from listing more concrete investigative actions and including additional natural trigger term variations that users might employ when encountering these issues.
Suggestions
Add specific concrete actions like 'analyze query logs, debug evaluation mismatches, inspect synthetic data outputs, troubleshoot binary execution errors'.
Include additional natural trigger terms users might say, such as 'eval errors', 'KG pipeline', 'query results wrong', or 'evaluate command failed'.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (Knowledge Graph synthetic data pipeline, evaluate binary) and the general action (investigate query evaluation failures), but doesn't list multiple concrete actions like debugging steps, log analysis, or specific troubleshooting procedures. | 2 / 3 |
| Completeness | Clearly answers both 'what' (investigate query evaluation failures in the KG synthetic data pipeline) and 'when' (when queries fail or return unexpected results after running the evaluate binary) with an explicit 'Use when' clause. | 3 / 3 |
| Trigger Term Quality | Includes relevant terms like 'query evaluation failures', 'Knowledge Graph', 'synthetic data pipeline', and 'evaluate binary', but these are fairly specialized. Missing common variations users might say like 'eval errors', 'query results wrong', 'KG pipeline broken', or 'unexpected query output'. | 2 / 3 |
| Distinctiveness Conflict Risk | Highly specific niche targeting the Knowledge Graph synthetic data pipeline's evaluate binary; this is unlikely to conflict with other skills due to the very specific domain and tool references. | 3 / 3 |
| Total | | 10 / 12 Passed |
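As a rough illustration of the trigger-term gap noted above, the sketch below checks which of the suggested terms are absent from the current description. It assumes plain case-insensitive substring matching, which is only a stand-in: how Tessl or an agent actually scores discovery is not stated in this review, and the helper name `missing_terms` is hypothetical.

```python
# Description text as quoted at the top of this review.
DESCRIPTION = (
    "Investigate query evaluation failures in the Knowledge Graph synthetic "
    "data pipeline. Use when queries fail or return unexpected results after "
    "running the evaluate binary."
)

# Trigger terms taken from the review's suggestions.
SUGGESTED_TERMS = [
    "eval errors",
    "KG pipeline",
    "query results wrong",
    "evaluate command failed",
]


def missing_terms(description, terms):
    """Return the terms that do not occur in the description.

    Assumption: a case-insensitive substring test is a good enough proxy
    for "the description mentions this term". Real matching may differ.
    """
    lowered = description.lower()
    return [term for term in terms if term.lower() not in lowered]


if __name__ == "__main__":
    for term in missing_terms(DESCRIPTION, SUGGESTED_TERMS):
        print(f"not in description: {term}")
```

Run against the current description, every suggested term is reported as absent, which is consistent with the 2 / 3 score for Trigger Term Quality.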
Implementation
77%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a strong diagnostic skill with excellent actionability — concrete SQL queries, CLI commands, and specific file paths make it immediately useful. The workflow from symptom to diagnosis to fix is well-structured with a clear checklist. The main weakness is length; several sections contain explanatory material about system internals that could be split into reference files or trimmed, and some conceptual explanations (traversal path semantics, cardinality math) add tokens without proportional value for Claude.
Suggestions
Consider splitting 'Common data generation issues' and 'Simulator configuration impact' into separate reference files, keeping only brief summaries with links in the main SKILL.md.
Trim explanatory passages like the traversal path semantics section and the cardinality math example — Claude can infer these from the code and SQL examples.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly long (~200 lines) and includes some explanatory content that Claude could infer (e.g., explaining what traversal paths are, how startsWith works, basic cardinality math). However, most content is domain-specific and genuinely useful for this codebase. Some sections like 'Traversal path semantics' and 'Association iteration direction' could be tightened. | 2 / 3 |
| Actionability | Provides concrete, executable SQL queries, bash commands, and specific file paths throughout. The `orbit query` examples are copy-paste ready with multiple invocation patterns. The debugging checklist and diagnostic queries are immediately actionable. | 3 / 3 |
| Workflow Clarity | The skill presents a clear diagnostic workflow: check data → inspect generated SQL → distinguish bug types → hypothesis test → fix → regenerate. The 'Debugging checklist for empty results' provides an explicit ordered sequence with validation at each step. The 'Distinguishing bug types' section provides clear decision criteria for categorizing failures. | 3 / 3 |
| Progressive Disclosure | The content is well-structured with clear headers and logical sections, but it's a long monolithic file that could benefit from splitting detailed reference material (e.g., 'Common data generation issues', 'Simulator configuration impact') into separate files. The 'Places to investigate' section is a good reference index but the surrounding content is quite dense for a single SKILL.md. | 2 / 3 |
| Total | | 10 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
f5efc36