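Use only when creating/updating/fixing C++ tests, configuring GoogleTest/CTest, diagnosing failing or flaky tests, or adding coverage/sanitizers. (Original description: C++ テストの作成/更新/修正、GoogleTest/CTest の設定、失敗またはフレーキーなテストの診断、カバレッジ/サニタイザーの追加時にのみ使用します。)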
87
82%
Does it follow best practices?
Impact
92%
1.22x
Average score across 3 eval scenarios
Passed
No known issues
Quality
Discovery
100%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
This is a strong, well-crafted skill description that clearly defines its scope around C++ testing with specific frameworks and tools. It explicitly states when to use the skill with a restrictive 'only when' clause, and includes natural trigger terms that users working in C++ testing environments would use. The description is concise yet comprehensive, covering the full range of C++ testing activities.
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Lists multiple specific concrete actions: creating/updating/fixing C++ tests, configuring GoogleTest/CTest, diagnosing failing or flaky tests, and adding coverage/sanitizers. | 3 / 3 |
| Completeness | Clearly answers both 'what' (C++ test creation/update/fix, GoogleTest/CTest configuration, diagnosing failures, adding coverage/sanitizers) and 'when' with the explicit trigger clause '〜時にのみ使用します' (use only when...), which serves as an explicit usage condition. | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'C++ テスト' (C++ test), 'GoogleTest', 'CTest', 'フレーキー' (flaky), 'カバレッジ' (coverage), 'サニタイザー' (sanitizer). These cover the key terms a user working with C++ testing would naturally use. | 3 / 3 |
| Distinctiveness Conflict Risk | Highly distinctive with a clear niche: C++ testing specifically with GoogleTest/CTest. The mention of specific frameworks, sanitizers, and flaky test diagnosis makes it very unlikely to conflict with general coding or other language testing skills. | 3 / 3 |
| Total | | 12 / 12 Passed |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This is a solid, actionable C++ testing skill with excellent executable code examples covering gtest, gmock, CMake/CTest, coverage, and sanitizers. Its main weaknesses are verbosity (redundant sections on best practices/pitfalls/guardrails, explanations of concepts Claude already knows) and a monolithic structure that would benefit from splitting advanced topics into separate files. The workflow sections could be strengthened with explicit validation checkpoints.
Suggestions
Remove or consolidate the overlapping 'Flaky Test Guardrails', 'Best Practices', and 'Common Pitfalls' sections into a single concise checklist, eliminating redundant advice.
Add explicit validation checkpoints to the debugging and coverage workflows (e.g., 'Verify coverage.info is non-empty before running genhtml', 'Re-run the specific failing test to confirm the fix before running the full suite').
Move the coverage toolchain details, sanitizer CMake configuration, and fuzzing appendix into separate referenced files (e.g., COVERAGE.md, SANITIZERS.md, FUZZING.md) to reduce the main skill's token footprint.
Remove explanations of concepts Claude already knows, such as what TDD is, what mocks vs fakes are, and basic testing principles like 'keep tests deterministic'.
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is fairly comprehensive but includes some unnecessary content Claude already knows (e.g., explaining TDD loop concepts, what mocks vs fakes are, basic best practices like 'keep tests deterministic'). The 'when to use / when not to use' section and the best practices/pitfalls sections have significant overlap with the flaky test guardrails section. The document could be tightened by ~30%. | 2 / 3 |
| Actionability | The skill provides fully executable, copy-paste ready code examples throughout: complete gtest/gmock tests, CMakeLists.txt configuration, bash commands for building/running/coverage/sanitizers. The CMake examples include FetchContent setup, coverage flags for both GCC and Clang, and sanitizer options, all concrete and immediately usable. | 3 / 3 |
| Workflow Clarity | The TDD workflow (RED→GREEN→REFACTOR) is clearly sequenced, and the debugging section has a reasonable 4-step process. However, the debugging workflow lacks explicit validation checkpoints (e.g., 'confirm the fix by re-running the specific test before expanding'). The coverage workflow lists commands sequentially but doesn't include verification steps to confirm coverage was actually collected correctly. | 2 / 3 |
| Progressive Disclosure | The content is a long monolithic document (~250+ lines) with no references to external files. Sections like the full coverage toolchain commands, sanitizer CMake options, and fuzzing appendix could be split into separate reference files. The document is reasonably well-organized with clear headers, but everything is inline rather than appropriately layered. | 2 / 3 |
| Total | | 9 / 12 Passed |
Validation
100%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 11 / 11 Passed
Validation for skill structure
No warnings or errors.
928076c