Build robust, production-grade backtesting systems that avoid common pitfalls and produce reliable strategy performance estimates.
33%: Does it follow best practices?
Impact: none recorded; no eval scenarios have been run
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./skills/backtesting-frameworks/SKILL.md
Quality
Discovery
32%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear domain (backtesting) but remains too high-level, reading more like a marketing tagline than a functional skill description. It lacks specific concrete actions and natural trigger term variations and, critically, it is missing an explicit 'Use when...' clause to guide skill selection.
Suggestions
Add a 'Use when...' clause with trigger terms like 'backtest', 'trading strategy', 'historical simulation', 'strategy evaluation', 'portfolio backtest'.
List specific concrete actions such as 'simulate trades against historical data, model slippage and transaction costs, calculate performance metrics like Sharpe ratio and max drawdown, detect look-ahead bias and survivorship bias'.
Include common file types or frameworks users might mention, such as 'CSV price data', 'OHLCV data', or specific libraries like 'zipline', 'backtrader' to improve trigger term coverage.
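To make the second suggestion concrete, the two metrics it names can be sketched in a few lines of Python. This is an illustrative sketch, not code from the skill: the function names, the default of 252 periods per year, the zero risk-free rate, and the sample data are all assumptions.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252, risk_free_per_period=0.0):
    """Annualised Sharpe ratio from per-period simple returns (assumed convention)."""
    excess = [r - risk_free_per_period for r in returns]
    sd = stdev(excess)  # sample standard deviation; needs at least two returns
    if sd == 0:
        return float("nan")  # undefined when returns do not vary
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # decline relative to that mark
    return worst


# Hypothetical sample data, for illustration only.
daily_returns = [0.01, -0.02, 0.03, 0.00]
equity_curve = [100.0, 110.0, 99.0, 120.0, 90.0]

sharpe = sharpe_ratio(daily_returns)
mdd = max_drawdown(equity_curve)
```

Here `mdd` reflects the fall from the 120 peak to 90. Naming metrics at this level of detail in the description would give an agent concrete actions to match against a user request.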
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (backtesting systems) and mentions some goals (avoid pitfalls, reliable performance estimates), but does not list specific concrete actions like 'simulate trades', 'calculate Sharpe ratios', 'handle slippage modeling', etc. | 2 / 3 |
| Completeness | Describes what it does at a high level but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also fairly vague, warranting a score of 1. | 1 / 3 |
| Trigger Term Quality | Includes 'backtesting' which is a strong trigger term, and 'strategy performance' is relevant, but misses common variations users might say like 'backtest', 'trading strategy', 'historical simulation', 'portfolio testing', 'quantitative finance', or 'strategy evaluation'. | 2 / 3 |
| Distinctiveness Conflict Risk | The term 'backtesting' is fairly niche and specific to quantitative finance, which helps distinguish it, but the description is broad enough ('production-grade systems') that it could overlap with general software engineering or trading system skills. | 2 / 3 |
| Total | 7 / 12 Passed | |
Implementation
35%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads as a high-level outline or table of contents rather than an actionable guide. The instructions are abstract descriptions of what to do without any concrete code, specific commands, or worked examples. While the structure is reasonable and it appropriately references a playbook for details, the SKILL.md body itself provides insufficient standalone guidance for Claude to act on.
Suggestions
Add at least one concrete, executable code example (e.g., a minimal event-driven backtest loop in Python) to make the instructions actionable rather than abstract.
Include explicit validation checkpoints in the workflow, such as 'Verify no future data leakage by checking that all features use only data available at decision time' or a concrete data leakage detection snippet.
Replace vague instructions like 'Build point-in-time data pipelines and realistic cost models' with specific guidance — e.g., show a concrete cost model implementation or a point-in-time join pattern.
Provide the referenced `resources/implementation-playbook.md` bundle file so the progressive disclosure actually delivers on its promise, or inline the most critical patterns directly in the SKILL.md.
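As an illustration of the kind of snippet these suggestions ask for, a point-in-time lookup can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the skill's or the playbook's implementation: the function name `point_in_time_value`, the sample earnings data, and the rule that a value released on the decision date counts as available are all hypothetical choices.

```python
from bisect import bisect_right
from datetime import date


def point_in_time_value(release_dates, values, decision_date):
    """Return the latest value released on or before decision_date, else None.

    release_dates must be sorted ascending and aligned index-by-index with values.
    Assumption: data released on the decision date is usable on that date.
    """
    idx = bisect_right(release_dates, decision_date)
    return values[idx - 1] if idx > 0 else None


# Hypothetical fundamentals keyed by the date each figure became public.
release_dates = [date(2024, 2, 1), date(2024, 5, 1), date(2024, 8, 1)]
eps = [1.10, 1.25, 0.95]

before_first = point_in_time_value(release_dates, eps, date(2024, 1, 15))
on_release = point_in_time_value(release_dates, eps, date(2024, 5, 1))
mid_quarter = point_in_time_value(release_dates, eps, date(2024, 6, 15))
```

Keying the lookup on release dates rather than reporting-period dates is what prevents a simulated decision from seeing a figure before it was published. A validation checkpoint could then assert that every feature used at a decision time comes from such a lookup.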
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The skill is relatively brief but includes some unnecessary sections like 'Use this skill when' / 'Do not use this skill when' and 'Limitations' that are somewhat generic and boilerplate. The core instructions are lean but the surrounding scaffolding adds tokens without much value. | 2 / 3 |
| Actionability | The instructions are vague and abstract: 'Build point-in-time data pipelines and realistic cost models' and 'Implement event-driven simulation and execution logic' describe what to do at a high level but provide no concrete code, commands, specific examples, or executable guidance. It reads like a table of contents rather than actionable instructions. | 1 / 3 |
| Workflow Clarity | There is a rough sequence implied (define hypothesis → build pipelines → implement simulation → use splits/walk-forward), but there are no explicit validation checkpoints, no feedback loops for error recovery, and no concrete verification steps. For a multi-step process like backtesting with potential for data leakage and bias, this lacks critical validation gates. | 2 / 3 |
| Progressive Disclosure | The skill references `resources/implementation-playbook.md` for detailed patterns, which is a reasonable one-level-deep reference. However, no bundle files were provided, so the reference cannot be verified, and the SKILL.md itself is too thin: it delegates almost all substance to the playbook without providing enough standalone value in the overview. | 2 / 3 |
| Total | 7 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
f5dc9e3