Test trading strategies on historical data with Monte Carlo simulation
61
52% (Does it follow best practices?)
Impact: Pending (no eval scenarios have been run)
Passed: No known issues
Optimize this skill with Tessl
npx tessl skill review --optimize ./src/skills/bundled/backtest/SKILL.md
Quality
Discovery
40%
Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a clear and distinctive niche—backtesting trading strategies with Monte Carlo simulation—but is too terse to be fully effective. It lacks explicit trigger guidance ('Use when...') and misses common user-facing keywords like 'backtest' or 'backtesting'. Adding a when-clause and more concrete actions would significantly improve skill selection accuracy.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to backtest, simulate, or evaluate trading strategies against historical market data.'
Include common trigger term variations users would naturally say: 'backtest', 'backtesting', 'strategy simulation', 'portfolio testing', 'risk simulation'.
List more specific concrete actions, e.g., 'Runs backtests on trading strategies using historical price data, performs Monte Carlo simulations to estimate risk and return distributions, and generates performance metrics like Sharpe ratio and max drawdown.'
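The three suggestions above can be combined into one revised description and checked mechanically. The sketch below is only an illustration: `revisedDescription` is a hypothetical rewording assembled from the suggested phrases, and `reviewDescription` is an assumed helper, not part of Tessl or the skill. It tests for the two gaps the review names, a missing 'Use when' clause and missing trigger terms.

```typescript
// Hypothetical revised description, assembled from the suggestions above.
const revisedDescription: string =
  "Runs backtests on trading strategies using historical price data, " +
  "performs Monte Carlo simulations to estimate risk and return distributions, " +
  "and generates performance metrics like Sharpe ratio and max drawdown. " +
  "Covers backtesting, strategy simulation, portfolio testing, and risk simulation. " +
  "Use when the user asks to backtest, simulate, or evaluate trading strategies " +
  "against historical market data.";

// Trigger terms the review lists as missing from the current description.
const triggerTerms: string[] = [
  "backtest",
  "backtesting",
  "strategy simulation",
  "portfolio testing",
  "risk simulation",
];

// Assumed helper (not a Tessl API): reports whether a 'Use when' clause is
// present and which trigger terms do not appear in the text.
function reviewDescription(
  text: string,
  terms: string[]
): { hasUseWhen: boolean; missingTerms: string[] } {
  const lower = text.toLowerCase();
  return {
    hasUseWhen: lower.indexOf("use when") !== -1,
    missingTerms: terms.filter((term) => lower.indexOf(term) === -1),
  };
}

const descriptionCheck = reviewDescription(revisedDescription, triggerTerms);
console.log(descriptionCheck);
```

Running the same check on the current description ("Test trading strategies on historical data with Monte Carlo simulation") reports no 'Use when' clause and flags every listed term as missing, which matches the review's reasoning.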
| Dimension | Reasoning | Score |
|---|---|---|
| Specificity | Names the domain (trading strategies, historical data) and a specific technique (Monte Carlo simulation), but doesn't list multiple concrete actions beyond 'test'. Missing details like what outputs are produced, what inputs are accepted, or what specific operations are performed. | 2 / 3 |
| Completeness | Describes what it does (test trading strategies with Monte Carlo simulation on historical data) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also only partially described, warranting a 1. | 1 / 3 |
| Trigger Term Quality | Includes some strong natural keywords such as 'trading strategies', 'historical data', and 'Monte Carlo simulation', and 'backtest' is implied by 'test trading strategies'. However, it misses common variations users might say such as 'backtest', 'backtesting', 'strategy testing', 'portfolio simulation', 'risk analysis', or 'stock'. | 2 / 3 |
| Distinctiveness Conflict Risk | The combination of trading strategies, historical data backtesting, and Monte Carlo simulation is a very specific niche that is unlikely to conflict with other skills. The domain is narrow and well-defined enough to be clearly distinguishable. | 3 / 3 |
| Total | 8 / 12 Passed | |
Implementation
64%
Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
The skill provides strong, actionable TypeScript API examples covering all major backtesting features, making it highly executable. However, it lacks a clear end-to-end workflow with validation checkpoints (e.g., checking for overfitting before trusting results), and includes some unnecessary explanatory content like the metrics definitions table. The document would benefit from being split into a concise overview with references to detailed API docs.
Suggestions
Add an explicit end-to-end workflow section showing the recommended sequence: run basic backtest → check metrics for red flags → run walk-forward to validate → run Monte Carlo for stress testing, with validation checkpoints at each step.
Remove or significantly trim the 'Metrics Explained' table — Claude already knows what Sharpe Ratio and Win Rate mean; at most keep the 'Good Value' column as a quick reference.
Split the detailed API reference (createBacktestEngine, walkForward, monteCarlo, custom strategies) into a separate REFERENCE.md file, keeping SKILL.md as a concise overview with quick-start examples and links.
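The workflow recommended in the first suggestion (basic backtest, red-flag check, walk-forward validation, Monte Carlo stress test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's API: the review does not show the signatures of `createBacktestEngine`, `walkForward`, or `monteCarlo`, so every helper below is hypothetical. The thresholds are arbitrary, and a single in-sample/out-of-sample split stands in for full walk-forward analysis.

```typescript
// Illustrative sketch only: every helper here is hypothetical and does NOT
// reproduce the skill's createBacktestEngine / walkForward / monteCarlo API.
type Metrics = { totalReturn: number; maxDrawdown: number; sharpe: number };

// Step 1: basic backtest metrics from a series of per-period strategy returns.
function computeMetrics(returns: number[]): Metrics {
  let equity = 1;
  let peak = 1;
  let maxDrawdown = 0;
  for (let i = 0; i < returns.length; i++) {
    equity *= 1 + returns[i];
    peak = Math.max(peak, equity);
    maxDrawdown = Math.max(maxDrawdown, (peak - equity) / peak);
  }
  const n = Math.max(returns.length, 1);
  const mean = returns.reduce((a, b) => a + b, 0) / n;
  const variance = returns.reduce((a, b) => a + (b - mean) * (b - mean), 0) / n;
  // Per-period Sharpe with a zero risk-free rate (not annualized).
  const sharpe = variance > 0 ? mean / Math.sqrt(variance) : 0;
  return { totalReturn: equity - 1, maxDrawdown, sharpe };
}

// Small seeded generator (expects a non-negative integer seed) so the
// Monte Carlo step is reproducible.
function seededRandom(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Step 4: bootstrap the returns with replacement and report the
// 95th-percentile max drawdown across all resampled paths.
function monteCarloDrawdown95(returns: number[], runs: number, seed: number): number {
  const rand = seededRandom(seed);
  const drawdowns: number[] = [];
  for (let r = 0; r < runs; r++) {
    const sample = returns.map(() => returns[Math.floor(rand() * returns.length)]);
    drawdowns.push(computeMetrics(sample).maxDrawdown);
  }
  drawdowns.sort((a, b) => a - b);
  return drawdowns[Math.floor(0.95 * (runs - 1))];
}

// Runs the four steps in order and records a warning at each checkpoint.
function validateStrategy(returns: number[]): { metrics: Metrics; warnings: string[] } {
  const warnings: string[] = [];
  const metrics = computeMetrics(returns); // step 1: basic backtest

  // Step 2: red-flag check (threshold is an assumption).
  if (metrics.sharpe > 1) {
    warnings.push("per-period Sharpe above 1: check for overfitting or lookahead bias");
  }

  // Step 3: one 70/30 split as a simplified stand-in for walk-forward analysis.
  const split = Math.floor(returns.length * 0.7);
  const inSample = computeMetrics(returns.slice(0, split));
  const outOfSample = computeMetrics(returns.slice(split));
  if (inSample.sharpe > 0 && outOfSample.sharpe < 0.5 * inSample.sharpe) {
    warnings.push("out-of-sample Sharpe is less than half the in-sample Sharpe");
  }

  // Step 4: Monte Carlo stress test (30% limit is an assumption).
  if (monteCarloDrawdown95(returns, 500, 42) > 0.3) {
    warnings.push("95th-percentile bootstrapped drawdown exceeds 30%");
  }
  return { metrics, warnings };
}

const sampleReturns = [0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.03, -0.02, -0.03, -0.01];
const strategyReport = validateStrategy(sampleReturns);
console.log(strategyReport.metrics, strategyReport.warnings);
```

Collecting warnings instead of throwing keeps all four checkpoints visible in one report, so a suspicious result at step 2 does not hide what steps 3 and 4 would have shown.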
| Dimension | Reasoning | Score |
|---|---|---|
| Conciseness | The content is reasonably efficient but includes some unnecessary verbosity: the extensive console.log blocks for metrics are repetitive, and the metrics table explains concepts Claude already knows (e.g., what Sharpe Ratio or Win Rate mean). The 'Metrics Explained' section and 'Best Practices' section add marginal value. | 2 / 3 |
| Actionability | The skill provides fully executable TypeScript code examples for every major feature: creating the engine, running backtests, walk-forward analysis, Monte Carlo simulation, custom strategies, and chat commands. Code is copy-paste ready with specific parameters and configuration options. | 3 / 3 |
| Workflow Clarity | While individual API calls are clear, there's no explicit multi-step workflow showing the recommended sequence (e.g., run backtest → validate results → run walk-forward → run Monte Carlo). There are no validation checkpoints or error handling guidance for when backtests produce suspicious results or fail. | 2 / 3 |
| Progressive Disclosure | The content is structured with clear sections and a table of built-in strategies, but it's a monolithic document (~180 lines) that could benefit from splitting the API reference, strategy definitions, and metrics into separate files. No references to external files for deeper content. | 2 / 3 |
| Total | 9 / 12 Passed | |
Validation
90%
Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
| frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
| Total | 10 / 11 Passed | |
e71a5f6