backtest

Test trading strategies on historical data with Monte Carlo simulation

Quality

52%

Does it follow best practices?

Security

Passed

No findings from the security scan

Fix and improve this skill with Tessl

tessl review fix ./src/skills/bundled/backtest/SKILL.md

Quality

Content

64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides strong, actionable TypeScript API examples covering all major backtesting features, making it highly executable. However, it lacks a clear end-to-end workflow with validation checkpoints (e.g., checking for overfitting before trusting results), and includes some unnecessary explanatory content like the metrics definitions table. The document would benefit from being split into a concise overview with references to detailed API docs.

Suggestions

Add an explicit end-to-end workflow section showing the recommended sequence: run basic backtest → check metrics for red flags → run walk-forward to validate → run Monte Carlo for stress testing, with validation checkpoints at each step.

Remove or significantly trim the 'Metrics Explained' table — Claude already knows what Sharpe Ratio and Win Rate mean; at most keep the 'Good Value' column as a quick reference.

Split the detailed API reference (createBacktestEngine, walkForward, monteCarlo, custom strategies) into a separate REFERENCE.md file, keeping SKILL.md as a concise overview with quick-start examples and links.
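The end-to-end workflow recommended in the first suggestion can be sketched as a gated pipeline, where each step records a checkpoint and a failure stops the run. This is a minimal, self-contained illustration, not the skill's implementation: it does not call createBacktestEngine, walkForward, or monteCarlo, whose signatures are not shown in this review. Every name (computeMetrics, validateStrategy, lcg, monteCarloDrawdown) and every threshold (30 periods, per-period Sharpe below 1, out-of-sample Sharpe at least half of in-sample, 95th-percentile drawdown under 50%) is an assumption chosen for the example.

```typescript
// All names and thresholds here are illustrative assumptions, not the skill's API.
interface Metrics { sharpe: number; maxDrawdown: number; periods: number }
interface Checkpoint { step: string; passed: boolean; detail: string }

// Per-period Sharpe (not annualized) and max drawdown from returns (0.01 = +1%).
function computeMetrics(returns: number[]): Metrics {
  const n = returns.length;
  if (n === 0) return { sharpe: 0, maxDrawdown: 0, periods: 0 };
  const mean = returns.reduce((a, r) => a + r, 0) / n;
  const std = Math.sqrt(returns.reduce((a, r) => a + (r - mean) * (r - mean), 0) / n);
  let equity = 1;
  let peak = 1;
  let maxDrawdown = 0;
  for (const r of returns) {
    equity *= 1 + r;
    peak = Math.max(peak, equity);
    maxDrawdown = Math.max(maxDrawdown, (peak - equity) / peak);
  }
  return { sharpe: std === 0 ? 0 : mean / std, maxDrawdown, periods: n };
}

// Small seeded linear congruential generator so Monte Carlo runs are reproducible.
function lcg(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// 95th-percentile max drawdown over bootstrap resamples of the returns.
function monteCarloDrawdown(returns: number[], runs: number, rand: () => number): number {
  const drawdowns: number[] = [];
  for (let i = 0; i < runs; i++) {
    const sample = returns.map(() => returns[Math.floor(rand() * returns.length)]);
    drawdowns.push(computeMetrics(sample).maxDrawdown);
  }
  drawdowns.sort((a, b) => a - b);
  return drawdowns[Math.floor(0.95 * (runs - 1))];
}

// Runs the four steps in order and stops at the first failed checkpoint.
function validateStrategy(inSample: number[], outOfSample: number[], seed = 42): Checkpoint[] {
  const log: Checkpoint[] = [];
  const check = (step: string, passed: boolean, detail: string): boolean => {
    log.push({ step, passed, detail });
    return passed;
  };
  const ins = computeMetrics(inSample);
  const oos = computeMetrics(outOfSample);

  // 1. Basic backtest: is there enough data to say anything at all?
  if (!check("backtest", ins.periods >= 30, `${ins.periods} periods`)) return log;
  // 2. Red flags: a non-positive or implausibly high Sharpe deserves a closer look.
  if (!check("red-flags", ins.sharpe > 0 && ins.sharpe < 1, `sharpe=${ins.sharpe.toFixed(2)}`)) return log;
  // 3. Walk-forward: out-of-sample Sharpe should keep at least half the in-sample value.
  if (!check("walk-forward", oos.sharpe >= 0.5 * ins.sharpe, `oos sharpe=${oos.sharpe.toFixed(2)}`)) return log;
  // 4. Monte Carlo stress test: 95th-percentile drawdown must stay under 50%.
  const p95 = monteCarloDrawdown(outOfSample, 500, lcg(seed));
  check("monte-carlo", p95 < 0.5, `p95 drawdown=${p95.toFixed(2)}`);
  return log;
}
```

Calling validateStrategy(inSampleReturns, outOfSampleReturns) returns the list of checkpoints reached, so an agent can report which step failed and why instead of trusting a single headline metric. In a real SKILL.md the same gating would wrap the skill's own engine calls.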

Dimension	Reasoning	Score
Conciseness	The content is reasonably efficient but includes some unnecessary verbosity — the extensive console.log blocks for metrics are repetitive, and the metrics table explains concepts Claude already knows (e.g., what Sharpe Ratio or Win Rate mean). The 'Metrics Explained' section and 'Best Practices' section add marginal value.	2 / 3
Actionability	The skill provides fully executable TypeScript code examples for every major feature — creating the engine, running backtests, walk-forward analysis, Monte Carlo simulation, custom strategies, and chat commands. Code is copy-paste ready with specific parameters and configuration options.	3 / 3
Workflow Clarity	While individual API calls are clear, there's no explicit multi-step workflow showing the recommended sequence (e.g., run backtest → validate results → run walk-forward → run Monte Carlo). There are no validation checkpoints or error handling guidance for when backtests produce suspicious results or fail.	2 / 3
Progressive Disclosure	The content is structured with clear sections and a table of built-in strategies, but it's a monolithic document (~180 lines) that could benefit from splitting the API reference, strategy definitions, and metrics into separate files. No references to external files for deeper content.	2 / 3
	Total	9 / 12 Passed

Description

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive niche—backtesting trading strategies with Monte Carlo simulation—but is too terse to be fully effective. It lacks explicit trigger guidance ('Use when...') and misses common user-facing keywords like 'backtest' or 'backtesting'. Adding a when-clause and more concrete actions would significantly improve skill selection accuracy.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks to backtest, simulate, or evaluate trading strategies against historical market data.'

Include common trigger term variations users would naturally say: 'backtest', 'backtesting', 'strategy simulation', 'portfolio testing', 'risk simulation'.

List more specific concrete actions, e.g., 'Runs backtests on trading strategies using historical price data, performs Monte Carlo simulations to estimate risk and return distributions, and generates performance metrics like Sharpe ratio and max drawdown.'
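The three suggestions above can be checked mechanically against a draft description. The sketch below is a hypothetical helper (checkDescription is not part of Tessl or the skill) that uses naive substring matching to approximate two of the rubric checks: whether a 'Use when' clause is present, and which of the suggested trigger terms are still missing. The candidate string simply joins the two example sentences from the suggestions.

```typescript
// Hypothetical helper: approximates two rubric checks, not Tessl's actual scoring.
interface DescriptionCheck { hasUseWhen: boolean; missingTerms: string[] }

// Case-insensitive substring checks for a "Use when" clause and trigger terms.
function checkDescription(description: string, triggerTerms: string[]): DescriptionCheck {
  const text = description.toLowerCase();
  return {
    hasUseWhen: text.indexOf("use when") !== -1,
    missingTerms: triggerTerms.filter((t) => text.indexOf(t.toLowerCase()) === -1),
  };
}

// Trigger terms taken from the second suggestion.
const triggerTerms = [
  "backtest",
  "backtesting",
  "strategy simulation",
  "portfolio testing",
  "risk simulation",
];

// Current description, as shown at the top of this review.
const current = "Test trading strategies on historical data with Monte Carlo simulation";

// Candidate built by joining the example sentences from the first and third suggestions.
const candidate =
  "Runs backtests on trading strategies using historical price data, " +
  "performs Monte Carlo simulations to estimate risk and return distributions, " +
  "and generates performance metrics like Sharpe ratio and max drawdown. " +
  "Use when the user asks to backtest, simulate, or evaluate trading strategies " +
  "against historical market data.";

console.log(checkDescription(current, triggerTerms));
console.log(checkDescription(candidate, triggerTerms));
```

Under this naive check the current description has no 'Use when' clause and contains none of the five terms. The candidate adds the clause and the word 'backtest' but still lacks the other four variations, so they would need to be worked in separately.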

Dimension	Reasoning	Score
Specificity	Names the domain (trading strategies, historical data) and a specific technique (Monte Carlo simulation), but doesn't list multiple concrete actions beyond 'test'. Missing details like what outputs are produced, what inputs are accepted, or what specific operations are performed.	2 / 3
Completeness	Describes what it does (test trading strategies with Monte Carlo simulation on historical data) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, a missing 'Use when...' clause caps completeness at 2, and the 'what' is also only partially described, warranting a 1.	1 / 3
Trigger Term Quality	Includes some strong natural keywords such as 'trading strategies', 'historical data', and 'Monte Carlo simulation', but 'backtest' is only implied by 'test trading strategies' and never appears literally. It also misses other variations users might say, such as 'backtesting', 'strategy testing', 'portfolio simulation', 'risk analysis', or 'stock'.	2 / 3
Distinctiveness Conflict Risk	The combination of trading strategies, historical data backtesting, and Monte Carlo simulation is a very specific niche that is unlikely to conflict with other skills. The domain is narrow and well-defined enough to be clearly distinguishable.	3 / 3
	Total	8 / 12 Passed

Validation

90%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 10 / 11 Passed

Validation for skill structure

Criteria	Description	Result
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	10 / 11 Passed

Repository: alsk1992/CloddsBot
Path: src/skills/bundled/backtest/SKILL.md
Commit: e71a5f6

Reviewed: about 12 hours ago
