Research prediction markets - base rates, resolution rules, historical data
41
27%
Does it follow best practices?
Impact
Pending
No eval scenarios have been run
Advisory
Suggest reviewing before use
Optimize this skill with Tessl
npx tessl skill review --optimize ./src/skills/bundled/research/SKILL.mdQuality
Discovery
32%Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.
The description identifies a reasonably niche domain (prediction markets) and lists some relevant subtopics, but it reads more like a tag list than a proper skill description. It critically lacks any 'Use when...' guidance, and the actions described are vague topic areas rather than concrete capabilities. Adding explicit trigger conditions and more specific action verbs would significantly improve skill selection accuracy.
Suggestions
Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about prediction market probabilities, forecasting questions, or wants to analyze market odds and historical accuracy.'
Replace the topic list with concrete action verbs describing what the skill does, e.g., 'Researches prediction market questions by calculating base rates, analyzing resolution criteria, and comparing historical forecasting data.'
Include common user-facing trigger terms and platform names like 'forecasting', 'Polymarket', 'Metaculus', 'probabilities', 'calibration', 'betting odds' to improve keyword coverage.
| Dimension | Reasoning | Score |
|---|---|---|
Specificity | Names the domain (prediction markets) and some actions (research base rates, resolution rules, historical data), but these are more like topic areas than concrete actions. It doesn't specify what it actually does with these (e.g., 'calculates base rates', 'retrieves resolution criteria', 'analyzes historical accuracy'). | 2 / 3 |
Completeness | Describes a rough 'what' (research prediction markets topics) but completely lacks a 'when' clause. There is no 'Use when...' or equivalent trigger guidance, which per the rubric should cap completeness at 2, and since the 'what' is also weak, this scores a 1. | 1 / 3 |
Trigger Term Quality | Includes relevant domain keywords like 'prediction markets', 'base rates', 'resolution rules', and 'historical data' which users might naturally say. However, it misses common variations like 'forecasting', 'Polymarket', 'Metaculus', 'probabilities', 'calibration', or 'betting markets'. | 2 / 3 |
Distinctiveness Conflict Risk | The prediction markets domain is fairly niche, which helps distinctiveness. However, 'research' and 'historical data' are generic enough that they could overlap with general research or data analysis skills. The lack of explicit trigger boundaries increases conflict risk. | 2 / 3 |
Total | 7 / 12 Passed |
Implementation
22%Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.
This skill reads more like a feature specification or product mockup than an actionable skill for Claude. It defines commands and shows idealized outputs but provides no concrete guidance on how to actually perform research — no data sources, no tools, no APIs, no verification methods. The base rate numbers in examples appear fabricated with no sourcing methodology, which is particularly problematic for a research-focused skill.
Suggestions
Add concrete, actionable steps for how to actually research base rates — specify data sources to query, tools to use (web search, specific databases), and how to validate findings.
Include a clear workflow: e.g., 1) Search for historical data, 2) Verify with multiple sources, 3) Calculate rates, 4) Flag uncertainty/confidence levels.
Define what the /baserate, /resolution, and /history commands actually do mechanically — are these web searches? Database lookups? Clarify the tools Claude should use.
Add explicit guidance on how to handle uncertainty and when to caveat findings, especially since prediction market research involves probabilistic claims that could be misleading if presented without proper sourcing.
| Dimension | Reasoning | Score |
|---|---|---|
Conciseness | The skill is reasonably concise but includes some unnecessary enumeration of research areas (political, economic, sports) that Claude already knows how to categorize. The examples section partially duplicates what the output format already shows. | 2 / 3 |
Actionability | The skill provides no executable code, no actual data sources, no API endpoints, and no concrete methods for looking up base rates or resolution rules. The commands (/baserate, /resolution, /history) are defined but there's no implementation or guidance on how to actually perform the research — it just shows idealized outputs with no path to producing them. | 1 / 3 |
Workflow Clarity | There is no clear workflow or sequence of steps for conducting research. The skill lists commands and example outputs but never explains how to go from a query to a result — no data sources to check, no verification steps, no process for validating base rate calculations. | 1 / 3 |
Progressive Disclosure | The content is organized into logical sections (commands, research areas, examples, output format) which provides some structure. However, there are no references to external files for deeper content, and the research areas section is a flat list that could either be removed or expanded into separate reference documents. | 2 / 3 |
Total | 6 / 12 Passed |
Validation
90%Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.
Validation — 10 / 11 Passed
Validation for skill structure
| Criteria | Description | Result |
|---|---|---|
frontmatter_unknown_keys | Unknown frontmatter key(s) found; consider removing or moving to metadata | Warning |
Total | 10 / 11 Passed | |
e71a5f6
Table of Contents
If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.