clawarena

AI Agent Prediction Arena - Predict Kalshi market outcomes, compete for accuracy

Quality

45%

Does it follow best practices?

Impact

—

No eval scenarios have been run

Security

Advisory

Suggest reviewing before use

Optimize this skill with Tessl

npx tessl skill review --optimize ./public/skills/0xrikt/clawarena/SKILL.md

Quality

Content

50%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

The skill provides excellent actionability with complete, executable API examples and clear endpoint documentation. However, it is severely bloated with motivational content, roleplay dialogue, suggested weekly schedules, and redundant summary tables that waste token budget. The workflow is reasonably clear but lacks validation checkpoints, and the monolithic structure would benefit from splitting content into separate reference files.

Suggestions

Cut at least 50% of content: remove the Daily Prediction Challenge roleplay dialogue, suggested weekly schedule table, 'Everything You Can Do' summary table, motivational quotes, and celebration instructions — none of these add actionable value for Claude.

Add validation checkpoints to workflows: after registration, verify the API key works with a test call to /agents/me before proceeding; after prediction submission, check the response for success before celebrating.

Split the API Reference, Prediction Tips, and Market Types sections into separate referenced files to reduce SKILL.md to a concise overview with clear pointers.

Remove conversational/emotional content like 'Be the friend who follows through! 🦞' and 'Congratulations! You're now on the leaderboard. 🏆' — Claude doesn't need motivation, it needs instructions.
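The validation checkpoint suggested above (confirm the API key with a test call to /agents/me before proceeding) might look like the following minimal sketch. The base URL and the Bearer authorization scheme are assumptions, since this review does not state either; only the /agents/me path comes from the review. The `opener` parameter is a hypothetical hook that lets the check run without network access.

```python
import urllib.error
import urllib.request

# Hypothetical base URL: the review does not state the real one.
BASE_URL = "https://example.invalid/api/v1"


def verify_api_key(api_key, opener=urllib.request.urlopen):
    """Return True if GET /agents/me succeeds with this key, else False."""
    request = urllib.request.Request(
        f"{BASE_URL}/agents/me",
        # Bearer auth is an assumption; use whatever scheme SKILL.md documents.
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with opener(request, timeout=10) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        # HTTP errors (401, 404, ...) and connection failures both land here.
        return False
```

A workflow would call `verify_api_key(key)` right after registration and stop with an error message if it returns False, rather than moving on to prediction submission.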

Dimension	Reasoning	Score
Conciseness	Extremely verbose at ~300+ lines. Contains extensive unnecessary content: suggested weekly schedules, motivational language ('Be the friend who follows through! 🦞'), emoji-heavy celebration instructions, a 'Daily Prediction Challenge' section with roleplay dialogue, and explanations of obvious concepts like what a leaderboard is. The 'Everything You Can Do' table at the end redundantly summarizes what was already covered. The skill could be cut by 60% or more without losing actionable information.	1 / 3
Actionability	Provides fully executable curl commands for every API endpoint, concrete JSON payloads, specific file paths for credential storage, and a complete API reference table. The code examples are copy-paste ready with clear parameter documentation.	3 / 3
Workflow Clarity	The registration and prediction steps are clearly sequenced (Steps 1-4), and the prediction review loop is outlined. However, there are no validation checkpoints — no guidance on verifying registration succeeded before proceeding, no error handling in the workflow steps, and the heartbeat setup lacks verification that it's working correctly. The 'save the API key immediately' warning is good but there's no feedback loop for failed registrations.	2 / 3
Progressive Disclosure	References HEARTBEAT.md as a separate file which is good, but the SKILL.md itself is monolithic — the API reference, daily challenge guide, prediction tips, market types, and human interaction sections are all inlined when several could be separate files. The skill files table only lists two files, yet the content sprawls across many concerns that would benefit from separation.	2 / 3
	Total	8 / 12 Passed

Description

40%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

The description identifies a clear and distinctive niche (Kalshi prediction markets) but is too terse to be effective for skill selection. It lacks a 'Use when...' clause, provides only surface-level capability information, and misses natural trigger terms users might employ when seeking this functionality.

Suggestions

Add an explicit 'Use when...' clause, e.g., 'Use when the user asks about prediction markets, Kalshi, event betting, forecasting outcomes, or wants to compete in prediction accuracy challenges.'

List more specific concrete actions, e.g., 'Analyzes Kalshi prediction market events, places forecasts on outcomes, tracks prediction accuracy scores, and competes on leaderboards.'

Include natural trigger term variations such as 'prediction market', 'forecasting', 'event contracts', 'betting odds', and 'probability estimates'.
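As a rough illustration of the suggestions above, a description can be checked for a 'Use when' clause and for the listed trigger terms with a simple substring test. This is only a sketch: the function name is hypothetical, and plain substring matching is an assumption that says nothing about how the review's own scoring works.

```python
def description_gaps(description, trigger_terms):
    """List which of the review's concerns a description still leaves open."""
    text = description.lower()
    gaps = []
    if "use when" not in text:
        gaps.append("missing 'Use when...' clause")
    missing = [term for term in trigger_terms if term.lower() not in text]
    if missing:
        gaps.append("missing trigger terms: " + ", ".join(missing))
    return gaps


# Trigger terms taken from the suggestion above.
TERMS = ["prediction market", "forecasting", "event contracts",
         "betting odds", "probability estimates"]

# The skill's current description, as quoted at the top of this review.
current = ("AI Agent Prediction Arena - Predict Kalshi market outcomes, "
           "compete for accuracy")

for gap in description_gaps(current, TERMS):
    print(gap)
```

Running the check on a revised description before publishing gives a quick signal on whether the 'Use when' clause and the trigger terms made it in.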

Dimension	Reasoning	Score
Specificity	Names the domain (Kalshi market predictions) and a couple of actions (predict outcomes, compete for accuracy), but doesn't list specific concrete actions like placing bets, analyzing odds, tracking portfolio, or viewing leaderboards.	2 / 3
Completeness	Provides a brief 'what' (predict Kalshi market outcomes, compete for accuracy) but completely lacks a 'Use when...' clause or any explicit trigger guidance for when Claude should select this skill. Per rubric guidelines, missing 'Use when' caps completeness at 2, and the 'what' is also quite thin, warranting a 1.	1 / 3
Trigger Term Quality	Includes relevant keywords like 'Kalshi', 'prediction', 'market outcomes', and 'AI Agent', but misses common user variations such as 'betting', 'forecasting', 'prediction market', 'event contracts', or 'probability'.	2 / 3
Distinctiveness Conflict Risk	The combination of 'Kalshi', 'prediction arena', and 'market outcomes' creates a very specific niche that is unlikely to conflict with other skills. This is a clearly distinct domain.	3 / 3
	Total	8 / 12 Passed

Validation

72%


Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 8 / 11 Passed

Validation for skill structure

Criteria	Description	Result
metadata_version	'metadata.version' is missing	Warning
metadata_field	'metadata' should map string keys to string values	Warning
frontmatter_unknown_keys	Unknown frontmatter key(s) found; consider removing or moving to metadata	Warning

	Total	8 / 11 Passed
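The first two warnings in the table can be reproduced locally with a small check over already-parsed frontmatter. The sketch below assumes the frontmatter has been loaded into a plain dict by some YAML parser; the function name and the sample values are hypothetical and are not the skill's actual frontmatter. The unknown-keys check is left out because it depends on the spec's list of allowed keys, which this review does not give.

```python
def metadata_warnings(frontmatter):
    """Return warnings mirroring the metadata checks in the table above."""
    metadata = frontmatter.get("metadata")
    if not isinstance(metadata, dict):
        return ["'metadata' is missing or is not a mapping"]
    warnings = []
    if "version" not in metadata:
        warnings.append("'metadata.version' is missing")
    all_strings = all(
        isinstance(key, str) and isinstance(value, str)
        for key, value in metadata.items()
    )
    if not all_strings:
        warnings.append("'metadata' should map string keys to string values")
    return warnings


# Hypothetical frontmatter: a nested list value and no version.
sample = {"name": "clawarena", "metadata": {"tags": ["kalshi", "prediction"]}}
for warning in metadata_warnings(sample):
    print(warning)
```

Flattening nested values into strings (for example, a comma-separated tag list) and adding a string `version` entry would clear both warnings in this sketch.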

Repository: Demerzels-lab/elsamultiskillagent
Commit: f45fcb5

Reviewed: about 1 month ago

