
launchdarkly-metric-choose

Choose the right metrics for a LaunchDarkly experiment, guarded rollout, or release policy. Use when the user wants to know which metrics to use, which is the primary metric for an experiment, what guardrails to add, or which events to monitor in a rollout. Surfaces what will auto-attach from existing release policies before making additional recommendations.

93

1.16x
Quality: 92%. Does it follow best practices?

Impact: 92% (1.16x). Average score across 3 eval scenarios.

Security by Snyk: Passed. No known issues.


Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured advisory skill that provides clear workflows branching by context type, concrete MCP tool usage, and specific decision criteria for metric selection. Its main weakness is moderate verbosity — some explanatory prose and philosophical guidance could be trimmed without losing clarity. The important context section is excellent, surfacing non-obvious gotchas (CUPED/percentile incompatibility, context kind mismatches) that would otherwise cause real problems.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Conciseness | The skill is well-structured but somewhat verbose in places: the explanatory prose around each context type (experiment, guarded rollout, release policy) could be tightened. Some statements, such as 'Guarded rollouts are safety mechanisms, not experiments' and 'The discipline of choosing is the point', are stylistic rather than instructional. However, it largely avoids explaining things Claude already knows, and most content earns its place. | 2 / 3 |
| Actionability | The skill provides specific MCP tool calls (list-metrics, list-metric-events, list-release-policies), concrete output formats to display to users, explicit criteria for metric selection (e.g., 'Only recommend metrics with events actively flowing'), and typed recommendation categories with examples. The guidance is specific enough to execute without ambiguity. | 3 / 3 |
| Workflow Clarity | The 5-step workflow is clearly sequenced with explicit branching by context type (experiment/guarded rollout/release policy). Step 2 has a clear skip condition for experiments. Step 3 includes a health-check validation gate before recommendations. The 'Important Context' section serves as a validation checklist covering edge cases like mid-experiment changes, CUPED incompatibility, and context kind mismatches. | 3 / 3 |
| Progressive Disclosure | The skill is appropriately self-contained as an advisory document, with clear one-level-deep references to related skills (metric-create, metric-instrument) at the end. Content is well-organized with headers, tables, and collapsible context sections. The document doesn't try to inline creation or instrumentation workflows; it defers those cleanly. | 3 / 3 |
| Total | | 11 / 12 |

Passed
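The total above can be read as the sum of the four per-dimension scores, each out of 3. A minimal Python sketch of that tally follows, assuming the total is a plain unweighted sum (the page does not state the formula explicitly); the names `content_scores`, `MAX_PER_DIMENSION`, and `tally` are hypothetical and used only for illustration.

```python
# Per-dimension scores as listed in the Content review; each is out of 3.
# Assumption: the total is an unweighted sum of these scores.
content_scores = {
    "Conciseness": 2,
    "Actionability": 3,
    "Workflow Clarity": 3,
    "Progressive Disclosure": 3,
}
MAX_PER_DIMENSION = 3


def tally(scores):
    """Return (points earned, points possible) for a set of dimension scores."""
    earned = sum(scores.values())
    possible = MAX_PER_DIMENSION * len(scores)
    return earned, possible


earned, possible = tally(content_scores)
print(f"Total: {earned} / {possible}")  # prints "Total: 11 / 12"
```

The same tally applied to the four Description dimensions, all scored 3 / 3, gives 12 / 12.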

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its scope (LaunchDarkly metric selection), provides explicit trigger conditions via a 'Use when...' clause, and includes natural keywords users would employ. It also adds a distinguishing detail about surfacing auto-attached metrics from release policies, which further clarifies its unique value.

| Dimension | Reasoning | Score |
| --- | --- | --- |
| Specificity | Lists multiple specific concrete actions: choosing metrics, identifying primary metrics for experiments, adding guardrails, monitoring events in rollouts, and surfacing auto-attached metrics from existing release policies. | 3 / 3 |
| Completeness | Clearly answers both 'what' (choose the right metrics, surface auto-attached metrics, make recommendations) and 'when' (explicit 'Use when...' clause covering multiple trigger scenarios like wanting to know which metrics to use, primary metric selection, guardrail addition, and event monitoring). | 3 / 3 |
| Trigger Term Quality | Includes strong natural trigger terms users would say: 'metrics', 'experiment', 'primary metric', 'guardrails', 'rollout', 'release policy', 'events to monitor', 'LaunchDarkly'. These cover the natural vocabulary of someone working with LaunchDarkly experimentation. | 3 / 3 |
| Distinctiveness / Conflict Risk | Highly distinctive with a clear niche: LaunchDarkly metric selection for experiments and rollouts. The specificity to LaunchDarkly, combined with the focus on metric selection (not flag creation or general configuration), makes it unlikely to conflict with other skills. | 3 / 3 |
| Total | | 12 / 12 |

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
launchdarkly/ai-tooling
Reviewed
