
metrics-queries

How to query OpenTelemetry metrics datasets in Honeycomb correctly. Metrics datasets follow different rules from trace/event datasets — many operations (bare COUNT, RATE_SUM, RATE_AVG, RATE_MAX, CONCURRENCY) are forbidden, temporal aggregation is automatic, and each metric has its own attributes. Use this skill when querying a metrics dataset (gauges, counters, histograms, sums), asking about temporal aggregation (RATE, INCREASE, SUMMARIZE, LAST), finding the metrics dataset or discovering metric names and attributes, debugging unexpected metrics query results, or querying infrastructure metrics like CPU, memory, disk I/O, or network stats. Do NOT use for instrumenting metrics (use otel-instrumentation), querying event datasets with "metrics" in their name, or conceptual questions (use observability-fundamentals).
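As a rough illustration of the forbidden-operation rule in the description above, the following sketch checks a query for the listed operations before it is sent. The query shape (a `calculations` list of `op`/`column` entries), the helper name `find_forbidden_ops`, and the metric names are assumptions made for this example, not something the skill itself defines. "Bare COUNT" is treated here simply as any `COUNT` op.

```python
# Operations the skill description lists as forbidden on metrics datasets.
FORBIDDEN_OPS = {"COUNT", "RATE_SUM", "RATE_AVG", "RATE_MAX", "CONCURRENCY"}


def find_forbidden_ops(query: dict) -> list[str]:
    """Return forbidden operation names used in a query, in order of appearance.

    Assumes a hypothetical query shape:
    {"calculations": [{"op": "...", "column": "..."}]}
    """
    found: list[str] = []
    for calc in query.get("calculations", []):
        op = str(calc.get("op", "")).upper()
        if op in FORBIDDEN_OPS and op not in found:
            found.append(op)
    return found


# Illustrative query; the column names are placeholders.
query = {
    "calculations": [
        {"op": "COUNT"},
        {"op": "AVG", "column": "system.cpu.utilization"},
        {"op": "RATE_MAX", "column": "system.network.io"},
    ]
}
print(find_forbidden_ops(query))  # ['COUNT', 'RATE_MAX']
```

A check like this only flags the operations; which alternative to use instead is covered by the skill itself.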

94

1.68x
Quality: 92%
Does it follow best practices?

Impact: 96% (1.68x)
Average score across 3 eval scenarios

Security by Snyk

Passed

No known issues


Quality

Content

85%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a strong, well-structured skill that provides actionable guidance for querying metrics in Honeycomb. It excels at distinguishing metrics datasets from event datasets, providing executable query examples, and clearly documenting forbidden vs. allowed operations with concrete alternatives. Minor redundancy between the main sections and the Common Pitfalls section slightly impacts conciseness, but overall the content is highly effective.

Dimension | Reasoning | Score
--- | --- | ---
Conciseness | The content is mostly efficient and avoids explaining basic concepts Claude would know, but some sections are slightly verbose; for example, the 'Common Pitfalls' section repeats information already covered in earlier sections (forbidden operations, histogram sub-fields, dataset identification). The tables and structure help, but there's some redundancy that could be tightened. | 2 / 3
Actionability | The skill provides fully executable JSON query examples, specific API calls to make (get_environment, get_dataset_columns with metric_name parameter), concrete field names, and clear tables mapping goals to operations. The calculated field examples for temporal aggregation overrides and query math patterns are copy-paste ready. | 3 / 3
Workflow Clarity | Multi-step workflows are clearly sequenced with explicit validation checkpoints: the dataset identification workflow (verify via get_environment → check dataset_type), the metric discovery workflow (find metric names → find attributes for specific metric → validate before querying), and the temporal aggregation override workflow all have clear sequences. The 'validate before querying' step in discovery and 'do not guess the dataset' constraint serve as important checkpoints. | 3 / 3
Progressive Disclosure | The skill provides a comprehensive but scannable overview with well-signaled one-level-deep references to three specific reference files (temporal-aggregation.md, metrics-query-examples.md, metric-types.md) and four cross-references to related skills. The main content covers what's needed for most queries while pointing to deeper material for advanced topics. | 3 / 3
Total | | 11 / 12

Passed
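The dataset identification and metric discovery workflows mentioned in the review can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `get_environment` and `get_dataset_columns` (with a `metric_name` parameter) are named in the review, but their response shapes, the `"metrics"` value of `dataset_type`, and the stub data below are hypothetical stand-ins for this example.

```python
from typing import Callable, Iterable


def pick_metrics_dataset(get_environment: Callable[[], dict]) -> str:
    """Return the slug of the dataset whose dataset_type is "metrics".

    Assumed (hypothetical) response shape:
    {"datasets": [{"slug": "...", "dataset_type": "..."}]}
    Raises instead of guessing when zero or several datasets qualify.
    """
    env = get_environment()
    matches = [
        d["slug"]
        for d in env.get("datasets", [])
        if d.get("dataset_type") == "metrics"
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one metrics dataset, found {len(matches)}")
    return matches[0]


def missing_attributes(
    get_dataset_columns: Callable[..., Iterable[str]],
    dataset: str,
    metric_name: str,
    wanted: list[str],
) -> list[str]:
    """Return the wanted attributes that the given metric does not expose."""
    available = set(get_dataset_columns(dataset, metric_name=metric_name))
    return [attr for attr in wanted if attr not in available]


# Hypothetical stubs standing in for the real tool calls.
def fake_get_environment() -> dict:
    return {
        "datasets": [
            # An event dataset with "metrics" in its name must not be picked.
            {"slug": "app-metrics", "dataset_type": "events"},
            {"slug": "otel-metrics", "dataset_type": "metrics"},
        ]
    }


def fake_get_dataset_columns(dataset: str, metric_name: str) -> list[str]:
    return ["host.name", "cpu", "state"]


dataset = pick_metrics_dataset(fake_get_environment)
print(dataset)  # otel-metrics
print(missing_attributes(fake_get_dataset_columns, dataset,
                         "system.cpu.utilization", ["host.name", "region"]))  # ['region']
```

Selecting by `dataset_type` rather than by name reflects the 'do not guess the dataset' constraint, and checking attributes per metric reflects the 'validate before querying' step.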

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly defines its scope, provides rich trigger terms covering both technical operations and user-facing concepts, and explicitly delineates boundaries with related skills. The 'Do NOT use for' section is particularly valuable for reducing conflict risk in a multi-skill environment. The description is comprehensive yet focused, using third-person voice throughout.

Dimension | Reasoning | Score
--- | --- | ---
Specificity | Lists multiple specific concrete actions and concepts: querying metrics datasets, forbidden operations (bare COUNT, RATE_SUM, etc.), temporal aggregation, discovering metric names/attributes, debugging unexpected query results, and specific infrastructure metrics like CPU, memory, disk I/O, network stats. | 3 / 3
Completeness | Clearly answers both 'what' (querying OpenTelemetry metrics datasets in Honeycomb, explaining forbidden operations, temporal aggregation rules) and 'when' with an explicit 'Use this skill when...' clause listing five specific trigger scenarios, plus a 'Do NOT use for...' section that further clarifies boundaries. | 3 / 3
Trigger Term Quality | Excellent coverage of natural terms users would say: 'metrics dataset', 'gauges', 'counters', 'histograms', 'sums', 'temporal aggregation', 'RATE', 'INCREASE', 'CPU', 'memory', 'disk I/O', 'network stats', 'Honeycomb', 'OpenTelemetry'. These are terms users would naturally use when facing these problems. | 3 / 3
Distinctiveness / Conflict Risk | Highly distinctive with clear niche (metrics datasets in Honeycomb specifically), and explicitly differentiates itself from related skills by naming them (otel-instrumentation, observability-fundamentals) and clarifying what NOT to use it for, including the edge case of event datasets with 'metrics' in their name. | 3 / 3
Total | | 12 / 12

Passed

Validation

100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository
honeycombio/agent-skill
Reviewed
