metrics-queries

How to query OpenTelemetry metrics datasets in Honeycomb correctly. Metrics datasets follow different rules from trace/event datasets — many operations (bare COUNT, RATE_SUM, RATE_AVG, RATE_MAX, CONCURRENCY) are forbidden, temporal aggregation is automatic, and each metric has its own attributes. Use this skill when querying a metrics dataset (gauges, counters, histograms, sums), asking about temporal aggregation (RATE, INCREASE, SUMMARIZE, LAST), finding the metrics dataset or discovering metric names and attributes, debugging unexpected metrics query results, or querying infrastructure metrics like CPU, memory, disk I/O, or network stats. Do NOT use for instrumenting metrics (use otel-instrumentation), querying event datasets with "metrics" in their name, or conceptual questions (use observability-fundamentals).

1.68x

Quality

100%

Does it follow best practices?

Impact

96%

1.68x

Average score across 3 eval scenarios

Securityby

Passed

No known issues

Quality

Content

100%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

A well-structured, actionable reference for querying Honeycomb metrics datasets: executable JSON examples, explicit discovery workflows with validation checkpoints, and appropriately offloaded detail to three real reference files. Concise without sacrificing the non-obvious domain rules that justify the skill.

Dimension	Reasoning	Score
Conciseness	Content is information-dense and domain-specific (forbidden operations, MetricInfo type mapping, default temporal aggregation by metric type) rather than restating concepts Claude already knows; the tables earn their tokens, and the prose assumes competence.	3 / 3
Actionability	Provides fully executable JSON query structures with concrete column names (k8s.pod.memory.usage, http.server.duration.p99) and specific tool calls (get_environment, get_dataset_columns, find_columns) — copy-paste ready, not pseudocode.	3 / 3
Workflow Clarity	Discovery workflow is numbered with an explicit 'Validate before querying' checkpoint, and dataset identification includes 'Do not guess the dataset. Always verify via get_environment or get_dataset_columns'; operations are read-only queries so the destructive/batch validation cap does not apply.	3 / 3
Progressive Disclosure	Body is an overview that signals one-level-deep references inline (temporal-aggregation.md, metrics-query-examples.md) and lists all three reference files with descriptions in 'Additional Resources'; the referenced files (metric-types.md, metrics-query-examples.md, temporal-aggregation.md) all exist.	3 / 3
	Total	12 / 12 Passed

Description

100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

A strong, specific description that clearly states the skill's purpose, gives explicit positive and negative use-when triggers, and distinguishes itself from sibling skills. It lists concrete actions and natural trigger terms without padding or over-claiming.

Dimension	Reasoning	Score
Specificity	Lists multiple concrete actions — querying gauges/counters/histograms/sums, discovering metric names and attributes, debugging unexpected results, and querying infrastructure metrics (CPU, memory, disk I/O, network stats) — rather than vague language. Voice is third person ('How to query', 'Use this skill'), so no specificity penalty applies.	3 / 3
Completeness	Explicitly answers what ('How to query OpenTelemetry metrics datasets in Honeycomb correctly') and when via an explicit 'Use this skill when querying…' trigger clause covering several scenarios.	3 / 3
Trigger Term Quality	Covers natural terms users would actually say — 'CPU', 'memory', 'disk I/O', 'network stats', 'RATE/INCREASE/SUMMARIZE/LAST', 'debugging unexpected metrics query results' — not just technical jargon.	3 / 3
Distinctiveness Conflict Risk	Clear niche (metrics datasets vs trace/event datasets) reinforced by explicit negative triggers — 'Do NOT use for instrumenting metrics… querying event datasets with metrics in their name… or conceptual questions' — reducing conflict with adjacent skills.	3 / 3
	Total	12 / 12 Passed

Validation

100%

Warnings & errors only

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation — 16 / 16 Passed

Validation for skill structure

No warnings or errors.

Repository: honeycombio/agent-skill
Commit: 189553c

Reviewed: 1 day ago

Table of Contents

Discovery Implementation Validation

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.