
production-investigation

Structured workflows for investigating production issues in Honeycomb: the sequence of tool calls (context priming, broad query, BubbleUp, trace analysis, verification) and how to chain results between steps to reach root causes. Trigger phrases: "investigate production issue", "debug latency spike", "find root cause", "use BubbleUp", "analyze traces", "debug an outage", "why is my API slow", "errors are increasing", "health check", "SLO burning", or any request to investigate or debug production problems.
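The five-phase sequence described above can be sketched as a small driver that calls tools in order and passes each result to the next step. This is only an illustration under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever client invokes the tools, `run_query` and `run_bubbleup` are assumed tool names (this page names only `get_workspace_context`, `get_slos`, `get_triggers`, `find_queries`, `get_service_map`, `get_trace`, and `create_board`), and every argument name, result key, and the `duration_ms` field is invented for the example.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def investigate(call_tool: ToolCaller, dataset: str) -> List[str]:
    """Run the five phases in order and return the tool names that were called."""
    calls: List[str] = []

    def call(name: str, **arguments: Any) -> Dict[str, Any]:
        calls.append(name)
        return call_tool(name, arguments)

    # 1. Context priming: orient before running any query.
    for name in ("get_workspace_context", "get_slos", "get_triggers", "find_queries"):
        call(name)

    # 2. Broad query: characterize the problem with a heatmap (assumed arguments).
    broad = call("run_query", dataset=dataset, calculation="HEATMAP(duration_ms)")

    # 3. BubbleUp on the anomaly, chained from the broad query's result.
    bubble = call("run_bubbleup", query_id=broad.get("query_id"))
    suspect = bubble.get("top_filter")

    # 4. Trace analysis on a trace that matches the BubbleUp finding.
    call("get_trace", trace_id=bubble.get("example_trace_id"))

    # 5. Verification: one query with the suspect filter, one control without it.
    call("run_query", dataset=dataset, filter=suspect)
    call("run_query", dataset=dataset, exclude=suspect)
    return calls
```

The point of the sketch is the chaining: the BubbleUp step consumes the broad query's result, and the verification pair reuses the filter that BubbleUp surfaced, so no phase starts from scratch.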

98 (1.55x)

Quality: 100%. Does it follow best practices?

Impact: 93% (1.55x). Average score across 3 eval scenarios.

Security by Snyk: Passed. No known issues.
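The 93% impact figure appears to be the mean of the three per-scenario headline scores listed under Evaluation results (100%, 95%, and 85%). The short check below reproduces that arithmetic. It assumes a plain unweighted average, which the page does not state explicitly.

```python
# Headline scores copied from the three eval scenarios on this page.
scenario_scores = {
    "Checkout Latency Spike Investigation": 100,
    "Payment Service Error Surge After Deployment": 95,
    "Database Dependency Slowdown and SLO Burn Investigation": 85,
}

# Assumption: unweighted mean across scenarios.
average = sum(scenario_scores.values()) / len(scenario_scores)
print(f"{average:.1f}%")  # prints 93.3%
```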


Evaluation results

Checkout Latency Spike Investigation

Full latency spike investigation workflow

Score: 100% with context, 44% without context

| Criteria | Without context | With context |
| --- | --- | --- |
| Orient: get_workspace_context | 0% | 100% |
| Orient: get_slos | 0% | 100% |
| Orient: get_triggers | 40% | 100% |
| Orient: find_queries | 20% | 100% |
| Characterize with HEATMAP | 75% | 100% |
| get_service_map called | 100% | 100% |
| BubbleUp on anomaly | 87% | 100% |
| BubbleUp 2D selection | 12% | 100% |
| BubbleUp result interpretation | 50% | 100% |
| get_trace with BubbleUp filters | 25% | 100% |
| Trace waterfall analysis | 83% | 100% |
| Verification: with-filter query | 100% | 100% |
| Verification: without-filter control | 100% | 100% |
| create_board content | 77% | 100% |

Payment Service Error Surge After Deployment

Error surge + deployment regression investigation

Score: 95% with context, 27% without context

| Criteria | Without context | With context |
| --- | --- | --- |
| Error query by exception.message | 50% | 100% |
| Group by deployment.version | 100% | 44% |
| BubbleUp with group selection | 0% | 100% |
| BubbleUp not skipped | 100% | 100% |
| get_trace with show_events | 0% | 100% |
| Error depth interpretation | 100% | 100% |
| Verification: with deployment filter | 100% | 100% |
| Verification: control query | 100% | 100% |
| find_queries in orient step | 0% | 100% |
| Secondary BubbleUp signals | 100% | 100% |
| Rollback criteria defined | 100% | 100% |
| Span events for stack traces | 83% | 100% |

Database Dependency Slowdown and SLO Burn Investigation

SLO burn and dependency failure investigation

Score: 85% with context, 29% without context

| Criteria | Without context | With context |
| --- | --- | --- |
| get_slos with SLO ID | 33% | 77% |
| SLO burn timing identification | 100% | 100% |
| get_service_map for architecture | 44% | 66% |
| Dependency health query | 50% | 50% |
| BubbleUp called | 71% | 100% |
| BubbleUp pagination | 70% | 100% |
| Large trace view mode | 50% | 100% |
| Verification with/without filter | 33% | 100% |
| Impact assessment query | 25% | 62% |
| SLO-linked failing query | 50% | 100% |
| Secondary BubbleUp signals checked | 100% | 100% |
| No premature conclusion | 71% | 57% |

Repository: honeycombio/agent-skill
Evaluated
Agent: Claude Code
Model: Claude Sonnet 4.6
