monitoring-expert

Configures monitoring systems, implements structured logging pipelines, builds Prometheus/Grafana stacks and dashboards, defines alerting rules, and instruments distributed tracing. Also conducts load testing, profiles applications, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.

Quality

88%

Does it follow best practices?

Impact

95%

1.17x

Average score across 6 eval scenarios

Security

Passed

No known issues

Evaluation results

96%

34%

Add Observability to Order Management API

Structured logging and Prometheus metrics instrumentation

Criteria

Without context

With context

Uses pino for logging

100%

JSON structured fields

60%

100%

Request ID correlation

100%

Sensitive field redaction

100%

No sensitive data logged

100%

Correct metric types

80%

100%

Metric naming convention

50%

Business metrics present

100%

Health check endpoint

100%

Metrics endpoint

100%

Consistent event naming

100%

Set Up Alerting and Dashboards for a Payment Processing Service

Prometheus alerting rules and Grafana dashboard design

Criteria

Without context

With context

Severity classification

100%

Threshold-based alerting

90%

100%

Alert 'for' duration

100%

Runbook URL annotations

100%

Summary and description annotations

100%

Alertmanager severity routing

100%

RED method in dashboard

100%

USE method in dashboard

100%

predict_linear for capacity

100%

Capacity forecast horizon

100%

Business metrics in dashboard

100%

No noisy alert patterns

100%

10%

Load Test an E-Commerce Checkout API Before Black Friday

k6 load testing with staged profiles and performance thresholds

Criteria

Without context

With context

Uses k6

100%

Staged load profile

100%

p95 latency threshold

100%

p99 latency threshold

100%

Error rate threshold

100%

Multi-step user journey

100%

Think time between steps

100%

Custom business metrics

100%

Response validation

100%

Test plan explains thresholds

100%

96%

Trace the Checkout Pipeline

OpenTelemetry distributed tracing instrumentation

Criteria

Without context

With context

NodeSDK import

100%

OTLPTraceExporter used

100%

Jaeger OTLP URL

50%

Service name resource attribute

87%

100%

Auto-instrumentations enabled

100%

Manual span creation

100%

Span attributes set

100%

Exception recording

100%

SpanStatusCode set

100%

Context extraction from incoming requests

100%

Context injection into outgoing requests

100%

span.end() always called

100%

98%

Diagnose a Sluggish Node.js API

Application profiling with CPU and memory diagnostics

Criteria

Without context

With context

clinic.js CPU profiling

100%

clinic bubbleprof mentioned

50%

100%

performance.mark used

100%

performance.measure used

100%

PerformanceObserver used

100%

process.memoryUsage called

100%

v8.writeHeapSnapshot used

100%

Profiling workflow documented

100%

Symptom-to-tool mapping

100%

No GUI-only tools prescribed

100%

80%

83%

38%

Plan Infrastructure Capacity for a Growing Platform

Capacity planning with resource forecasting and performance budgets

Criteria

Without context

With context

CPU buffer 30%

100%

Memory buffer 20%

100%

Connection buffer 25%

12%

100%

Storage buffer 40%

100%

predict_linear for memory

100%

predict_linear for disk

100%

Performance budget apiP95

100%

Performance budget apiP99

16%

100%

Performance budget error rate

100%

Scale-up trigger at 80% CPU

25%

Planning trigger at 70% CPU

50%

Scale-down trigger defined

Instance count calculation

75%

100%

Repository: jeffallan/claude-skills
Commit: 3d95bb1

Evaluated: about 2 months ago
Agent: Claude Code
Model: Claude Sonnet 4.6
