
# Metrics and Monitoring


Advanced metrics collection system providing comprehensive observability for AWS Bedrock AI model invocations. Includes token usage tracking, request duration monitoring, error analytics, guardrail interaction metrics, and prompt caching statistics.


## Capabilities


### Metrics Parameters Container


Central container for all metrics instruments and runtime state, managing the complete lifecycle of metrics collection for Bedrock operations.


```python { .api }
class MetricParams:
    """
    Container for metrics configuration and runtime state.

    Manages all metrics instruments and maintains request-specific
    state throughout the lifecycle of Bedrock API calls.
    """

    def __init__(
        self,
        token_histogram: Histogram,
        choice_counter: Counter,
        duration_histogram: Histogram,
        exception_counter: Counter,
        guardrail_activation: Counter,
        guardrail_latency_histogram: Histogram,
        guardrail_coverage: Counter,
        guardrail_sensitive_info: Counter,
        guardrail_topic: Counter,
        guardrail_content: Counter,
        guardrail_words: Counter,
        prompt_caching: Counter,
    ):
        """
        Initialize metrics container with all required instruments.

        Parameters:
        - token_histogram: Token usage tracking (input/output tokens)
        - choice_counter: Number of completion choices generated
        - duration_histogram: Request/response latency tracking
        - exception_counter: Error and exception counting
        - guardrail_activation: Guardrail trigger frequency
        - guardrail_latency_histogram: Guardrail processing time
        - guardrail_coverage: Text coverage by guardrails
        - guardrail_sensitive_info: PII detection events
        - guardrail_topic: Topic policy violations
        - guardrail_content: Content policy violations
        - guardrail_words: Word filter violations
        - prompt_caching: Prompt caching utilization
        """

    # Runtime state attributes
    vendor: str
    """AI model vendor (e.g., 'anthropic', 'cohere', 'ai21')"""

    model: str
    """Specific model name (e.g., 'claude-3-sonnet-20240229-v1:0')"""

    is_stream: bool
    """Whether the current request uses streaming responses"""

    start_time: float
    """Request start timestamp for duration calculation"""
```


### Core Metrics Constants


Metric name and span attribute constants for Bedrock guardrail and prompt caching observability, following OpenTelemetry semantic convention patterns.


```python { .api }
class GuardrailMeters:
    """
    Metric name constants for Bedrock Guardrails observability.

    Provides standardized metric names for all guardrail-related
    measurements following semantic convention patterns.
    """

    LLM_BEDROCK_GUARDRAIL_ACTIVATION = "gen_ai.bedrock.guardrail.activation"
    """Counter for guardrail activation events"""

    LLM_BEDROCK_GUARDRAIL_LATENCY = "gen_ai.bedrock.guardrail.latency"
    """Histogram for guardrail processing latency in milliseconds"""

    LLM_BEDROCK_GUARDRAIL_COVERAGE = "gen_ai.bedrock.guardrail.coverage"
    """Counter for text coverage by guardrails in characters"""

    LLM_BEDROCK_GUARDRAIL_SENSITIVE = "gen_ai.bedrock.guardrail.sensitive_info"
    """Counter for sensitive information detection events"""

    LLM_BEDROCK_GUARDRAIL_TOPICS = "gen_ai.bedrock.guardrail.topics"
    """Counter for topic policy violation events"""

    LLM_BEDROCK_GUARDRAIL_CONTENT = "gen_ai.bedrock.guardrail.content"
    """Counter for content policy violation events"""

    LLM_BEDROCK_GUARDRAIL_WORDS = "gen_ai.bedrock.guardrail.words"
    """Counter for word filter violation events"""


class PromptCaching:
    """
    Metric name constants for prompt caching observability.

    Provides standardized metric names for prompt caching
    utilization and performance tracking.
    """

    LLM_BEDROCK_PROMPT_CACHING = "gen_ai.prompt.caching"
    """Counter for cached token utilization"""


class GuardrailAttributes:
    """
    Span attribute constants for guardrail information.

    Standardized attribute names for recording guardrail-related
    data in OpenTelemetry spans.
    """

    GUARDRAIL = "gen_ai.guardrail"
    """Base guardrail attribute namespace"""

    TYPE = "gen_ai.guardrail.type"
    """Guardrail processing type (input/output)"""

    PII = "gen_ai.guardrail.pii"
    """PII detection attribute"""

    PATTERN = "gen_ai.guardrail.pattern"
    """Pattern matching attribute"""

    TOPIC = "gen_ai.guardrail.topic"
    """Topic policy attribute"""

    CONTENT = "gen_ai.guardrail.content"
    """Content policy attribute"""

    CONFIDENCE = "gen_ai.guardrail.confidence"
    """Confidence score attribute"""

    MATCH = "gen_ai.guardrail.match"
    """Match result attribute"""


class Type(Enum):
    """
    Guardrail processing type enumeration.

    Defines whether guardrail processing applies to input
    or output content in AI model interactions.
    """

    INPUT = "input"
    """Input content processing"""

    OUTPUT = "output"
    """Output content processing"""
```


### Metrics Creation and Management


Functions for creating and managing the complete set of metrics instruments required for Bedrock observability.


```python { .api }
def _create_metrics(meter: Meter) -> tuple:
    """
    Create all metrics instruments for Bedrock observability.

    Initializes the complete set of histograms and counters needed
    for comprehensive monitoring of Bedrock AI model interactions.

    Parameters:
    - meter: OpenTelemetry Meter instance for creating instruments

    Returns:
    Tuple containing all metrics instruments:
    (token_histogram, choice_counter, duration_histogram, exception_counter,
     guardrail_activation, guardrail_latency_histogram, guardrail_coverage,
     guardrail_sensitive_info, guardrail_topic, guardrail_content,
     guardrail_words, prompt_caching)
    """


def is_metrics_enabled() -> bool:
    """
    Check if metrics collection is globally enabled.

    Returns:
    Boolean indicating if metrics should be collected based on the
    TRACELOOP_METRICS_ENABLED environment variable (default: true)
    """
```
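As a sketch, the documented environment-variable check could be implemented as follows. This is one plausible reading of the docstring (unset defaults to enabled), not necessarily the library's exact logic:

```python
import os

def is_metrics_enabled() -> bool:
    # Unset variable defaults to "true"; comparison is case-insensitive.
    # Assumption: any value other than "true" disables metrics.
    return (os.getenv("TRACELOOP_METRICS_ENABLED") or "true").lower() == "true"
```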


### Guardrail Metrics Processing


Specialized functions for processing and recording guardrail-related metrics with detailed categorization and attribution.


```python { .api }
def is_guardrail_activated(response) -> bool:
    """
    Check if any guardrails were activated in the response.

    Examines response metadata to determine if Bedrock Guardrails
    processed the request and applied any filtering or monitoring.

    Parameters:
    - response: Bedrock API response containing guardrail metadata

    Returns:
    Boolean indicating guardrail activation status
    """


def guardrail_converse(span, response, vendor, model, metric_params) -> None:
    """
    Process guardrail metrics for converse API responses.

    Extracts and records guardrail-related metrics from converse API
    responses, including policy violations and processing latency.

    Parameters:
    - span: OpenTelemetry span for attribute setting
    - response: Converse API response with guardrail metadata
    - vendor: AI model vendor identifier
    - model: Specific model name
    - metric_params: MetricParams instance for recording metrics
    """


def guardrail_handling(span, response_body, vendor, model, metric_params) -> None:
    """
    Process guardrail metrics for invoke_model API responses.

    Handles guardrail metric extraction and recording for traditional
    invoke_model API calls with comprehensive policy violation tracking.

    Parameters:
    - span: OpenTelemetry span for attribute setting
    - response_body: Parsed response body with guardrail data
    - vendor: AI model vendor identifier
    - model: Specific model name
    - metric_params: MetricParams instance for recording metrics
    """


def handle_invoke_metrics(t: Type, guardrail, attrs, metric_params) -> None:
    """
    Handle metrics processing for guardrail invocations.

    Extracts and records guardrail processing latency and coverage
    metrics from guardrail invocation metadata.

    Parameters:
    - t: Guardrail processing type (INPUT or OUTPUT)
    - guardrail: Guardrail response data containing metrics
    - attrs: Base metric attributes for categorization
    - metric_params: MetricParams instance for recording metrics
    """


def handle_sensitive(t: Type, guardrail, attrs, metric_params) -> None:
    """
    Handle metrics for sensitive information policy violations.

    Records metrics for PII detection and sensitive content
    filtering by Bedrock Guardrails.

    Parameters:
    - t: Guardrail processing type (INPUT or OUTPUT)
    - guardrail: Guardrail response data with PII detection results
    - attrs: Base metric attributes for categorization
    - metric_params: MetricParams instance for recording metrics
    """


def handle_topic(t: Type, guardrail, attrs, metric_params) -> None:
    """
    Handle metrics for topic policy violations.

    Records metrics for topic policy enforcement including
    forbidden topics and conversation steering.

    Parameters:
    - t: Guardrail processing type (INPUT or OUTPUT)
    - guardrail: Guardrail response data with topic policy results
    - attrs: Base metric attributes for categorization
    - metric_params: MetricParams instance for recording metrics
    """


def handle_content(t: Type, guardrail, attrs, metric_params) -> None:
    """
    Handle metrics for content policy violations.

    Records metrics for content filtering including harmful content
    detection and safety policy enforcement.

    Parameters:
    - t: Guardrail processing type (INPUT or OUTPUT)
    - guardrail: Guardrail response data with content policy results
    - attrs: Base metric attributes for categorization
    - metric_params: MetricParams instance for recording metrics
    """


def handle_words(t: Type, guardrail, attrs, metric_params) -> None:
    """
    Handle metrics for word filter violations.

    Records metrics for word-level filtering including blocked
    words and phrases detected by guardrails.

    Parameters:
    - t: Guardrail processing type (INPUT or OUTPUT)
    - guardrail: Guardrail response data with word filter results
    - attrs: Base metric attributes for categorization
    - metric_params: MetricParams instance for recording metrics
    """
```
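As a rough illustration of the activation check, a converse-style response can be probed for guardrail trace metadata. The `stopReason` value and `trace.guardrail` key below are assumptions based on the Bedrock Converse API response shape, not the library's exact implementation:

```python
def is_guardrail_activated(response: dict) -> bool:
    # "guardrail_intervened" is the stop reason Bedrock reports when a
    # guardrail blocks content (assumption based on the Converse API).
    if response.get("stopReason") == "guardrail_intervened":
        return True
    # Trace metadata under "guardrail" indicates a guardrail assessed
    # the request even when nothing was blocked.
    return bool(response.get("trace", {}).get("guardrail"))
```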


### Prompt Caching Metrics


Functions for tracking prompt caching utilization and performance metrics.


```python { .api }
def prompt_caching_handling(headers, vendor, model, metric_params) -> None:
    """
    Process prompt caching metrics from response headers.

    Extracts caching information from HTTP response headers and
    records metrics for cache hits, misses, and token savings.

    Parameters:
    - headers: HTTP response headers containing caching metadata
    - vendor: AI model vendor identifier
    - model: Specific model name
    - metric_params: MetricParams instance for recording metrics
    """


class CachingHeaders:
    """
    HTTP header constants for prompt caching detection.

    Defines the standard headers used by Bedrock to communicate
    prompt caching status and token counts.
    """

    READ = "x-amzn-bedrock-cache-read-input-token-count"
    """Header indicating cached input tokens read"""

    WRITE = "x-amzn-bedrock-cache-write-input-token-count"
    """Header indicating input tokens written to cache"""


class CacheSpanAttrs:
    """
    Span attribute constants for prompt caching information.

    Standardized attribute names for recording caching data
    in OpenTelemetry spans.
    """

    TYPE = "gen_ai.cache.type"
    """Cache operation type (read/write/miss)"""

    CACHED = "gen_ai.prompt_caching"
    """Prompt caching utilization flag"""
```
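A hypothetical helper (`extract_cache_token_counts` is not part of the library) showing how the documented caching headers could be parsed before feeding the prompt-caching counter; missing or malformed headers are treated as zero tokens:

```python
# Header names as documented in CachingHeaders above.
READ = "x-amzn-bedrock-cache-read-input-token-count"
WRITE = "x-amzn-bedrock-cache-write-input-token-count"

def extract_cache_token_counts(headers: dict) -> dict:
    """Return cached-read and cache-write token counts from response headers."""
    def _to_int(value) -> int:
        # Headers arrive as strings; absent or non-numeric values count as 0.
        try:
            return int(value)
        except (TypeError, ValueError):
            return 0

    return {"read": _to_int(headers.get(READ)), "write": _to_int(headers.get(WRITE))}
```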


## Usage Examples


### Basic Metrics Collection


```python
from opentelemetry.instrumentation.bedrock import BedrockInstrumentor, is_metrics_enabled

# Check if metrics are enabled
if is_metrics_enabled():
    print("Metrics collection is enabled")

    # Enable instrumentation with metrics
    BedrockInstrumentor().instrument()
else:
    print("Metrics collection is disabled")
```


### Custom Metrics Provider


```python
from opentelemetry.instrumentation.bedrock import BedrockInstrumentor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

# Configure custom metrics provider
metric_reader = PeriodicExportingMetricReader(
    exporter=ConsoleMetricExporter(),
    export_interval_millis=30000,
)
meter_provider = MeterProvider(metric_readers=[metric_reader])

# Instrument with custom provider
BedrockInstrumentor().instrument(meter_provider=meter_provider)
```


### Metrics Analysis


Common metrics collected include:


```python
# Token usage metrics (Histogram.record takes the amount positionally)
token_histogram.record(
    150,  # token count
    attributes={
        "gen_ai.system": "bedrock",
        "gen_ai.request.model": "anthropic.claude-3-sonnet-20240229-v1:0",
        "gen_ai.token.type": "input",
    },
)

# Guardrail activation metrics
guardrail_activation.add(
    1,
    attributes={
        "gen_ai.system": "bedrock",
        "guardrail.type": "input",
        "guardrail.policy": "sensitive_info",
    },
)

# Request duration metrics
duration_histogram.record(
    1.25,  # seconds
    attributes={
        "gen_ai.system": "bedrock",
        "gen_ai.operation.name": "completion",
        "gen_ai.request.model": "anthropic.claude-3-sonnet-20240229-v1:0",
    },
)
```


### Monitoring Dashboard Queries


Example queries for common monitoring scenarios:


#### Token Usage Monitoring


```promql
# Average tokens per request by model
rate(gen_ai_token_usage_sum[5m]) / rate(gen_ai_token_usage_count[5m])

# Token usage by type (input vs output)
sum by (gen_ai_token_type) (rate(gen_ai_token_usage_sum[5m]))
```


#### Error Rate Monitoring


```promql
# Error rate by model
rate(llm_bedrock_completions_exceptions_total[5m]) /
rate(gen_ai_operation_duration_count[5m])

# Error breakdown by type
sum by (error_type) (rate(llm_bedrock_completions_exceptions_total[5m]))
```


#### Guardrail Analytics


```promql
# Guardrail activation rate
rate(gen_ai_bedrock_guardrail_activation_total[5m])

# Guardrail policy violation breakdown
sum by (guardrail_policy) (rate(gen_ai_bedrock_guardrail_activation_total[5m]))

# Guardrail processing latency (95th percentile)
histogram_quantile(0.95, rate(gen_ai_bedrock_guardrail_latency_bucket[5m]))
```


#### Prompt Caching Effectiveness


```promql
# Cache hit rate
rate(gen_ai_prompt_caching_total{cache_type="read"}[5m]) /
(rate(gen_ai_prompt_caching_total{cache_type="read"}[5m]) +
 rate(gen_ai_prompt_caching_total{cache_type="write"}[5m]))

# Token savings from caching
sum(rate(gen_ai_prompt_caching_total{cache_type="read"}[5m]))
```


## Metrics Schema


### Standard Attributes


All metrics include standard attributes for filtering and aggregation:


- **gen_ai.system**: "bedrock"
- **gen_ai.request.model**: Full model identifier
- **gen_ai.operation.name**: "completion" or "chat"
- **error.type**: Exception class name (for error metrics)
- **gen_ai.token.type**: "input" or "output" (for token metrics)
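For concreteness, the standard attribute set might appear on a token-usage measurement like this; the model identifier is an illustrative value, not a required one:

```python
# Illustrative attribute set combining the standard attributes listed above.
standard_attrs = {
    "gen_ai.system": "bedrock",
    "gen_ai.request.model": "anthropic.claude-3-sonnet-20240229-v1:0",  # example model id
    "gen_ai.operation.name": "completion",
    "gen_ai.token.type": "input",
}
```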

### Guardrail-Specific Attributes


Guardrail metrics include additional categorization:


- **guardrail.type**: "input" or "output"
- **guardrail.policy**: Policy type (sensitive_info, topic, content, words)
- **guardrail.confidence**: Confidence score for detections
- **guardrail.action**: Action taken (block, warn, pass)

### Model-Specific Attributes


Model identification attributes for multi-model deployments:


- **gen_ai.model.vendor**: Vendor name (anthropic, cohere, ai21, etc.)
- **gen_ai.model.name**: Simplified model name
- **gen_ai.model.version**: Model version identifier
- **gen_ai.model.family**: Model family grouping

## Performance Considerations


### Metrics Overhead


Metrics collection adds minimal overhead:

- **Counter operations**: ~10-50 nanoseconds per increment
- **Histogram recordings**: ~100-500 nanoseconds per measurement
- **Attribute processing**: ~50-200 nanoseconds per attribute set

### Cardinality Management


Control metrics cardinality to prevent memory issues:

- Model identifiers are normalized to reduce unique combinations
- Request parameters are not included as attributes
- User-specific data is excluded from metrics labels

### Batching and Export


Configure appropriate export intervals:

- **Development**: 5-10 second intervals for immediate feedback
- **Production**: 30-60 second intervals to balance freshness and overhead
- **High-volume**: Use sampling or aggregation for cost optimization
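
The interval guidance can be encoded in configuration. This hypothetical helper (`export_interval_millis` is not part of the library) uses the lower bound of each recommended range:

```python
# Hypothetical stage-to-interval mapping based on the guidance above.
EXPORT_INTERVALS_MS = {
    "development": 5_000,   # 5-10 s for immediate feedback
    "production": 30_000,   # 30-60 s to balance freshness and overhead
}

def export_interval_millis(stage: str) -> int:
    # Unknown stages fall back to the conservative production interval.
    return EXPORT_INTERVALS_MS.get(stage, 30_000)
```

The result could then be passed as `export_interval_millis` to `PeriodicExportingMetricReader`, as in the Custom Metrics Provider example above.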