
phoenix-cli

Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Use whenever the user is analyzing traces or spans, investigating LLM/agent failures, deciding what to do after instrumenting an app, building failure taxonomies, choosing what evals to write, or asking "what's going wrong", "what kinds of mistakes", or "where do I focus" — even without naming a technique.
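The description's central technique is open coding followed by axial coding. Its aim is to turn trace review into a failure taxonomy that shows where to focus. The plain-Python sketch below illustrates the idea. Free-text open codes attached to individual traces are grouped into broader axial categories. The number of traces in each category then suggests which evals to write first. This is only an illustration under stated assumptions: the trace IDs, open codes, and category mapping are hypothetical. The code does not call the Phoenix CLI and does not reproduce the skill's own open-coding.md or axial-coding.md guidance.

```python
from collections import Counter

# Hypothetical open codes: free-text observations written while reading traces.
OPEN_CODES: dict[str, list[str]] = {
    "trace-001": ["hallucinated order id", "ignored user constraint"],
    "trace-002": ["wrong tool called"],
    "trace-003": ["hallucinated policy detail", "wrong tool called"],
    "trace-004": ["hallucinated order id"],
}

# Hypothetical axial mapping: each open code is assigned to a broader failure category.
AXIAL_MAP: dict[str, str] = {
    "hallucinated order id": "hallucination",
    "hallucinated policy detail": "hallucination",
    "ignored user constraint": "instruction following",
    "wrong tool called": "tool use",
}


def axial_counts(open_codes: dict[str, list[str]], axial_map: dict[str, str]) -> Counter:
    """Count how many traces exhibit each axial category.

    A trace is counted at most once per category, even if several of its
    open codes map to the same category. Unmapped codes go to "uncategorized".
    """
    counts: Counter = Counter()
    for codes in open_codes.values():
        categories = {axial_map.get(code, "uncategorized") for code in codes}
        counts.update(categories)
    return counts


if __name__ == "__main__":
    # Most frequent categories first: a rough signal for where to focus evals.
    for category, n in axial_counts(OPEN_CODES, AXIAL_MAP).most_common():
        print(f"{category}: {n} trace(s)")
```

Counting traces rather than raw codes stops one verbose trace from inflating a category. That keeps the ranking closer to how often users actually hit each failure.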

68

Quality: 82% (Does it follow best practices?)

Impact: No eval scenarios have been run.

Security (by Snyk): Risky. Do not use without reviewing.


Quality

Discovery: 100%

Based on the skill's description, can an agent find and select it at the right time? Clear, specific descriptions lead to better discovery.

This is an excellent skill description that clearly articulates specific capabilities, provides rich trigger terms including natural user phrases, explicitly addresses both what and when, and occupies a distinct niche. The description is comprehensive without being padded, and uses proper third-person voice throughout.

Dimension scores and reasoning:

Specificity: 3 / 3
Lists multiple specific concrete actions: fetch traces, analyze errors, structure trace review with open coding and axial coding, inspect datasets, review experiments, query annotation configs, and use the GraphQL API. Very detailed and actionable.

Completeness: 3 / 3
Clearly answers both 'what' (debug LLM applications using the Phoenix CLI, with specific actions listed) and 'when' (an explicit 'Use whenever...' clause with multiple trigger scenarios and even quoted user phrases). Comprehensive on both fronts.

Trigger Term Quality: 3 / 3
Excellent coverage of natural terms users would say: 'traces', 'spans', 'LLM failures', 'agent failures', 'what's going wrong', 'what kinds of mistakes', 'where do I focus', 'evals', 'failure taxonomies'. Includes both technical terms and natural-language phrases users would actually type.

Distinctiveness Conflict Risk: 3 / 3
Highly distinctive: targets a specific tool (Phoenix CLI), a specific domain (LLM application debugging/observability), and specific methodologies (open coding, axial coding, failure taxonomies). Unlikely to conflict with other skills.

Total: 12 / 12 (Passed)

Implementation: 64%

Reviews the quality of instructions and guidance provided to agents. Good implementation is clear, handles edge cases, and produces reliable results.

This is a comprehensive CLI reference with excellent actionability — nearly every command is copy-paste ready with practical jq pipelines. The main weaknesses are repetition across entity types (traces/spans/sessions share nearly identical patterns that could be consolidated), a lack of explicit validation/error-recovery steps for destructive operations, and a body that's grown long enough to benefit from splitting detailed JSON shapes and command catalogs into separate reference files.

Suggestions

Consolidate the repeated trace/span/session annotation and note commands into a single pattern section (e.g., 'px <entity> annotate <id> --name <name> --label <label> [--identifier <id>]') to reduce duplication and improve conciseness.

Add explicit validation checkpoints for destructive operations like `*-annotations delete`, e.g., a numbered workflow: 1) list annotations with identifier, 2) confirm count matches expectations, 3) run delete with -y.

Move the detailed JSON shape blocks (Trace, Span, Session) into a separate reference file (e.g., references/json-shapes.md) and link from the main skill to improve progressive disclosure.
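The first suggestion above asks for one shared command pattern instead of three near-identical sections. A single template can illustrate the idea: it produces the proposed `px <entity> annotate <id> --name <name> --label <label> [--identifier <id>]` invocation for any entity type. This is a sketch under stated assumptions. The command shape is the reviewer's proposed pattern, not a verified description of the `px` CLI. The entity names `trace`, `span`, and `session` are inferred from the sections the review mentions. The helper `annotate_command` is hypothetical, and it only builds the argument list without executing anything.

```python
import shlex
from typing import Optional

# Assumed entity names: the review says traces, spans, and sessions share the pattern.
ENTITIES = ("trace", "span", "session")


def annotate_command(
    entity: str,
    target_id: str,
    name: str,
    label: str,
    identifier: Optional[str] = None,
) -> list[str]:
    """Build argv for the reviewer's proposed pattern (hypothetical helper):

    px <entity> annotate <id> --name <name> --label <label> [--identifier <id>]
    """
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity {entity!r}; expected one of {ENTITIES}")
    argv = ["px", entity, "annotate", target_id, "--name", name, "--label", label]
    if identifier is not None:
        # The optional flag is emitted only when an identifier is supplied.
        argv += ["--identifier", identifier]
    return argv


if __name__ == "__main__":
    # One template covers every entity type; print a shell-quoted preview of each.
    for entity in ENTITIES:
        print(shlex.join(annotate_command(entity, "abc123", "correctness", "incorrect")))
```

Rejecting unknown entity types up front keeps a single shared section from implying that every entity supports the same command.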

Dimension scores and reasoning:

Conciseness: 2 / 3
The skill is largely efficient, with concrete commands and JSON shapes, but there is significant repetition across the traces/spans/sessions sections (nearly identical annotation and note patterns repeated three times), and the JSON shape blocks duplicate many fields. The coding annotation identifier explanation is also somewhat verbose.

Actionability: 3 / 3
Excellent actionability: nearly every section provides copy-paste-ready CLI commands with jq pipelines, concrete flag combinations, and real filtering examples. The JSON shape documentation gives exact field names and types. Commands cover the full lifecycle, from listing to annotating to deleting.

Workflow Clarity: 2 / 3
The high-level workflow (open-coding → axial-coding → build evals) is mentioned but lives in referenced files. The main document lacks explicit validation checkpoints or error-recovery steps. The coding annotation identifier workflow has important details (revert is opt-in, three DELETEs after confirmation), but the sequencing is described in prose rather than as clear numbered steps. The delete operations are destructive but lack inline verification guidance.

Progressive Disclosure: 2 / 3
The skill references open-coding.md and axial-coding.md appropriately, and the quick reference table is well structured. However, the main body is quite long (~300+ lines), with extensive inline command examples and repeated JSON shapes that could be split into separate reference files. Because no bundle files were provided, the referenced files cannot be verified, but the references themselves are one level deep and clearly signaled.

Total: 9 / 12 (Passed)
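The review recommends a checkpoint workflow for destructive deletes. First list the annotations with their identifier, then confirm that the count matches expectations, and only then run the delete with `-y`. This can be expressed as a small guard. The sketch below rests on stated assumptions. `list_annotations` and `delete_annotations` are hypothetical stand-ins for whatever `*-annotations` list and delete invocations the skill documents, and no real `px` subcommand or output format is assumed. The final re-list step is an added error-recovery check that the review does not spell out.

```python
from typing import Callable, Sequence


class CountMismatch(RuntimeError):
    """Raised when the pre-delete listing does not match the expected count."""


def guarded_delete(
    identifier: str,
    expected_count: int,
    list_annotations: Callable[[str], Sequence[str]],   # hypothetical list wrapper
    delete_annotations: Callable[[str], int],           # hypothetical delete wrapper
) -> int:
    # Step 1: list the annotations carrying this identifier.
    found = list_annotations(identifier)
    # Step 2: confirm the count matches expectations before anything destructive.
    if len(found) != expected_count:
        raise CountMismatch(
            f"identifier {identifier!r}: expected {expected_count}, "
            f"found {len(found)}; not deleting"
        )
    # Step 3: only now run the delete (the equivalent of passing -y).
    deleted = delete_annotations(identifier)
    # Step 4 (added check): re-list to verify nothing with this identifier remains.
    remaining = list_annotations(identifier)
    if remaining:
        raise RuntimeError(f"{len(remaining)} annotation(s) still present after delete")
    return deleted


if __name__ == "__main__":
    # Hypothetical in-memory store standing in for the real annotation backend.
    store = {"a1": "open-coding-run-1", "a2": "open-coding-run-1", "a3": "other"}

    def list_fake(ident: str) -> list[str]:
        return [key for key, value in store.items() if value == ident]

    def delete_fake(ident: str) -> int:
        doomed = list_fake(ident)
        for key in doomed:
            del store[key]
        return len(doomed)

    print(guarded_delete("open-coding-run-1", 2, list_fake, delete_fake))
```

Passing the list and delete operations in as callables keeps the guard testable without a live Phoenix instance. It also leaves the choice of actual CLI invocations to the skill itself.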

Validation: 100%

Checks the skill against the spec for correct structure and formatting. All validation checks must pass before discovery and implementation can be scored.

Validation: 11 / 11 passed

Validation for skill structure

No warnings or errors.

Repository: Arize-ai/phoenix
Reviewed
