CtrlK
BlogDocsLog inGet started
Tessl Logo

golikovichev/phoenix2pytest

Turn labeled LLM failure traces from an Arize Phoenix project into runnable pytest regression tests using the phoenix2pytest pipeline. Use when the user has an LLM application emitting OpenInference spans to Phoenix and wants a regression suite from real production failures, when extracting test cases from observed LLM bugs (hallucination, format break, off-topic drift, stale data, wrong reasoning, refusal bug), when bridging Phoenix-labeled traces into pytest-based suites for CI, when the user mentions Arize Phoenix MCP, OpenInference instrumentation, LLM observability, Gemini test synthesis, Vertex AI agent evaluation, or wants to react to LLM failures rather than predict them upfront.

88

1.63x
Quality

94%

Does it follow best practices?

Impact

98%

1.63x

Average score across 2 eval scenarios

SecuritybySnyk

Advisory

Suggest reviewing before use

Overview
Quality
Evals
Security
Files

Security

2 findings — 2 medium severity. This skill can be installed but you should review these findings before use.

Medium

W011: Third-party content exposure detected (indirect prompt injection risk)

What this means

The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.

Why it was flagged

Third-party content exposure detected (high risk: 0.95). The required runtime workflow for the web UI `/generate` endpoint ingests OUTSIDER-authored free text from the HTTP form/JSON fields `trace_json` and `details_json` (user-controlled `user_prompt`/`evidence` etc.), which are then embedded into the synthesiser prompt and sent to the agent’s LLM context via `synthesise(...)->build_user_message(...)` and `client.generate_text(...)`.

Report incorrect finding
Medium

W012: Unverifiable external dependency detected (runtime URL that controls agent)

What this means

The skill fetches instructions or code from an external URL at runtime, and the fetched content directly controls the agent’s prompts or executes code. This dynamic dependency allows the external source to modify the agent’s behavior without any changes to the skill itself.

Why it was flagged

Potentially malicious external URL detected (high risk: 0.90). The repo spawns and runs an external MCP server via npx at runtime (see src/phoenix2pytest/mcp_client.py: it runs "npx -y @arizeai/phoenix-mcp@latest", which fetches and executes remote npm package code from the npm registry e.g. https://registry.npmjs.org/@arizeai/phoenix-mcp) — this is executed at runtime and is required for fetching Phoenix traces.

Audited
Security analysis
Snyk