Turn labeled LLM failure traces from an Arize Phoenix project into runnable pytest regression tests using the phoenix2pytest pipeline. Use when the user has an LLM application emitting OpenInference spans to Phoenix and wants a regression suite built from real production failures; when extracting test cases from observed LLM bugs (hallucination, format break, off-topic drift, stale data, wrong reasoning, refusal bug); when bridging Phoenix-labeled traces into pytest-based suites for CI; or when the user mentions Arize Phoenix MCP, OpenInference instrumentation, LLM observability, Gemini test synthesis, or Vertex AI agent evaluation, or wants to react to LLM failures rather than predict them upfront.
88
94%
Does it follow best practices?
Impact
98%
1.63x
Average score across 2 eval scenarios
Advisory
Suggest reviewing before use
Security
2 findings, both medium severity. This skill can be installed, but you should review these findings before use.
The skill exposes the agent to untrusted, user-generated content from public third-party sources, creating a risk of indirect prompt injection. This includes browsing arbitrary URLs, reading social media posts or forum comments, and analyzing content from unknown websites.
Third-party content exposure detected (high risk: 0.95). The required runtime workflow for the web UI `/generate` endpoint ingests externally authored free text from the HTTP form/JSON fields `trace_json` and `details_json` (including user-controlled values such as `user_prompt` and `evidence`). This text is embedded into the synthesiser prompt and sent into the agent's LLM context via `synthesise(...)` -> `build_user_message(...)` and `client.generate_text(...)`.
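One way to reduce (not eliminate) the exposure described in this finding is to quote the user-controlled fields as inert data before they reach the prompt. The sketch below is illustrative only: `quote_untrusted`, `UNTRUSTED_FIELDS`, and the `<untrusted_data>` delimiter are hypothetical names, not part of phoenix2pytest, and the field list simply mirrors the `user_prompt` and `evidence` fields named in the finding.

```python
import json

# Hypothetical field list, taken from the fields named in the finding.
UNTRUSTED_FIELDS = ("user_prompt", "evidence")


def quote_untrusted(trace_json: str) -> str:
    """Render user-controlled fields as escaped, single-line string literals
    inside a labeled data block (hypothetical helper, not a phoenix2pytest API)."""
    data = json.loads(trace_json)
    if not isinstance(data, dict):
        raise ValueError("trace_json must decode to a JSON object")
    lines = ["<untrusted_data>"]
    for name in UNTRUSTED_FIELDS:
        if name in data:
            # json.dumps escapes quotes and newlines, keeping the value on one line.
            literal = json.dumps(str(data[name]))
            # Escape '<' so the value cannot contain the closing delimiter.
            literal = literal.replace("<", "\\u003c")
            lines.append(f"{name}: {literal}")
    lines.append("</untrusted_data>")
    return "\n".join(lines)
```

Quoting of this kind only makes the data boundary explicit; the prompt that receives the block would still need to instruct the model to treat its contents as data rather than instructions.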
The skill fetches instructions or code from an external URL at runtime, and the fetched content directly controls the agent’s prompts or executes code. This dynamic dependency allows the external source to modify the agent’s behavior without any changes to the skill itself.
Potentially malicious external URL detected (high risk: 0.90). The repo spawns an external MCP server via npx at runtime: `src/phoenix2pytest/mcp_client.py` runs `npx -y @arizeai/phoenix-mcp@latest`, which fetches and executes remote package code from the npm registry (e.g. https://registry.npmjs.org/@arizeai/phoenix-mcp). This step is required for fetching Phoenix traces.
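A common way to narrow this kind of dynamic dependency is to pin an exact package version instead of `@latest`, so the code that runs only changes when the pin is changed deliberately. The following is a minimal sketch under that assumption: `build_mcp_command` is a hypothetical helper, not the actual contents of `mcp_client.py`, and the version passed in the usage note is a placeholder rather than a known release.

```python
import re

PACKAGE = "@arizeai/phoenix-mcp"
# Accept only exact major.minor.patch versions; reject tags such as "latest".
_EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+")


def build_mcp_command(version: str) -> list[str]:
    """Build the npx argument list for a pinned MCP server version
    (hypothetical helper for illustration)."""
    if not _EXACT_VERSION.fullmatch(version):
        raise ValueError(f"expected an exact version like 1.2.3, got {version!r}")
    return ["npx", "-y", f"{PACKAGE}@{version}"]
```

The resulting list could be passed to a process launcher such as `subprocess.Popen(build_mcp_command("1.2.3"))`, where `1.2.3` stands in for whichever release has actually been reviewed.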