Turn labeled LLM failure traces from an Arize Phoenix project into runnable pytest regression tests using the phoenix2pytest pipeline. Use when the user has an LLM application emitting OpenInference spans to Phoenix and wants a regression suite from real production failures, when extracting test cases from observed LLM bugs (hallucination, format break, off-topic drift, stale data, wrong reasoning, refusal bug), when bridging Phoenix-labeled traces into pytest-based suites for CI, when the user mentions Arize Phoenix MCP, OpenInference instrumentation, LLM observability, Gemini test synthesis, Vertex AI agent evaluation, or wants to react to LLM failures rather than predict them upfront.
88
94%
Does it follow best practices?
Impact
98%
1.63xAverage score across 2 eval scenarios
Advisory
Suggest reviewing before use
pytest template compliance and naming conventions
Required imports present
37%
100%
google-genai import
0%
100%
VERTEXAI env var set
0%
100%
_ask_gemini helper defined
0%
100%
Test naming convention
20%
100%
Hallucination assertion strategy
100%
100%
Format_break assertion strategy
100%
100%
No LLM-as-judge
100%
100%
Output file paths
100%
100%
Markdown fence stripping present
100%
100%
synthesis_notes.md produced
100%
100%
Multi-trace parametrized pytest synthesis for shared failure mode
Output file path
0%
100%
Parametrize decorator
100%
100%
All three prompts covered
100%
100%
Test function name pattern
25%
100%
Required imports
28%
57%
VERTEXAI env var
0%
100%
_ask_gemini helper
0%
100%
Fabricated strings excluded
100%
100%
Concrete assertions only
100%
100%
Grouping notes file
100%
100%
Grouping notes content
100%
100%
synthesise_many reference
0%
100%