Turn labeled LLM failure traces from an Arize Phoenix project into runnable pytest regression tests using the phoenix2pytest pipeline. Use when the user has an LLM application emitting OpenInference spans to Phoenix and wants a regression suite from real production failures, when extracting test cases from observed LLM bugs (hallucination, format break, off-topic drift, stale data, wrong reasoning, refusal bug), when bridging Phoenix-labeled traces into pytest-based suites for CI, when the user mentions Arize Phoenix MCP, OpenInference instrumentation, LLM observability, Gemini test synthesis, Vertex AI agent evaluation, or wants to react to LLM failures rather than predict them upfront.
88
94%
Does it follow best practices?
Impact
98%
1.63xAverage score across 2 eval scenarios
Advisory
Suggest reviewing before use
All notable changes to this project are documented here. Format loosely follows Keep a Changelog, but this is an early alpha so versions track the hackathon delivery cycle rather than strict semver.
pyproject metadata so installs expose phoenix-extract and related commands.SECURITY.md with supported versions and the reporting flow.Initial alpha release for the Google Cloud Rapid Agent Hackathon (Arize track).
--use-gemini flag (off by default; works without any LLM call).See README.md for the hackathon framing and stack notes.
.tessl-plugin
docs
evals
scenario-1
scenario-2
scripts
src
phoenix2pytest
tests