Turn labeled LLM failure traces from an Arize Phoenix project into runnable pytest regression tests using the phoenix2pytest pipeline. Use when the user has an LLM application emitting OpenInference spans to Phoenix and wants a regression suite from real production failures, when extracting test cases from observed LLM bugs (hallucination, format break, off-topic drift, stale data, wrong reasoning, refusal bug), when bridging Phoenix-labeled traces into pytest-based suites for CI, when the user mentions Arize Phoenix MCP, OpenInference instrumentation, LLM observability, Gemini test synthesis, Vertex AI agent evaluation, or wants to react to LLM failures rather than predict them upfront.
Thanks for your interest in phoenix2pytest. This is a small alpha project shipped during the Google Cloud Rapid Agent Hackathon, so the contribution flow is light.
Open an issue with:
Open an issue first so we can talk through the use case before you write code. The project scope is intentionally narrow (Phoenix LLM trace ingest, regression test extraction, optional Gemini-assisted assertions), so feature requests that pull it elsewhere will get a polite redirect.
main.
tests/. The CI runs pytest -v on Python 3.11, 3.12, and 3.13.

pip install -e ".[dev]"
pytest -v

extract_trace_spans).
pre-commit run --all-files before opening a PR.

If you find something that could leak API keys or production trace data, please email me directly instead of opening a public issue. Address is on my GitHub profile. See also SECURITY.md.