Test chat bots, voice assistants, and IVR menus with pytest using a small Conversation object and a callable bot adapter. Use when the user wants to write rule-based assertions over multi-turn dialogue without bringing in an LLM dependency, when they have a chatbot reachable as a Python callable or HTTP webhook, when they need to keep per-conversation state across turns and assert on slot filling, when they want pytest-native fixtures and a printable transcript on failure, or when they mention voice-assistant testing, IVR menu testing, conversational AI testing, LLM bot testing (used as the target under test, not as the matcher), expect matchers for bot replies, or multi-turn dialogue tests.
A pytest plugin for testing chat bots, voice assistants, and IVR menus. Rule-based assertions, no LLM dependency.
Status: alpha. v1.0.0 target June 2026.
Most chat-bot test setups fall into one of two camps: either a pile of `requests.post` calls with hand-rolled assertions, or a heavy framework that pins you to one platform. This plugin sits in the middle: a small `Conversation` object, a callable bot adapter, and pytest fixtures that wire them together.
You bring the bot. The plugin keeps turn order and per-conversation state, then prints a transcript when an assertion fails.
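To make "turn order and per-conversation state" concrete, here is a minimal stand-in written for illustration only. `MiniConversation` and `counting_bot` are hypothetical names, and the class is a sketch of the idea rather than the plugin's actual `Conversation` implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Turn:
    user: str
    bot: Optional[str] = None


@dataclass
class MiniConversation:
    """Illustrative stand-in, NOT the plugin's Conversation class."""

    bot: Optional[Callable[[str, "MiniConversation"], str]] = None
    turns: list = field(default_factory=list)
    state: dict = field(default_factory=dict)

    def say(self, text: str) -> Turn:
        # Call the adapter first so it sees only the earlier turns,
        # then record the new turn in order.
        reply = self.bot(text, self) if self.bot else None
        turn = Turn(user=text, bot=reply)
        self.turns.append(turn)
        return turn

    @property
    def last(self) -> Turn:
        return self.turns[-1]

    def transcript(self) -> str:
        # One line per utterance, in the order the turns happened.
        lines = []
        for turn in self.turns:
            lines.append(f"user: {turn.user}")
            if turn.bot is not None:
                lines.append(f"bot: {turn.bot}")
        return "\n".join(lines)


def counting_bot(text, convo):
    # Per-conversation state lives on the conversation, not in globals.
    convo.state["count"] = convo.state.get("count", 0) + 1
    return f"message {convo.state['count']}: {text}"


convo = MiniConversation(bot=counting_bot)
convo.say("hello")
convo.say("again")
print(convo.transcript())
```

Because the state dictionary belongs to the conversation, two conversations driven by the same bot callable do not share counters.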

```bash
pip install pytest-conversational
```

Python 3.10 and above.

```python
def my_bot(text, convo):
    if "hello" in text.lower():
        return "hi"
    return "sorry, did not get that"


def test_greeting(conversation_factory):
    convo = conversation_factory(bot=my_bot)
    convo.say("hello there")
    assert convo.last.bot == "hi"
```

Adapters can read `convo.state` and `convo.turns` to keep slots between turns:
```python
def slot_filling_bot(text, convo):
    slots = convo.state.setdefault("slots", {})
    if "name" not in slots:
        slots["name"] = text
        return "got it, what city?"
    if "city" not in slots:
        slots["city"] = text
        return f"hello {slots['name']} from {slots['city']}"
    return "done"


def test_two_slot_flow(conversation_factory):
    convo = conversation_factory(bot=slot_filling_bot)
    convo.say("Mikhail")
    convo.say("Hove")
    assert convo.state["slots"] == {"name": "Mikhail", "city": "Hove"}
    assert convo.last.bot == "hello Mikhail from Hove"
```

If your bot lives behind an HTTP endpoint, use the bundled adapter instead of writing one by hand:
```bash
pip install pytest-conversational[http]
```

```python
from pytest_conversational import Conversation
from pytest_conversational.adapters import http_webhook


def test_remote_bot():
    bot = http_webhook("https://my-bot.example.com/webhook", timeout=3.0)
    convo = Conversation(bot=bot)
    convo.say("hello")
    assert "hi" in convo.last.bot.lower()
```

The default contract: POST `{"user": text, "history": [[u, b], ...]}`, expect 200 OK with JSON `{"reply": "..."}`. If your endpoint speaks a different shape, pass `request_builder` and `response_parser` callbacks.
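The default contract can be written out as two plain functions. This is a sketch of the payload shapes only. The arguments the plugin actually passes to `request_builder` and `response_parser` are not spelled out here, so the signatures below are assumptions, and `parse_nested_reply` targets a made-up endpoint format.

```python
import json


def build_default_payload(text, history):
    # Mirrors the default request body described above:
    # {"user": text, "history": [[u, b], ...]}
    return {"user": text, "history": [[u, b] for u, b in history]}


def parse_default_reply(status_code, body_text):
    # Mirrors the default response contract: 200 OK with {"reply": "..."}.
    if status_code != 200:
        raise ValueError(f"expected 200 OK, got {status_code}")
    return json.loads(body_text)["reply"]


def parse_nested_reply(status_code, body_text):
    # Hypothetical endpoint that answers {"messages": [{"text": "..."}]}.
    # A custom parser would pull the reply out of that shape instead.
    if status_code != 200:
        raise ValueError(f"expected 200 OK, got {status_code}")
    return json.loads(body_text)["messages"][0]["text"]


payload = build_default_payload("hello", [("hi", "hey, how can I help?")])
print(json.dumps(payload))
```

Check the adapter's own documentation for the exact callback signatures before adapting these.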
The webhook URL is passed through to httpx as-is. If your test feeds a URL it pulled from user input, fixture data, or another untrusted source, the adapter will happily hit it. That includes internal addresses like 127.0.0.1, 169.254.169.254 (cloud metadata service), or 10.x.x.x inside a VPC. Pin the URL to a hard-coded value in the test, or gate it through your own allowlist before passing it in.
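One way to gate a URL before handing it to the adapter is a small allowlist check. This is a minimal sketch using only the standard library. `ALLOWED_HOSTS` and `check_webhook_url` are hypothetical names, not part of the plugin.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts tests may call.
ALLOWED_HOSTS = {"my-bot.example.com"}


def check_webhook_url(url, allowed_hosts=ALLOWED_HOSTS):
    """Return the URL unchanged if it passes the checks, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-https webhook URL: {url!r}")

    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        pass  # Not an IP literal, fall through to the hostname check.
    else:
        # Covers 127.0.0.1, 169.254.169.254, 10.x.x.x and any other literal.
        raise ValueError(f"refusing IP-literal webhook host: {host!r}")

    if host not in allowed_hosts:
        raise ValueError(f"webhook host not in allowlist: {host!r}")
    return url


safe_url = check_webhook_url("https://my-bot.example.com/webhook")
```

A hostname allowlist does not stop an allowed name from resolving to an internal address, so it complements network-level controls rather than replacing them.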
expect is a small module of assertion helpers tuned for bot replies. Each matcher raises AssertionError with the actual reply embedded in the message, so pytest output shows what the bot said versus what the test wanted.
```python
from pytest_conversational import expect


def greeter_bot(text, convo):
    # Fixed reply so every matcher below has something to match.
    return "hello there"


def test_replies(conversation_factory):
    convo = conversation_factory(bot=greeter_bot)
    convo.say("hi")
    expect.contains(convo.last.bot, "hello")
    expect.not_contains(convo.last.bot, "error")
    expect.regex(convo.last.bot, r"^hello\s+\w+")
    expect.one_of(convo.last.bot, ["hello there", "hi there", "hey"])
```

- `contains(actual, substring, *, case_sensitive=False)`: substring search. Case-insensitive by default.
- `not_contains(actual, substring, *, case_sensitive=False)`: the negative of `contains`. Guards against leaks, for example a bot echoing an internal error, a stack trace, or a value it was never given.
- `regex(actual, pattern, *, flags=0)`: `re.search` semantics. Returns the match object so callers can inspect captured groups.
- `one_of(actual, options, *, case_sensitive=False, mode="exact")`: matches `actual` against a list of alternative options. Supports `mode="exact"` (full-string match, default) and `mode="substring"` (checks if any option is a substring of `actual`).

Use these when bare `assert "hello" in convo.last.bot` would give noisy failure messages across many tests. For one-off checks, plain `assert` is still fine.
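The matcher semantics described above can be illustrated with stand-in functions. These are re-implementations written for this example from the stated behavior, not the plugin's source. The error message wording is an assumption.

```python
import re


def one_of(actual, options, *, case_sensitive=False, mode="exact"):
    # Illustrative stand-in following the documented semantics.
    if mode not in ("exact", "substring"):
        raise ValueError(f"unknown mode: {mode!r}")
    haystack = actual if case_sensitive else actual.lower()
    for option in options:
        needle = option if case_sensitive else option.lower()
        if mode == "exact" and haystack == needle:
            return
        if mode == "substring" and needle in haystack:
            return
    # Embed the actual reply so the failure shows what the bot said.
    raise AssertionError(f"expected one of {options!r} ({mode}), got {actual!r}")


def regex(actual, pattern, *, flags=0):
    # re.search semantics: the pattern may match anywhere in the reply.
    match = re.search(pattern, actual, flags)
    if match is None:
        raise AssertionError(f"expected {actual!r} to match {pattern!r}")
    return match


# Returning the match object lets a test pull out captured groups.
match = regex("your order 4417 has shipped", r"order (\d+)")
order_id = match.group(1)

# Substring mode tolerates trailing punctuation that exact mode rejects.
one_of("Hi there!", ["hi there", "hello"], mode="substring")
```

The same reply fails in the default `mode="exact"`, because `"hi there!"` is not equal to any option in full.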
| Fixture | Purpose |
|---|---|
| `conversation` | Empty `Conversation`, no adapter. Good for user-only flows. |
| `conversation_factory` | Builder. Pass a bot callable, get a fresh `Conversation`. |
- `Conversation(bot=None, turns=[], state={})`
- `Conversation.say(text)`: drive a turn through the adapter, return the `Turn`.
- `Conversation.add_user(text)`: append a user-only turn.
- `Conversation.last`, `.turns`, `.history`, `.transcript()`.
- `Turn(user, bot, metadata)`.
- `BotAdapter = Callable[[str, Conversation], str]`.
- `expect.contains`, `expect.not_contains`, `expect.regex`, `expect.one_of`.

Contributions welcome. See CONTRIBUTING.md for setup and the PR workflow. A couple of `good first issue` slots are open in the issue tracker if you want to jump in.
MIT. See LICENSE.