Name: golikovichev/pytest-conversational
Rating: 99.4 (1 review)
Author: golikovichev

Test chat bots, voice assistants, and IVR menus with pytest using a small Conversation object and a callable bot adapter. Use when the user wants to write rule-based assertions over multi-turn dialogue without bringing in an LLM dependency, when they have a chatbot reachable as a Python callable or HTTP webhook, when they need to keep per-conversation state across turns and assert on slot filling, when they want pytest-native fixtures and a printable transcript on failure, or when they mention voice-assistant testing, IVR menu testing, conversational AI testing, LLM bot testing (used as the target under test, not as the matcher), expect matchers for bot replies, or multi-turn dialogue tests.
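As a rough illustration of the "Python callable" adapter shape and per-conversation state described above, here is a minimal sketch. It does not import pytest-conversational: `StandInConversation` is a hypothetical stand-in that exposes only a `state` dict, and `pizza_bot` with its slot names is invented for the example. Only the two-parameter shape (user text, conversation) and the string return value come from the description.

```python
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StandInConversation:
    """Hypothetical stand-in with only a `state` dict; NOT the library's class."""
    state: dict[str, Any] = field(default_factory=dict)


SIZES = ("small", "medium", "large")
TOPPINGS = ("mushroom", "pepperoni")


def pizza_bot(text: str, convo: StandInConversation) -> str:
    """Rule-based adapter: takes user text and the conversation, returns a reply."""
    lowered = text.lower()
    # Accumulate slots on the conversation so they persist across turns.
    for size in SIZES:
        if re.search(rf"\b{size}\b", lowered):
            convo.state["size"] = size
    for topping in TOPPINGS:
        if topping in lowered:
            convo.state["topping"] = topping
    # Ask for whichever slot is still missing.
    if "size" not in convo.state:
        return "What size would you like?"
    if "topping" not in convo.state:
        return "Which topping?"
    return f"One {convo.state['size']} {convo.state['topping']} pizza coming up."
```

Because the slots live on the conversation object rather than in a local or global variable, a second turn such as "large please" followed by "pepperoni" completes the order without repeating earlier information.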

Quality: 100% (Does it follow best practices?)
Impact: 97% (1.56x, average score across 3 eval scenarios)
Security: Passed (No known issues)

{
  "context": "Tests whether the agent correctly uses pytest-conversational to test a multi-turn slot-filling pizza bot. The agent must write a bot adapter as a Python callable with the correct signature, use conversation_factory to wire it in, drive turns with convo.say(), record per-turn metadata and cross-turn state, and apply the library's metadata and latency matchers.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "Bot adapter signature",
      "description": "Bot adapter is a Python callable with exactly two parameters: a string (user text) and a Conversation object, returning a string reply",
      "max_score": 10
    },
    {
      "name": "conversation_factory usage",
      "description": "Uses the conversation_factory pytest fixture to create a Conversation, passing the bot adapter as the bot= keyword argument",
      "max_score": 10
    },
    {
      "name": "convo.say() for turns",
      "description": "Drives conversation turns by calling convo.say(text) rather than constructing Turn objects or calling the adapter directly",
      "max_score": 10
    },
    {
      "name": "Turn.metadata intent recording",
      "description": "Bot adapter writes the intent label into turn.metadata[\"intent\"] on each turn",
      "max_score": 8
    },
    {
      "name": "Turn.metadata slot recording",
      "description": "Bot adapter writes extracted slot data into turn.metadata[\"slots\"] (or turn.metadata with a slot key) on each turn",
      "max_score": 8
    },
    {
      "name": "Turn.metadata latency recording",
      "description": "Bot adapter records elapsed response time in turn.metadata[\"latency_ms\"] in milliseconds",
      "max_score": 8
    },
    {
      "name": "convo.state for cross-turn slots",
      "description": "Bot adapter reads and writes accumulated slot data to convo.state (not to a local variable or global) so state persists across turns",
      "max_score": 8
    },
    {
      "name": "has_intent matcher",
      "description": "Test(s) assert on the intent label of a turn using has_intent(turn, intent_name) from pytest-conversational",
      "max_score": 8
    },
    {
      "name": "has_slot matcher",
      "description": "Test(s) assert on a per-turn extracted slot using has_slot(turn, slot_name) or has_slot(turn, slot_name, value=...) from pytest-conversational",
      "max_score": 8
    },
    {
      "name": "has_state matcher",
      "description": "Test(s) assert on the conversation-level accumulated state using has_state(convo, state_name) or has_state(convo, state_name, value=...) from pytest-conversational",
      "max_score": 8
    },
    {
      "name": "responds_within matcher",
      "description": "At least one test asserts latency using expect.responds_within(turn, seconds) — budget expressed in seconds while latency is recorded in milliseconds",
      "max_score": 8
    },
    {
      "name": "convo.last.bot usage",
      "description": "At least one test reads the most recent bot reply via convo.last.bot",
      "max_score": 6
    }
  ]
}
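The `max_score` values in the checklist above add up to 100. The sketch below shows one way such a weighted checklist could be tallied. The scoring rule (awarded points clamped per item, divided by the total) is an assumption, since the file only states `"type": "weighted_checklist"`, and `weighted_score` is a hypothetical helper, not part of any published tool.

```python
# max_score values copied from the checklist above, keyed by item name.
MAX_SCORES = {
    "Bot adapter signature": 10,
    "conversation_factory usage": 10,
    "convo.say() for turns": 10,
    "Turn.metadata intent recording": 8,
    "Turn.metadata slot recording": 8,
    "Turn.metadata latency recording": 8,
    "convo.state for cross-turn slots": 8,
    "has_intent matcher": 8,
    "has_slot matcher": 8,
    "has_state matcher": 8,
    "responds_within matcher": 8,
    "convo.last.bot usage": 6,
}


def weighted_score(awarded: dict[str, float]) -> float:
    """Return earned points as a fraction of the total (assumed scoring rule).

    Each item's awarded points are clamped to the range [0, max_score];
    items missing from `awarded` count as zero.
    """
    total = sum(MAX_SCORES.values())
    earned = sum(
        min(max(awarded.get(name, 0), 0), cap)
        for name, cap in MAX_SCORES.items()
    )
    return earned / total
```

Under this assumed rule, a submission that satisfies every item except "convo.last.bot usage" would earn 94 of 100 points.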

evals/scenario-1/criteria.json