Name: golikovichev/pytest-conversational
Rating: 99.4 (1 review)
Author: golikovichev

Test chatbots, voice assistants, and IVR menus with pytest using a small Conversation object and a callable bot adapter. Use when the user: wants to write rule-based assertions over multi-turn dialogue without bringing in an LLM dependency; has a chatbot reachable as a Python callable or HTTP webhook; needs to keep per-conversation state across turns and assert on slot filling; wants pytest-native fixtures and a printable transcript on failure; or mentions voice-assistant testing, IVR menu testing, conversational AI testing, LLM bot testing (with the LLM as the target under test, not as the matcher), expect matchers for bot replies, or multi-turn dialogue tests.

Quality: 100% (Does it follow best practices?)
Impact: 97%, 1.56x (average score across 3 eval scenarios)
Security: Passed, no known issues

{
  "context": "Tests whether the agent uses the correct pytest-conversational expect matchers (contains, not_contains, regex, one_of) for a customer support bot test suite, writes a valid BotAdapter callable, and reads the bot reply via convo.last.bot.",
  "type": "weighted_checklist",
  "checklist": [
    {
      "name": "BotAdapter signature",
      "description": "The bot stub is a Python callable with signature (text: str, convo: Conversation) -> str, accepting exactly those two parameters",
      "max_score": 10
    },
    {
      "name": "conversation_factory fixture",
      "description": "Tests use the conversation_factory fixture and pass bot= to create a Conversation, rather than constructing Conversation directly",
      "max_score": 8
    },
    {
      "name": "convo.last.bot usage",
      "description": "At least one assertion reads the bot reply via convo.last.bot (not turn.bot from a stored reference or other attribute)",
      "max_score": 8
    },
    {
      "name": "expect.contains used",
      "description": "expect.contains() is called at least once to assert that a bot reply contains an expected substring",
      "max_score": 12
    },
    {
      "name": "contains case-insensitive default",
      "description": "expect.contains() is called without case_sensitive=True in at least one test (relying on the default case-insensitive behaviour)",
      "max_score": 8
    },
    {
      "name": "expect.not_contains used",
      "description": "expect.not_contains() is called at least once as a leak guard on an error or unexpected reply",
      "max_score": 12
    },
    {
      "name": "not_contains guards internal details",
      "description": "The not_contains call checks that the reply does NOT contain at least one of: 'traceback', 'exception', 'stack', 'error:' (a case-insensitive match is acceptable), guarding against leaking internal details",
      "max_score": 8
    },
    {
      "name": "expect.regex used",
      "description": "expect.regex() is called at least once to assert a formatted pattern (e.g. phone number, date, or order number) in a bot reply",
      "max_score": 12
    },
    {
      "name": "regex match object used",
      "description": "The return value of expect.regex() is captured and at least one captured group or match attribute is inspected (e.g. m.group(0) or m.group(1))",
      "max_score": 8
    },
    {
      "name": "expect.one_of used",
      "description": "expect.one_of() is called at least once with a list of acceptable reply variants to handle a bot that returns one of several valid phrases",
      "max_score": 12
    },
    {
      "name": "results.txt present",
      "description": "A results.txt file exists containing pytest -v output showing test results",
      "max_score": 2
    }
  ]
}
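
As a rough illustration of what the checklist above rewards, here is a minimal, self-contained sketch. It does not import pytest-conversational. The four matcher functions are hypothetical stand-ins that mimic the behaviour the criteria describe (case-insensitive `contains` by default, `regex` returning a match object, `one_of` taking a list), and their argument order is an assumption. Only the `(text, convo) -> str` adapter shape is taken from the criteria. The `SimpleNamespace` placeholder stands in for the real Conversation object, which a real test would obtain from the `conversation_factory` fixture.

```python
import re
from types import SimpleNamespace


def support_bot(text, convo):
    """Bot stub shaped like the (text, convo) -> str adapter in the checklist."""
    lowered = text.lower()
    if "order" in lowered:
        return "Your order #48213 is on its way."
    if "hello" in lowered:
        return "Hi there! How can I help?"
    # Fallback reply: deliberately free of internal details.
    return "Sorry, something went wrong. Please try again."


# Hypothetical stand-ins for the expect matchers, NOT the library's own code.
def contains(reply, needle, case_sensitive=False):
    """Assert that needle occurs in reply (case-insensitive by default)."""
    if not case_sensitive:
        reply, needle = reply.lower(), needle.lower()
    assert needle in reply, f"expected {needle!r} in {reply!r}"


def not_contains(reply, needle, case_sensitive=False):
    """Assert that needle does not occur in reply (leak guard)."""
    if not case_sensitive:
        reply, needle = reply.lower(), needle.lower()
    assert needle not in reply, f"unexpected {needle!r} in {reply!r}"


def regex(reply, pattern):
    """Assert that pattern matches somewhere in reply; return the match."""
    match = re.search(pattern, reply)
    assert match is not None, f"{pattern!r} did not match {reply!r}"
    return match


def one_of(reply, options):
    """Assert that reply equals one of the acceptable variants."""
    assert reply in options, f"{reply!r} not in {options!r}"


convo = SimpleNamespace(turns=[])  # placeholder for the Conversation object

greeting = support_bot("HELLO", convo)
contains(greeting, "hi there")  # relies on the case-insensitive default
one_of(greeting, ["Hi there! How can I help?", "Hello! How can I help?"])

order_reply = support_bot("Where is my order?", convo)
m = regex(order_reply, r"#(\d{5})")
order_number = m.group(1)  # inspect the captured group

error_reply = support_bot("asdf", convo)
for leak in ("traceback", "exception", "stack", "error:"):
    not_contains(error_reply, leak)
```

In a real suite the stand-ins would be replaced by the library's expect matchers and the reply would be read via `convo.last.bot`, as the criteria require. Consult the project's own documentation for the actual import path and call signatures.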

criteria.json (evals/scenario-2/)