tessl/pypi-vllm

tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Agent Success: 69% (agent success rate when using this tile)
  • Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
  • Baseline: 52% (agent success rate without this tile)

evals/scenario-8/task.md

Multi-Turn Conversation System

Overview

Build a conversational AI system that processes multi-turn dialogues with role-based messages and maintains conversation context across multiple interactions.

Requirements

Implement a function generate_chat_response(conversation, max_tokens=100, temperature=0.7) that:

  1. Multi-Turn Dialogue Processing: Accepts a list of messages with roles (system, user, or assistant) and generates the next response.

  2. Role-Based Messages: Properly handles three types of message roles:

    • System messages that provide instructions or context
    • User messages representing user inputs
    • Assistant messages representing previous AI responses
  3. Conversation Context: Uses the entire conversation history to generate contextually appropriate responses.

  4. Model Configuration: Supports configuration parameters:

    • max_tokens: Maximum tokens to generate (default: 100)
    • temperature: Sampling temperature for randomness control (default: 0.7)
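
For illustration, a caller could override these defaults as in the sketch below; the conversation variable follows the input format defined in the next section:

response = generate_chat_response(
    conversation,       # list of {"role": ..., "content": ...} messages
    max_tokens=150,     # allow a longer reply than the 100-token default
    temperature=0.2,    # lower temperature for more deterministic output
)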

Input Format

Your system should accept conversations in the following format:

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Can you help me?"},
    {"role": "assistant", "content": "Of course! I'd be happy to help."},
    {"role": "user", "content": "What's the capital of France?"}
]

Expected Behavior

  • Process the entire conversation history to generate the next response
  • Apply the system message as context for all responses
  • Generate responses that are contextually aware of previous turns
  • Support configurable parameters like temperature and max tokens
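
As a reference for how the full history, including the system message, reaches the model: a chat-capable tokenizer can render the message list into a single prompt with its chat template. The sketch below assumes the Hugging Face tokenizer for the Llama 3 Instruct model; vLLM performs an equivalent step internally when given role-based messages.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Render system + user + assistant turns into one prompt string, ending with
# the assistant header so the model generates the next assistant turn.
prompt = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)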

Implementation Notes

  • Use a model suitable for chat/instruction following (e.g., "meta-llama/Meta-Llama-3-8B-Instruct" or similar)
  • Handle conversations with varying numbers of turns
  • Return generated text as the response
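
A minimal sketch of one possible implementation, assuming vLLM's offline LLM.chat API and the Llama 3 Instruct model suggested above; the engine is created once because model loading is expensive:

from vllm import LLM, SamplingParams

# Engine is created once at import time and reused across calls.
_llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")

def generate_chat_response(conversation, max_tokens=100, temperature=0.7):
    """Generate the next assistant turn for a role-based conversation."""
    sampling_params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    # LLM.chat applies the model's chat template to the whole message history,
    # so the system prompt and all previous turns shape the generated reply.
    outputs = _llm.chat(conversation, sampling_params=sampling_params)
    return outputs[0].outputs[0].text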

Test Cases

Test Case 1: Basic Multi-Turn Conversation { .test }

File: tests/test_chat.py { .test-file }

def test_basic_conversation():
    """Test basic multi-turn conversation handling."""
    conversation = [
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
        {"role": "user", "content": "Now multiply that by 3."}
    ]

    response = generate_chat_response(conversation)

    # Verify a non-empty string response is returned
    assert response is not None
    assert len(response) > 0
    assert isinstance(response, str)

Test Case 2: System Prompt Influence { .test }

File: tests/test_chat.py { .test-file }

def test_system_prompt():
    """Test that system prompt influences response style."""
    conversation_formal = [
        {"role": "system", "content": "Respond in a very formal, professional manner."},
        {"role": "user", "content": "Hi there"}
    ]

    conversation_casual = [
        {"role": "system", "content": "Respond in a casual, friendly manner."},
        {"role": "user", "content": "Hi there"}
    ]

    response_formal = generate_chat_response(conversation_formal)
    response_casual = generate_chat_response(conversation_casual)

    # Verify both generate responses
    assert response_formal is not None
    assert response_casual is not None
    assert len(response_formal) > 0
    assert len(response_casual) > 0

Test Case 3: Single User Message { .test }

File: tests/test_chat.py { .test-file }

def test_single_message():
    """Test handling of a single user message."""
    conversation = [
        {"role": "user", "content": "Tell me a fun fact."}
    ]

    response = generate_chat_response(conversation)

    assert response is not None
    assert len(response) > 0
    assert isinstance(response, str)

Dependencies { .dependencies }

vllm { .dependency }

Provides high-performance LLM inference capabilities for chat-based text generation.

Version: 0.10.0
Workspace: tessl
Visibility: Public
Describes: pkg:pypi/vllm@0.10.x (PyPI)
tile.json