tessl install tessl/pypi-vllm@0.10.0

A high-throughput and memory-efficient inference and serving engine for LLMs.
Agent Success: 69% (agent success rate when using this tile)
Improvement: 1.33x (agent success rate improvement when using this tile compared to baseline)
Baseline: 52% (agent success rate without this tile)
Build a conversational AI system that processes multi-turn dialogues with role-based messages and maintains conversation context across multiple interactions.
Implement a function generate_chat_response(conversation, max_tokens=100, temperature=0.7) that:
Multi-Turn Dialogue Processing: Accepts a list of messages with roles (system, user, or assistant) and generates the next response.
Role-Based Messages: Properly handles three message roles: system, user, and assistant.
Conversation Context: Uses the entire conversation history to generate contextually appropriate responses.
Model Configuration: Supports configuration parameters:
max_tokens: Maximum tokens to generate (default: 100)
temperature: Sampling temperature for randomness control (default: 0.7)

Your system should accept conversations in the following format:
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello! Can you help me?"},
    {"role": "assistant", "content": "Of course! I'd be happy to help."},
    {"role": "user", "content": "What's the capital of France?"}
]
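Before the test files, here is one possible implementation sketch of generate_chat_response built on vLLM's offline LLM.chat() API. The model name used here (Qwen/Qwen2.5-0.5B-Instruct) is an arbitrary placeholder, not part of the spec; any chat-tuned model supported by vLLM should work.

# Minimal sketch, assuming vLLM's offline LLM.chat() API.
# The model name below is a placeholder assumption, not part of the spec.
from vllm import LLM, SamplingParams

# Initialise the engine once; re-creating it on every call would be prohibitively slow.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

def generate_chat_response(conversation, max_tokens=100, temperature=0.7):
    """Generate the assistant's next reply for a role-tagged message list."""
    sampling_params = SamplingParams(temperature=temperature, max_tokens=max_tokens)
    # LLM.chat() applies the model's chat template to the full message history,
    # so earlier turns condition the generated reply.
    outputs = llm.chat(conversation, sampling_params)
    return outputs[0].outputs[0].text.strip()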
File: tests/test_chat.py { .test-file }
def test_basic_conversation():
    """Test basic multi-turn conversation handling."""
    conversation = [
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 equals 4."},
        {"role": "user", "content": "Now multiply that by 3."}
    ]
    response = generate_chat_response(conversation)
    # Verify a non-empty string response is produced for the follow-up turn
    assert response is not None
    assert len(response) > 0
    assert isinstance(response, str)

File: tests/test_chat.py { .test-file }
def test_system_prompt():
    """Test that system prompt influences response style."""
    conversation_formal = [
        {"role": "system", "content": "Respond in a very formal, professional manner."},
        {"role": "user", "content": "Hi there"}
    ]
    conversation_casual = [
        {"role": "system", "content": "Respond in a casual, friendly manner."},
        {"role": "user", "content": "Hi there"}
    ]
    response_formal = generate_chat_response(conversation_formal)
    response_casual = generate_chat_response(conversation_casual)
    # Verify both prompts generate responses
    assert response_formal is not None
    assert response_casual is not None
    assert len(response_formal) > 0
    assert len(response_casual) > 0

File: tests/test_chat.py { .test-file }
def test_single_message():
    """Test handling of a single user message."""
    conversation = [
        {"role": "user", "content": "Tell me a fun fact."}
    ]
    response = generate_chat_response(conversation)
    assert response is not None
    assert len(response) > 0
    assert isinstance(response, str)

Provides high-performance LLM inference capabilities for chat-based text generation.
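As an illustration, a quick usage sketch of the specified function against the conversation format shown above; the parameter values here are arbitrary examples, not requirements of the spec.

# Hypothetical usage of generate_chat_response with explicit generation settings.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"}
]
reply = generate_chat_response(conversation, max_tokens=64, temperature=0.2)
print(reply)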