A high-throughput and memory-efficient inference and serving engine for LLMs
Overall score: 69%
Evaluation: 69%
↑ 1.33x agent success when using this tile
{
"context": "This criteria evaluates how well the engineer uses vLLM's chat-based generation API to implement a multi-turn conversational system. The focus is on proper usage of the LLM class, the chat() method, message formatting, and sampling parameter configuration.",
"type": "weighted_checklist",
"checklist": [
{
"name": "LLM Initialization",
"description": "Correctly initializes the vLLM LLM class with an appropriate chat/instruction-tuned model (e.g., using model parameter).",
"max_score": 15
},
{
"name": "Chat Method Usage",
"description": "Uses the LLM.chat() method (not generate()) to process conversational input, which is the appropriate method for chat-based interactions.",
"max_score": 25
},
{
"name": "Message Format",
"description": "Correctly formats messages as a list of dictionaries with 'role' and 'content' keys, supporting system, user, and assistant roles.",
"max_score": 20
},
{
"name": "SamplingParams Configuration",
"description": "Creates and uses a SamplingParams object to configure generation parameters (max_tokens, temperature), passing it to the chat() method.",
"max_score": 20
},
{
"name": "Response Extraction",
"description": "Correctly extracts the generated text from the RequestOutput object returned by chat(), accessing the outputs attribute and text content.",
"max_score": 15
},
{
"name": "Multi-Turn Handling",
"description": "Properly handles multi-turn conversations by passing the entire message history to chat(), allowing the model to maintain context.",
"max_score": 5
}
]
}
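A minimal sketch of the flow this checklist scores, using vLLM's LLM.chat() API. The model name and prompts are illustrative placeholders, not part of the evaluation:

from vllm import LLM, SamplingParams

# LLM Initialization: load a chat/instruction-tuned model via the model parameter.
# (meta-llama/Llama-3.1-8B-Instruct is a placeholder; any instruct model works.)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# SamplingParams Configuration: set max_tokens and temperature once,
# then pass the same object to each chat() call.
params = SamplingParams(temperature=0.7, max_tokens=256)

# Message Format: a list of dicts with "role" and "content" keys,
# supporting system, user, and assistant roles.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
]

# Chat Method Usage: chat() (not generate()) applies the model's chat template.
outputs = llm.chat(messages, sampling_params=params)

# Response Extraction: chat() returns a list of RequestOutput objects;
# the generated text lives under .outputs[0].text.
reply = outputs[0].outputs[0].text

# Multi-Turn Handling: append the assistant reply and the next user turn,
# then pass the entire history back so the model keeps context.
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "How does it batch requests?"})
outputs = llm.chat(messages, sampling_params=params)
print(outputs[0].outputs[0].text)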
Install with Tessl CLI

npx tessl i tessl/pypi-vllmdocs
evals
scenario-1
scenario-2
scenario-3
scenario-4
scenario-5
scenario-6
scenario-7
scenario-8
scenario-9
scenario-10