
# tessl/pypi-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Overall score: 69%
Evaluation: 69%
Agent success when using this tile: 1.33x


## Evaluation results

### Task: Build a Custom Request Manager with Step Control and Async Streaming

Score (with context): 15%

Capability: Synchronous and Asynchronous Engines

| Criteria | Without context | With context |
| --- | --- | --- |
| LLMEngine Initialization | 0% | 0% |
| add_request() Usage | 0% | 0% |
| step() Execution | 0% | 0% |
| has_unfinished_requests() Check | 0% | 0% |
| abort_request() Implementation | 0% | 0% |
| AsyncLLMEngine Initialization | 0% | 0% |
| Async Generator Pattern | 100% | 100% |
| AsyncLLMEngine.generate() Streaming | 0% | 0% |
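The criteria above exercise vLLM's low-level engine APIs. A minimal sketch of both patterns, assuming a GPU, downloaded weights, and an illustrative model name (not verified against every vLLM release):

```python
import asyncio

from vllm import EngineArgs, LLMEngine, SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# --- Synchronous engine: drive scheduling yourself with step() ---
engine = LLMEngine.from_engine_args(EngineArgs(model="facebook/opt-125m"))
engine.add_request("req-0", "Hello, my name is", SamplingParams(max_tokens=32))

while engine.has_unfinished_requests():
    for output in engine.step():          # one scheduling + decode iteration
        if output.finished:
            print(output.request_id, output.outputs[0].text)
# engine.abort_request("req-0")           # cancels an in-flight request by id

# --- Async engine: generate() is an async generator that streams outputs ---
async def stream() -> None:
    async_engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(model="facebook/opt-125m"))
    async for output in async_engine.generate(
            "Hello, my name is", SamplingParams(max_tokens=32), "req-1"):
        print(output.outputs[0].text)     # partial text grows as tokens arrive

asyncio.run(stream())
```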

### Story Generator

Score (with context): 99% (+89% vs. without context)

Capability: Text Generation and Completion

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM initialization | 0% | 100% |
| generate() method usage | 0% | 100% |
| SamplingParams configuration | 0% | 100% |
| Output extraction | 0% | 93% |
| Error handling | 100% | 100% |
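These criteria cover vLLM's basic offline-generation flow. A minimal sketch with an illustrative model and prompt (GPU and weights assumed):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # illustrative model name
params = SamplingParams(temperature=0.8, max_tokens=256)

outputs = llm.generate(["Write a short story about a lighthouse."], params)
for out in outputs:
    # Each RequestOutput holds one or more completions; take the first.
    print(out.outputs[0].text)
```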

### LLM Memory Configuration Manager

Score (with context): 77% (−23% vs. without context)

Capability: Basic Memory Management

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM class import | 100% | 100% |
| LLM initialization | 100% | 100% |
| gpu_memory_utilization parameter | 100% | 60% |
| swap_space parameter | 100% | 60% |
| Combined configuration | 100% | 80% |
| Default configuration | 100% | 100% |
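The two parameters under test are constructor arguments on `LLM`. A minimal sketch (model name illustrative; values chosen for demonstration only):

```python
from vllm import LLM

llm = LLM(
    model="facebook/opt-125m",
    gpu_memory_utilization=0.8,  # fraction of GPU memory the engine may reserve
    swap_space=4,                # CPU swap space for preempted requests, in GiB
)

# Default configuration: omit both arguments and vLLM applies its own defaults.
llm_default = LLM(model="facebook/opt-125m")
```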

### Model Configuration Service

Score (with context): 98% (−2% vs. without context)

Capability: Model Loading and Initialization

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Class Usage | 100% | 90% |
| GPU Memory Configuration | 100% | 100% |
| Load Format Specification | 100% | 100% |
| Model Path Handling | 100% | 100% |
| Error Handling | 100% | 100% |
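Load format and model path are also `LLM` constructor arguments. A minimal sketch; the local path is hypothetical, and the `load_format` value is one of the options vLLM accepts (e.g. `"auto"`, `"safetensors"`, `"pt"`):

```python
from vllm import LLM

llm = LLM(
    model="/models/my-model",       # local path or Hugging Face repo id (hypothetical)
    load_format="safetensors",      # how checkpoint weights are loaded
    gpu_memory_utilization=0.9,
)
```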

### Text Generation with Multiple Candidate Selection

Score (with context): 0%

Capability: Beam Search and Advanced Sampling

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Initialization | 0% | 0% |
| Beam Search Method | 0% | 0% |
| Beam Width Configuration | 0% | 0% |
| Length Penalty | 0% | 0% |
| Max Tokens Control | 0% | 0% |
| Temperature Parameter | 0% | 0% |
| Vocabulary Restriction | 0% | 0% |
| Output Processing | 0% | 0% |
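The beam-search API has changed across vLLM versions: recent releases expose a dedicated `LLM.beam_search()` taking `BeamSearchParams`, while older ones configured it through `SamplingParams`. A sketch assuming the newer interface (model name illustrative; output field names may differ by version):

```python
from vllm import LLM
from vllm.sampling_params import BeamSearchParams

llm = LLM(model="facebook/opt-125m")
params = BeamSearchParams(
    beam_width=4,        # number of candidate beams kept per step
    max_tokens=64,
    length_penalty=1.0,  # >1 favors longer sequences, <1 shorter ones
    temperature=0.0,     # deterministic beam scoring
)

outputs = llm.beam_search([{"prompt": "The capital of France is"}], params)
for beam in outputs[0].sequences:   # one entry per surviving beam
    print(beam.text)
```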

### Text Generation with Advanced Configuration

Score (with context): 100% (+78% vs. without context)

Capability: Sampling Parameters

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Initialization | 0% | 100% |
| SamplingParams Import | 50% | 100% |
| Default Generation | 20% | 100% |
| Temperature Parameter | 20% | 100% |
| Top-p Parameter | 20% | 100% |
| Multiple Completions | 13% | 100% |
| Seed Parameter | 30% | 100% |
| Max Tokens | 60% | 100% |
| Generation Method | 0% | 100% |
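All of these criteria map to `SamplingParams` fields. A minimal sketch covering them together (model name and values illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(
    temperature=0.7,   # sampling randomness
    top_p=0.9,         # nucleus-sampling cutoff
    n=3,               # number of completions per prompt
    seed=1234,         # reproducible sampling
    max_tokens=128,    # generation length cap
)

outputs = llm.generate(["Explain KV caching in one sentence."], params)
for completion in outputs[0].outputs:   # one entry per sampled candidate
    print(completion.text)
```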

### Multi-Adapter Text Generation Service

Score (with context): 3% (−97% vs. without context)

Capability: LoRA Adapters and Multi-LoRA Support

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Initialization | 100% | 0% |
| Max LoRAs Configuration | 100% | 0% |
| Max LoRA Rank | 100% | 0% |
| LoRA Request Object | 100% | 0% |
| Adapter in Generate | 100% | 0% |
| Base Model Generation | 100% | 30% |
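A sketch of the multi-LoRA pattern these criteria describe; the base model and adapter path are illustrative, and a GPU is assumed:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # illustrative base model
    enable_lora=True,
    max_loras=2,        # adapters that can be active in one batch
    max_lora_rank=16,   # largest adapter rank the engine will accept
)
params = SamplingParams(max_tokens=64)

# LoRARequest(name, int_id, path): the path is hypothetical here.
adapter = LoRARequest("sql-adapter", 1, "/adapters/sql-lora")
with_adapter = llm.generate(["SELECT"], params, lora_request=adapter)

# Omitting lora_request falls back to the base model.
base_only = llm.generate(["SELECT"], params)
```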

### Multi-Turn Conversation System

Score (with context): 97% (+47% vs. without context)

Capability: Chat-based Generation

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Initialization | 100% | 80% |
| Chat Method Usage | 0% | 100% |
| Message Format | 0% | 100% |
| SamplingParams Configuration | 100% | 100% |
| Response Extraction | 100% | 100% |
| Multi-Turn Handling | 0% | 100% |
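A sketch of the `LLM.chat()` flow, including the multi-turn pattern of appending the assistant reply before the next call (model name illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-1.5B-Instruct")  # illustrative chat model
params = SamplingParams(temperature=0.7, max_tokens=128)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is vLLM?"},
]
out = llm.chat(messages, params)
reply = out[0].outputs[0].text   # extract the assistant's response

# Multi-turn: carry the full history into the next call.
messages += [
    {"role": "assistant", "content": reply},
    {"role": "user", "content": "How does it batch requests?"},
]
out = llm.chat(messages, params)
```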

### Image Description Service

Score (with context): 100% (+78% vs. without context)

Capability: Multi-Modal Support

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM initialization | 50% | 100% |
| Multi-modal prompt format | 0% | 100% |
| Image loading | 20% | 100% |
| Single image processing | 10% | 100% |
| Multiple image handling | 13% | 100% |
| Error handling | 100% | 100% |
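A sketch of vLLM's multi-modal prompt format; the model, image file, and prompt template are illustrative (the `<image>` placeholder convention varies by model):

```python
from PIL import Image
from vllm import LLM, SamplingParams

llm = LLM(
    model="llava-hf/llava-1.5-7b-hf",  # illustrative vision-language model
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

image = Image.open("photo.jpg")        # hypothetical local file
prompt = {
    "prompt": "USER: <image>\nDescribe this picture. ASSISTANT:",
    "multi_modal_data": {"image": image},  # a list here passes multiple images
}
outputs = llm.generate(prompt, SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```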

### Attention Backend Benchmark Tool

Score (with context): 100%

Capability: Custom Attention Mechanisms

| Criteria | Without context | With context |
| --- | --- | --- |
| LLM Class Import | 100% | 100% |
| SamplingParams Import | 100% | 100% |
| LLM Initialization | 100% | 100% |
| Attention Backend Configuration | 100% | 100% |
| Default Backend Handling | 100% | 100% |
| Text Generation | 100% | 100% |
| SamplingParams Usage | 100% | 100% |
| Output Extraction | 100% | 100% |
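vLLM selects its attention backend from the `VLLM_ATTENTION_BACKEND` environment variable, which must be set before the engine is constructed. A sketch (model name illustrative; available backend names depend on version and hardware):

```python
import os

# e.g. "FLASH_ATTN" or "XFORMERS"; leave unset to let vLLM pick a default.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASH_ATTN"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(["Benchmark prompt"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```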

## Install with Tessl CLI

```shell
npx tessl i tessl/pypi-vllm
```

Evaluated agent: Claude Code
