A high-throughput and memory-efficient inference and serving engine for LLMs
Text encoding and embedding generation for semantic similarity, retrieval applications, and downstream NLP tasks. Produces dense vector representations of text, with support for various pooling strategies and normalization options to suit different use cases.
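The pooling and normalization options map onto simple vector operations. The sketch below is plain Python with toy numbers, not vLLM internals: it shows conceptually what MEAN pooling and `normalize=True` do, and why normalized embeddings make cosine similarity a plain dot product.

```python
import math

def mean_pool(token_vectors):
    # Average per-token hidden states into one sentence vector
    # (what pooling_type="MEAN" does, conceptually).
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def l2_normalize(vec):
    # Scale to unit length (what normalize=True does), so the dot
    # product of two embeddings equals their cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy per-token hidden states standing in for two short sentences.
sent_a = l2_normalize(mean_pool([[1.0, 2.0], [3.0, 4.0]]))
sent_b = l2_normalize(mean_pool([[2.0, 1.0], [4.0, 3.0]]))

similarity = sum(x * y for x, y in zip(sent_a, sent_b))
print(similarity)  # high for these similar toy vectors
```

With normalized embeddings, ranking candidates by dot product is equivalent to ranking by cosine similarity, which is why `normalize=True` is common for retrieval.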
def encode(
    self,
    prompts: Union[PromptType, Sequence[PromptType], DataPrompt],
    pooling_params: Optional[Union[PoolingParams, Sequence[PoolingParams]]] = None,
    *,
    truncate_prompt_tokens: Optional[int] = None,
    use_tqdm: Union[bool, Callable[..., tqdm]] = True,
    lora_request: Optional[Union[List[LoRARequest], LoRARequest]] = None,
    pooling_task: PoolingTask = "encode",
    tokenization_kwargs: Optional[Dict[str, Any]] = None,
) -> List[PoolingRequestOutput]:
"""
Generate embeddings for input text.
Parameters:
- prompts: Input text or token sequences
- pooling_params: Pooling strategy and normalization options
- truncate_prompt_tokens: Maximum prompt length (keyword-only)
- use_tqdm: Show progress bar (keyword-only)
- lora_request: LoRA adapter configuration (keyword-only)
- pooling_task: The pooling task to perform (keyword-only)
- tokenization_kwargs: Additional tokenization options (keyword-only)
Returns:
List of PoolingRequestOutput with vector representations
"""from vllm import LLM, PoolingParams
llm = LLM(model="sentence-transformers/all-MiniLM-L6-v2")
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "A fast fox leaps over a sleeping dog.",
    "Python is a programming language.",
]
pooling_params = PoolingParams(pooling_type="MEAN", normalize=True)
outputs = llm.encode(texts, pooling_params=pooling_params)
for output in outputs:
print(f"Embedding dimension: {len(output.outputs.data)}")class PoolingRequestOutput:
id: str
outputs: PoolingOutput
prompt_token_ids: List[int]
finished: bool
class PoolingOutput:
data: List[float] # Dense vector representationInstall with Tessl CLI
npx tessl i tessl/pypi-vllm