Python bindings for llama.cpp enabling efficient local language model inference without external API dependencies
—
Interactive command-line interface for model testing and development. The CLI provides a configurable chat interface with extensive parameter control, debugging features, and direct access to model capabilities for experimentation.
Launch the interactive chat interface with a model file:
pyllamacpp /path/to/model.ggml

This starts an interactive session where you can chat with the model:
██████╗ ██╗ ██╗██╗ ██╗ █████╗ ███╗ ███╗ █████╗ ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║ ██║ ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║ ██║ ███████║██╔████╔██║███████║██║ ██████╔╝██████╔╝
██╔═══╝ ╚██╔╝ ██║ ██║ ██╔══██║██║╚██╔╝██║██╔══██║██║ ██╔═══╝ ██╔═══╝
██║ ██║ ███████╗███████╗██║ ██║██║ ╚═╝ ██║██║ ██║╚██████╗██║ ██║
╚═╝ ╚═╝ ╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝╚═╝ ╚═╝
PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.3
You: Hello, how are you?
AI: I'm doing well, thank you for asking! How can I help you today?
You:

The CLI supports extensive parameter customization:
pyllamacpp --help
usage: pyllamacpp [-h] [--n_ctx N_CTX] [--seed SEED] [--f16_kv F16_KV]
[--logits_all LOGITS_ALL] [--vocab_only VOCAB_ONLY]
[--use_mlock USE_MLOCK] [--embedding EMBEDDING]
[--n_predict N_PREDICT] [--n_threads N_THREADS]
[--repeat_last_n REPEAT_LAST_N] [--top_k TOP_K]
[--top_p TOP_P] [--temp TEMP] [--repeat_penalty REPEAT_PENALTY]
[--n_batch N_BATCH]
model
positional arguments:
model The path of the model file
options:
-h, --help show this help message and exit
# Context Parameters
--n_ctx N_CTX text context (default: 512)
--seed SEED RNG seed (default: -1 for random)
--f16_kv F16_KV use fp16 for KV cache (default: False)
--logits_all LOGITS_ALL
compute all logits, not just the last one (default: False)
--vocab_only VOCAB_ONLY
only load vocabulary, no weights (default: False)
--use_mlock USE_MLOCK
force system to keep model in RAM (default: False)
--embedding EMBEDDING
embedding mode only (default: False)
# Generation Parameters
--n_predict N_PREDICT
Number of tokens to predict (default: 256)
--n_threads N_THREADS
Number of threads (default: 4)
--repeat_last_n REPEAT_LAST_N
Last n tokens to penalize (default: 64)
--top_k TOP_K top_k sampling (default: 40)
--top_p TOP_P top_p sampling (default: 0.95)
--temp TEMP temperature (default: 0.8)
--repeat_penalty REPEAT_PENALTY
repeat_penalty (default: 1.1)
--n_batch N_BATCH batch size for prompt processing (default: 512)

Configure the model for different use cases:
# High creativity configuration
pyllamacpp /path/to/model.ggml \
--temp 1.2 \
--top_p 0.9 \
--top_k 50 \
--n_predict 200
# Focused, deterministic responses
pyllamacpp /path/to/model.ggml \
--temp 0.1 \
--top_p 0.9 \
--top_k 20 \
--repeat_penalty 1.15
# Large context configuration
pyllamacpp /path/to/model.ggml \
--n_ctx 2048 \
--n_batch 1024 \
--n_threads 8
# Half-precision KV cache (reduces KV cache memory;
# note this CLI exposes no GPU offload flag)
pyllamacpp /path/to/model.ggml \
--f16_kv True
# Memory-optimized configuration
pyllamacpp /path/to/model.ggml \
--use_mlock True \
--n_batch 256

The CLI provides several interactive features:
The CLI includes built-in instruction-following templates:
# Default prompt templates in CLI
PROMPT_CONTEXT = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_PREFIX = "\n\n##Instruction:\n"
PROMPT_SUFFIX = "\n\n##Response:\n"

Example interaction with instruction format:
You: Explain how photosynthesis works
AI: Photosynthesis is the process by which plants convert light energy into chemical energy...

The CLI includes performance monitoring capabilities:
# Example CLI session with timing info
You: Tell me about machine learning
AI: Machine learning is a subset of artificial intelligence... (Generated in 2.3s, 45 tokens/s)
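A per-response timing line like the one above can be reproduced around any token generator with `time.perf_counter`. A minimal sketch, using a plain list of tokens as a stand-in for `model.generate` (the helper name is illustrative, not part of the package):

```python
import time

def timed_generate(token_iter):
    """Consume a token iterator; return (text, elapsed seconds, tokens/sec)."""
    start = time.perf_counter()
    tokens = list(token_iter)  # stand-in for streaming from model.generate
    elapsed = time.perf_counter() - start
    rate = len(tokens) / elapsed if elapsed > 0 else 0.0
    return "".join(tokens), elapsed, rate

# Stand-in for: model.generate("Tell me about machine learning", ...)
text, secs, rate = timed_generate(iter(["Machine ", "learning ", "is..."]))
print(f"{text} (Generated in {secs:.1f}s, {rate:.0f} tokens/s)")
```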
# System information display
Model: /path/to/llama-7b.ggml
Context size: 512 tokens
Threads: 4
Memory usage: 4.2 GB

The CLI uses structured parameter schemas for validation:
# Context parameters schema
LLAMA_CONTEXT_PARAMS_SCHEMA = {
    'n_ctx': {
        'type': int,
        'description': "text context",
        'default': 512
    },
    'seed': {
        'type': int,
        'description': "RNG seed",
        'default': -1
    },
    'f16_kv': {
        'type': bool,
        'description': "use fp16 for KV cache",
        'default': False
    },
    # ... more parameters
}
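A schema in this shape can also drive argument registration, keeping flags, help text, and defaults in one place. A hypothetical helper (not part of the package API) illustrating the idea:

```python
import argparse

def add_schema_args(parser, schema):
    """Register one --flag per schema entry with its type, help, and default."""
    for name, spec in schema.items():
        parser.add_argument(f"--{name}", type=spec['type'],
                            help=spec['description'], default=spec['default'])

schema = {
    'n_ctx': {'type': int, 'description': "text context", 'default': 512},
    'seed': {'type': int, 'description': "RNG seed", 'default': -1},
}
parser = argparse.ArgumentParser()
add_schema_args(parser, schema)
args = parser.parse_args(['--n_ctx', '1024'])
print(args.n_ctx, args.seed)  # 1024 -1
```

Boolean entries would need special handling in a real implementation, since argparse's `type=bool` treats any non-empty string (including `"False"`) as true.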
# Generation parameters schema
GPT_PARAMS_SCHEMA = {
    'n_predict': {
        'type': int,
        'description': "Number of tokens to predict",
        'default': 256
    },
    'n_threads': {
        'type': int,
        'description': "Number of threads",
        'default': 4
    },
    # ... more parameters
}

Access CLI functionality programmatically:
def main():
    """Main entry point for command line interface."""

def run(args):
    """
    Run interactive chat session with parsed arguments.

    Parameters:
    - args: Parsed command line arguments
    """

Example programmatic usage:
import argparse
from pyllamacpp.cli import run
# Create argument parser
parser = argparse.ArgumentParser()
parser.add_argument('model', help='Path to model file')
parser.add_argument('--temp', type=float, default=0.8)
parser.add_argument('--n_predict', type=int, default=128)
# Parse arguments and run
args = parser.parse_args(['/path/to/model.ggml', '--temp', '0.7'])
run(args)

Build custom CLI applications using the CLI components:
from pyllamacpp.model import Model
from pyllamacpp.cli import bcolors, PROMPT_CONTEXT, PROMPT_PREFIX, PROMPT_SUFFIX
import argparse

def custom_cli():
    parser = argparse.ArgumentParser(description="Custom PyLLaMACpp CLI")
    parser.add_argument('model', help='Model path')
    parser.add_argument('--system-prompt', default="You are a helpful assistant.")
    args = parser.parse_args()

    # Initialize model with custom configuration
    model = Model(
        model_path=args.model,
        prompt_context=args.system_prompt,
        prompt_prefix="\n\nUser: ",
        prompt_suffix="\n\nAssistant: "
    )

    print(f"{bcolors.HEADER}Custom PyLLaMACpp Chat{bcolors.ENDC}")
    print(f"Model: {args.model}")
    print(f"System: {args.system_prompt}")
    print("-" * 50)

    while True:
        try:
            user_input = input(f"{bcolors.OKBLUE}You: {bcolors.ENDC}")
            if user_input.lower() in ['exit', 'quit']:
                break
            print(f"{bcolors.OKGREEN}AI: {bcolors.ENDC}", end="")
            for token in model.generate(user_input, n_predict=150):
                print(token, end="", flush=True)
            print()
        except KeyboardInterrupt:
            print(f"\n{bcolors.WARNING}Goodbye!{bcolors.ENDC}")
            break

if __name__ == "__main__":
    custom_cli()

The CLI includes debugging features for development:
# Color codes for terminal output
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'
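When stdout is redirected to a file or pipe, these escape codes end up as literal garbage in the output. A common guard is to colorize only when writing to a terminal; this is an illustrative pattern, not a feature of the shipped `bcolors` class:

```python
import sys

def colorize(text, color, reset='\033[0m'):
    """Wrap text in an ANSI color code only when stdout is a terminal."""
    if sys.stdout.isatty():
        return f"{color}{text}{reset}"
    return text

print(colorize("Model loaded successfully", '\033[92m'))
```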
# Usage in CLI output
print(f"{bcolors.OKGREEN}Model loaded successfully{bcolors.ENDC}")
print(f"{bcolors.WARNING}Warning: Large context size{bcolors.ENDC}")
print(f"{bcolors.FAIL}Error: Model file not found{bcolors.ENDC}")

Run the CLI in batch mode for automated testing:
# Process commands from file
echo "Tell me a joke" | pyllamacpp /path/to/model.ggml --n_predict 50
# Multiple prompts
cat prompts.txt | pyllamacpp /path/to/model.ggml --temp 0.5

Use the CLI for rapid prototyping and testing:
# Test different temperatures
for temp in 0.3 0.7 1.0; do
echo "Temperature: $temp"
echo "What is AI?" | pyllamacpp model.ggml --temp $temp --n_predict 50
echo "---"
done
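The shell loop above can be mirrored in Python with `subprocess` when you want to capture and compare outputs programmatically. A sketch that only builds the commands (the model path is a placeholder and the helper is hypothetical):

```python
import subprocess

def build_cmd(model_path, temp, n_predict=50):
    """Assemble the pyllamacpp invocation for one temperature setting."""
    return ["pyllamacpp", model_path,
            "--temp", str(temp), "--n_predict", str(n_predict)]

for temp in (0.3, 0.7, 1.0):
    cmd = build_cmd("model.ggml", temp)
    print("Temperature:", temp, "->", " ".join(cmd))
    # To actually run it:
    # subprocess.run(cmd, input=b"What is AI?", capture_output=True)
```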
# Performance testing
time pyllamacpp model.ggml --n_predict 1000 < test_prompt.txt
# Memory usage monitoring
/usr/bin/time -v pyllamacpp model.ggml --use_mlock True < test_prompt.txt

Install with Tessl CLI
npx tessl i tessl/pypi-pyllamacpp