# PyLLaMACpp

Python bindings for llama.cpp that let developers run Facebook's LLaMA language models, and other compatible large language models, directly in Python applications. PyLLaMACpp provides both a high-level Python API through the `Model` class for easy integration, and low-level access to llama.cpp C-API functions for advanced users who need custom implementations.

## Package Information

- **Package Name**: pyllamacpp
- **Language**: Python
- **Installation**: `pip install pyllamacpp`
- **Dependencies**: CMake and pybind11 (for building from source); optional: numpy, torch, and sentencepiece (for the model conversion utilities)

## Core Imports

```python
from pyllamacpp.model import Model
```

For utility functions:

```python
from pyllamacpp import utils
```

For LangChain integration:

```python
from pyllamacpp.langchain_llm import PyllamacppLLM
```

For logging configuration:

```python
from pyllamacpp._logger import get_logger, set_log_level
```

For package constants:

```python
from pyllamacpp.constants import PACKAGE_NAME, LOGGING_LEVEL
```

For web interface:

```python
from pyllamacpp.webui import webui, run
```

## Basic Usage

```python
from pyllamacpp.model import Model

# Load a GGML model
model = Model(model_path='/path/to/model.ggml')

# Stream generated tokens one at a time
for token in model.generate("Tell me a joke"):
    print(token, end='', flush=True)

# Or generate the full response at once using cpp_generate
response = model.cpp_generate("What is artificial intelligence?", n_predict=100)
print(response)
```

Interactive dialogue example:

```python
from pyllamacpp.model import Model

model = Model(model_path='/path/to/model.ggml')

while True:
    try:
        prompt = input("You: ")
        if prompt == '':
            continue
        print("AI: ", end='')
        for token in model.generate(prompt):
            print(token, end='', flush=True)
        print()
    except KeyboardInterrupt:
        break
```

## Architecture

PyLLaMACpp operates as a bridge between Python and the high-performance llama.cpp C++ library:

- **Model Class**: High-level Python interface providing text generation, tokenization, and embedding capabilities
- **C++ Extension (_pyllamacpp)**: Direct bindings to llama.cpp functions, built with pybind11
- **Utility Functions**: Model format conversion and quantization tools
- **Integration Wrappers**: LangChain compatibility and web UI interfaces
- **CLI Interface**: Command-line tool for interactive model testing

This design combines the performance of llama.cpp's optimized C++ implementation with the ease of use of a Python interface. It is suitable for chatbots, text generation, interactive AI applications, and any project that needs efficient local language-model inference without external API dependencies.

## Capabilities

### Model Operations

Core functionality for loading models, generating text, and managing model state. Includes both streaming token generation and batch text generation, with extensive parameter control.

```python { .api }
class Model:
    def __init__(self, model_path: str, prompt_context: str = '', prompt_prefix: str = '', prompt_suffix: str = '', log_level: int = logging.ERROR, n_ctx: int = 512, seed: int = 0, n_gpu_layers: int = 0, f16_kv: bool = False, logits_all: bool = False, vocab_only: bool = False, use_mlock: bool = False, embedding: bool = False): ...
    def generate(self, prompt: str, n_predict: Union[None, int] = None, n_threads: int = 4, **kwargs) -> Generator: ...
    def cpp_generate(self, prompt: str, n_predict: int = 128, **kwargs) -> str: ...
    def tokenize(self, text: str): ...
    def detokenize(self, tokens: list): ...
    def reset(self) -> None: ...
```

[Model Operations](./model-operations.md)
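
As a usage sketch, the streaming `generate()` method can be consumed with a small accumulator. `collect_stream` below is our own convenience helper, not part of pyllamacpp, and the model path in the comment is a placeholder:

```python
from typing import Iterable

def collect_stream(token_stream: Iterable[str], max_chars: int = 2000) -> str:
    """Accumulate streamed tokens into one string, stopping at a length cap."""
    parts = []
    total = 0
    for token in token_stream:
        parts.append(token)
        total += len(token)
        if total >= max_chars:
            break
    return ''.join(parts)

# With pyllamacpp installed, this would be used as:
#   from pyllamacpp.model import Model
#   model = Model(model_path='/path/to/model.ggml')
#   answer = collect_stream(model.generate("Explain GGML briefly.", n_predict=64))
```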

### Utility Functions

Helper functions for model format conversion and quantization: conversion of LLaMA PyTorch checkpoints to GGML format, and quantization to reduce model size.

```python { .api }
def llama_to_ggml(dir_model: str, ftype: int = 1) -> str: ...
def quantize(ggml_model_path: str, output_model_path: str = None, itype: int = 2) -> str: ...
```

[Utility Functions](./utilities.md)
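
A hedged sketch of a convert-then-quantize pipeline. `quantized_path` is our own naming convention (pyllamacpp derives its own default when `output_model_path` is `None`), and the checkpoint paths are placeholders:

```python
import os

def quantized_path(ggml_model_path: str, suffix: str = '-q4') -> str:
    """Derive an output filename for a quantized model (illustrative convention)."""
    root, ext = os.path.splitext(ggml_model_path)
    return f"{root}{suffix}{ext}"

# With pyllamacpp installed, the pipeline would look roughly like:
#   from pyllamacpp import utils
#   ggml_file = utils.llama_to_ggml('/path/to/llama-checkpoint', ftype=1)
#   utils.quantize(ggml_file, quantized_path(ggml_file), itype=2)
```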

### LangChain Integration

LangChain-compatible wrapper class enabling seamless integration with LangChain workflows and chains. It exposes the same interface as other LangChain LLM implementations.

```python { .api }
class PyllamacppLLM(LLM):
    model: str
    n_ctx: int = 512
    seed: int = 0
    n_threads: int = 4
    n_predict: int = 50
    temp: float = 0.8
    top_p: float = 0.95
    top_k: int = 40
```

[LangChain Integration](./langchain-integration.md)
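
A sketch of wiring the wrapper into a simple prompt pipeline. `fill_template` is a self-contained stand-in for LangChain's `PromptTemplate.format()`; the real calls, which require langchain and a model file (paths and parameter values here are placeholders), are shown in comments:

```python
template = "Question: {question}\n\nAnswer: Let's think step by step."

def fill_template(template: str, **values) -> str:
    """Minimal stand-in for LangChain's PromptTemplate.format()."""
    return template.format(**values)

prompt = fill_template(template, question="What is GGML?")

# With langchain and pyllamacpp installed, the equivalent wiring would be roughly:
#   from langchain import PromptTemplate, LLMChain
#   from pyllamacpp.langchain_llm import PyllamacppLLM
#   llm = PyllamacppLLM(model='/path/to/model.ggml', n_predict=100, temp=0.7)
#   chain = LLMChain(prompt=PromptTemplate(template=template,
#                                          input_variables=['question']), llm=llm)
#   print(chain.run("What is GGML?"))
```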

### Embeddings

Vector-embedding functionality for semantic similarity and RAG applications. Supports generating embeddings for an individual prompt or extracting embeddings from the current model context.

```python { .api }
def get_embeddings(self) -> List[float]: ...
def get_prompt_embeddings(self, prompt: str, n_threads: int = 4, n_batch: int = 512) -> List[float]: ...
```

[Embeddings](./embeddings.md)
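
Embeddings are typically compared with cosine similarity. The helper below is self-contained; the `Model` calls in the comment assume a model loaded with `embedding=True` and use a placeholder path:

```python
import math
from typing import List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# With pyllamacpp installed:
#   from pyllamacpp.model import Model
#   model = Model(model_path='/path/to/model.ggml', embedding=True)
#   e1 = model.get_prompt_embeddings("a cat sat on the mat")
#   e2 = model.get_prompt_embeddings("a kitten rested on the rug")
#   print(cosine_similarity(e1, e2))
```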

### Web User Interface

Streamlit-based web interface for interactive model testing and development: a browser-based chat interface with configurable parameters and real-time model interaction.

```python { .api }
def webui() -> None: ...
def run(): ...
```

[Web User Interface](./web-ui.md)

### Command Line Interface

Interactive command-line interface for model testing and development, with a configurable chat loop, extensive parameter control, and debugging features.

```bash
pyllamacpp path/to/model.ggml
```

[Command Line Interface](./cli.md)