# Utility Functions

Helper functions for model format conversion and quantization. These utilities enable conversion between different model formats and optimization of model storage and inference performance.

## Capabilities

### Model Format Conversion

Convert LLaMA PyTorch models to GGML format for use with pyllamacpp. This function replicates the functionality of llama.cpp's convert-pth-to-ggml.py script.

```python { .api }
def llama_to_ggml(dir_model: str, ftype: int = 1) -> str:
    """
    Convert LLaMA PyTorch models to GGML format.

    This function converts Facebook's original LLaMA model files
    from PyTorch format to GGML format compatible with llama.cpp.

    Parameters:
    - dir_model: str, path to directory containing LLaMA model files
      (should contain params.json and consolidated.0X.pth files)
    - ftype: int, precision format (0 for f32, 1 for f16, default: 1)

    Returns:
        str: Path to the converted GGML model file

    Raises:
        Exception: If model directory structure is invalid or conversion fails
    """
```
Example usage:

```python
from pyllamacpp import utils

# Convert LLaMA-7B model to f16 GGML format
ggml_path = utils.llama_to_ggml('/path/to/llama-7b/', ftype=1)
print(f"Converted model saved to: {ggml_path}")

# Convert to f32 format for higher precision
ggml_path_f32 = utils.llama_to_ggml('/path/to/llama-13b/', ftype=0)
print(f"F32 model saved to: {ggml_path_f32}")

# Use converted model
from pyllamacpp.model import Model
model = Model(model_path=ggml_path)
```

### Model Quantization

Quantize GGML models to reduce file size and memory usage while maintaining reasonable inference quality. Supports Q4_0 and Q4_1 quantization formats.

```python { .api }
def quantize(ggml_model_path: str, output_model_path: str = None, itype: int = 2) -> str:
    """
    Quantize a GGML model to reduce size and memory usage.

    Applies quantization to reduce model precision, significantly
    decreasing file size and memory requirements with minimal
    quality loss for most applications.

    Parameters:
    - ggml_model_path: str, path to input GGML model file
    - output_model_path: str or None, output path for quantized model
      (default: input_path + '-q4_0.bin' or '-q4_1.bin')
    - itype: int, quantization type:
        - 2: Q4_0 quantization (4-bit, smaller file size)
        - 3: Q4_1 quantization (4-bit, slightly better quality)

    Returns:
        str: Path to the quantized model file

    Raises:
        Exception: If quantization process fails
    """
```
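The docstring above describes the default output naming: the input path plus `-q4_0.bin` or `-q4_1.bin` depending on `itype`. As an illustration only, a hypothetical helper (`default_quantized_path` is not part of the library) can make that mapping explicit:

```python
def default_quantized_path(ggml_model_path: str, itype: int) -> str:
    """Illustrative sketch of the default naming described in the docstring."""
    # Map itype to the suffix documented above (2 -> Q4_0, 3 -> Q4_1);
    # other values are rejected here since only these two are documented.
    suffixes = {2: "-q4_0.bin", 3: "-q4_1.bin"}
    if itype not in suffixes:
        raise ValueError(f"Unsupported itype: {itype}")
    return ggml_model_path + suffixes[itype]

print(default_quantized_path("/models/llama-7b.ggml", 2))
print(default_quantized_path("/models/llama-7b.ggml", 3))
```

Passing an explicit `output_model_path` to `quantize` overrides this default entirely.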
Example usage:

```python
from pyllamacpp import utils
import os

# Quantize model using Q4_0 (default)
original_model = '/path/to/llama-7b.ggml'
quantized_path = utils.quantize(original_model)
print(f"Quantized model: {quantized_path}")

# Quantize with custom output path and Q4_1 format
quantized_custom = utils.quantize(
    ggml_model_path=original_model,
    output_model_path='/path/to/llama-7b-q4_1.ggml',
    itype=3
)

# Compare file sizes
original_size = os.path.getsize(original_model) / (1024**3)  # GB
quantized_size = os.path.getsize(quantized_path) / (1024**3)  # GB
print(f"Original: {original_size:.2f} GB")
print(f"Quantized: {quantized_size:.2f} GB")
print(f"Size reduction: {(1 - quantized_size/original_size)*100:.1f}%")

# Use quantized model
from pyllamacpp.model import Model
model = Model(model_path=quantized_path)
```

### GPT4All Conversion

Placeholder function for converting GPT4All models (currently not implemented).

```python { .api }
def convert_gpt4all() -> str:
    """
    Convert GPT4All models (placeholder implementation).

    Note: This function is currently not implemented and
    performs no operations when called.

    Returns:
        str: Conversion result (implementation pending)
    """
```

### Logging Configuration

Logger configuration functions for controlling PyLLaMACpp's internal logging behavior.

```python { .api }
def get_logger():
    """
    Get the package logger instance.

    Returns the configured logger instance used throughout
    the PyLLaMACpp package for debugging and information output.

    Returns:
        logging.Logger: Package logger instance
    """

def set_log_level(log_level):
    """
    Set the logging level for the PyLLaMACpp package.

    Controls the verbosity of logging output from the package.
    Use standard Python logging levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).

    Parameters:
    - log_level: int or logging level constant, desired logging level

    Example:
        import logging
        from pyllamacpp._logger import set_log_level

        # Set to INFO level for detailed output
        set_log_level(logging.INFO)

        # Set to ERROR level for minimal output
        set_log_level(logging.ERROR)
    """
```

Example usage:

```python
import logging
from pyllamacpp._logger import get_logger, set_log_level
from pyllamacpp.model import Model

# Configure logging for debugging
set_log_level(logging.DEBUG)
logger = get_logger()

# Load model with debug logging
model = Model(model_path='/path/to/model.ggml')
logger.info("Model loaded successfully")

# Generate text with logging
response = model.cpp_generate("Test prompt", n_predict=50)
logger.info(f"Generated {len(response)} characters")
```

### Package Constants

Package-level constants for identification and configuration.

```python { .api }
PACKAGE_NAME = 'pyllamacpp'
"""Package name identifier constant."""

LOGGING_LEVEL = logging.INFO
"""Default logging level for the package."""
```

Example usage:

```python
import logging
from pyllamacpp.constants import PACKAGE_NAME, LOGGING_LEVEL

print(f"Using {PACKAGE_NAME} package")

# Use default logging level
logging.basicConfig(level=LOGGING_LEVEL)
```

## Complete Workflow Example

Here's a complete example showing the typical workflow from a PyTorch LLaMA model to an optimized quantized model:

```python
from pyllamacpp import utils
from pyllamacpp.model import Model
import os

# Step 1: Convert PyTorch LLaMA model to GGML
print("Converting PyTorch model to GGML...")
ggml_model = utils.llama_to_ggml(
    dir_model='/path/to/llama-7b-pytorch/',
    ftype=1  # f16 precision
)
print(f"GGML model created: {ggml_model}")

# Step 2: Quantize the GGML model
print("Quantizing model...")
quantized_model = utils.quantize(
    ggml_model_path=ggml_model,
    itype=2  # Q4_0 quantization
)
print(f"Quantized model created: {quantized_model}")

# Step 3: Compare sizes
original_size = os.path.getsize(ggml_model) / (1024**2)  # MB
quantized_size = os.path.getsize(quantized_model) / (1024**2)  # MB
print(f"Size reduction: {original_size:.1f} MB -> {quantized_size:.1f} MB")

# Step 4: Test the quantized model
print("Testing quantized model...")
model = Model(model_path=quantized_model)
response = model.cpp_generate("Hello, how are you?", n_predict=50)
print(f"Model response: {response}")
```

## Dependencies

The utility functions require additional dependencies:

```python
# Required for llama_to_ggml
import torch
import numpy as np
from sentencepiece import SentencePieceProcessor

# Standard-library dependencies
import json
import struct
import sys
from pathlib import Path
```

Make sure these are installed:

```bash
pip install torch numpy sentencepiece
```
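
Before starting a long conversion, it can help to verify these dependencies at runtime. A minimal sketch using only the standard library (`importlib.util.find_spec` reports whether a package can be imported without actually importing it):

```python
import importlib.util

# Third-party packages needed by llama_to_ggml (installable via pip)
required = ["torch", "numpy", "sentencepiece"]

# Collect any packages that cannot be found in the current environment
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print(f"Missing dependencies: {', '.join(missing)}")
    print(f"Install them with: pip install {' '.join(missing)}")
else:
    print("All conversion dependencies are available.")
```

Running this check first gives a clear error message up front instead of an `ImportError` partway through the conversion.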