# Utility Functions

Helper functions for model format conversion and quantization. These utilities enable conversion between different model formats and optimization of model storage and inference performance.

## Capabilities

### Model Format Conversion

Convert LLaMA PyTorch models to GGML format for use with pyllamacpp. This function replicates the functionality of llama.cpp's convert-pth-to-ggml.py script.

```python { .api }
def llama_to_ggml(dir_model: str, ftype: int = 1) -> str:
    """
    Convert LLaMA PyTorch models to GGML format.

    This function converts Facebook's original LLaMA model files
    from PyTorch format to GGML format compatible with llama.cpp.

    Parameters:
    - dir_model: str, path to directory containing LLaMA model files
      (should contain params.json and consolidated.0X.pth files)
    - ftype: int, precision format (0 for f32, 1 for f16, default: 1)

    Returns:
        str: Path to the converted GGML model file

    Raises:
        Exception: If model directory structure is invalid or conversion fails
    """
```
Example usage:

```python
from pyllamacpp import utils

# Convert LLaMA-7B model to f16 GGML format
ggml_path = utils.llama_to_ggml('/path/to/llama-7b/', ftype=1)
print(f"Converted model saved to: {ggml_path}")

# Convert to f32 format for higher precision
ggml_path_f32 = utils.llama_to_ggml('/path/to/llama-13b/', ftype=0)
print(f"F32 model saved to: {ggml_path_f32}")

# Use converted model
from pyllamacpp.model import Model
model = Model(model_path=ggml_path)
```

### Model Quantization

Quantize GGML models to reduce file size and memory usage while maintaining reasonable inference quality. Supports Q4_0 and Q4_1 quantization formats.

```python { .api }
def quantize(ggml_model_path: str, output_model_path: str = None, itype: int = 2) -> str:
    """
    Quantize a GGML model to reduce size and memory usage.

    Applies quantization to reduce model precision, significantly
    decreasing file size and memory requirements with minimal
    quality loss for most applications.

    Parameters:
    - ggml_model_path: str, path to input GGML model file
    - output_model_path: str or None, output path for quantized model
      (default: input_path + '-q4_0.bin' or '-q4_1.bin')
    - itype: int, quantization type:
        - 2: Q4_0 quantization (4-bit, smaller file size)
        - 3: Q4_1 quantization (4-bit, slightly better quality)

    Returns:
        str: Path to the quantized model file

    Raises:
        Exception: If quantization process fails
    """
```
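The docstring above describes the default output naming: the input path plus `-q4_0.bin` or `-q4_1.bin` depending on `itype`. As an illustration only, a hypothetical helper (`default_quantized_path` is not part of the library) can make that mapping explicit:

```python
def default_quantized_path(ggml_model_path: str, itype: int) -> str:
    """Illustrative sketch of the default naming described in the docstring."""
    # Map itype to the suffix documented above (2 -> Q4_0, 3 -> Q4_1);
    # other values are rejected here since only these two are documented.
    suffixes = {2: "-q4_0.bin", 3: "-q4_1.bin"}
    if itype not in suffixes:
        raise ValueError(f"Unsupported itype: {itype}")
    return ggml_model_path + suffixes[itype]

print(default_quantized_path("/models/llama-7b.ggml", 2))
print(default_quantized_path("/models/llama-7b.ggml", 3))
```

Passing an explicit `output_model_path` to `quantize` overrides this default entirely.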
Example usage:

```python
from pyllamacpp import utils
import os

# Quantize model using Q4_0 (default)
original_model = '/path/to/llama-7b.ggml'
quantized_path = utils.quantize(original_model)
print(f"Quantized model: {quantized_path}")

# Quantize with custom output path and Q4_1 format
quantized_custom = utils.quantize(
    ggml_model_path=original_model,
    output_model_path='/path/to/llama-7b-q4_1.ggml',
    itype=3
)

# Compare file sizes
original_size = os.path.getsize(original_model) / (1024**3)  # GB
quantized_size = os.path.getsize(quantized_path) / (1024**3)  # GB
print(f"Original: {original_size:.2f} GB")
print(f"Quantized: {quantized_size:.2f} GB")
print(f"Size reduction: {(1 - quantized_size/original_size)*100:.1f}%")

# Use quantized model
from pyllamacpp.model import Model
model = Model(model_path=quantized_path)
```

### GPT4All Conversion

Placeholder function for converting GPT4All models (currently not implemented).

```python { .api }
def convert_gpt4all() -> str:
    """
    Convert GPT4All models (placeholder implementation).

    Note: This function is currently not implemented and
    performs no operations when called.

    Returns:
        str: Conversion result (implementation pending)
    """
```

### Logging Configuration

Logger configuration functions for controlling PyLLaMACpp's internal logging behavior.

```python { .api }
def get_logger():
    """
    Get the package logger instance.

    Returns the configured logger instance used throughout
    the PyLLaMACpp package for debugging and information output.

    Returns:
        logging.Logger: Package logger instance
    """

def set_log_level(log_level):
    """
    Set the logging level for the PyLLaMACpp package.

    Controls the verbosity of logging output from the package.
    Use standard Python logging levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).

    Parameters:
    - log_level: int or logging level constant, desired logging level

    Example:
        import logging
        from pyllamacpp._logger import set_log_level

        # Set to INFO level for detailed output
        set_log_level(logging.INFO)

        # Set to ERROR level for minimal output
        set_log_level(logging.ERROR)
    """
```

Example usage:

```python
import logging
from pyllamacpp._logger import get_logger, set_log_level
from pyllamacpp.model import Model

# Configure logging for debugging
set_log_level(logging.DEBUG)
logger = get_logger()

# Load model with debug logging
model = Model(model_path='/path/to/model.ggml')
logger.info("Model loaded successfully")

# Generate text with logging
response = model.cpp_generate("Test prompt", n_predict=50)
logger.info(f"Generated {len(response)} characters")
```

### Package Constants

Package-level constants for identification and configuration.

```python { .api }
PACKAGE_NAME = 'pyllamacpp'
"""Package name identifier constant."""

LOGGING_LEVEL = logging.INFO
"""Default logging level for the package."""
```

Example usage:

```python
import logging
from pyllamacpp.constants import PACKAGE_NAME, LOGGING_LEVEL

print(f"Using {PACKAGE_NAME} package")

# Use default logging level
logging.basicConfig(level=LOGGING_LEVEL)
```

## Complete Workflow Example

Here's a complete example showing the typical workflow from a PyTorch LLaMA model to an optimized quantized model:

```python
from pyllamacpp import utils
from pyllamacpp.model import Model
import os

# Step 1: Convert PyTorch LLaMA model to GGML
print("Converting PyTorch model to GGML...")
ggml_model = utils.llama_to_ggml(
    dir_model='/path/to/llama-7b-pytorch/',
    ftype=1  # f16 precision
)
print(f"GGML model created: {ggml_model}")

# Step 2: Quantize the GGML model
print("Quantizing model...")
quantized_model = utils.quantize(
    ggml_model_path=ggml_model,
    itype=2  # Q4_0 quantization
)
print(f"Quantized model created: {quantized_model}")

# Step 3: Compare sizes
original_size = os.path.getsize(ggml_model) / (1024**2)  # MB
quantized_size = os.path.getsize(quantized_model) / (1024**2)  # MB
print(f"Size reduction: {original_size:.1f} MB -> {quantized_size:.1f} MB")

# Step 4: Test the quantized model
print("Testing quantized model...")
model = Model(model_path=quantized_model)
response = model.cpp_generate("Hello, how are you?", n_predict=50)
print(f"Model response: {response}")
```

## Dependencies

The utility functions require additional dependencies:

```python
# Required for llama_to_ggml
import torch
import numpy as np
from sentencepiece import SentencePieceProcessor

# Standard-library dependencies
import json
import struct
import sys
from pathlib import Path
```

Make sure these are installed:

```bash
pip install torch numpy sentencepiece
```
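
Before starting a long conversion, it can help to verify these dependencies at runtime. A minimal sketch using only the standard library (`importlib.util.find_spec` reports whether a package can be imported without actually importing it):

```python
import importlib.util

# Third-party packages needed by llama_to_ggml (installable via pip)
required = ["torch", "numpy", "sentencepiece"]

# Collect any packages that cannot be found in the current environment
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print(f"Missing dependencies: {', '.join(missing)}")
    print(f"Install them with: pip install {' '.join(missing)}")
else:
    print("All conversion dependencies are available.")
```

Running this check first gives a clear error message up front instead of an `ImportError` partway through the conversion.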