# Utility Functions

Helper functions for model format conversion and quantization. These utilities enable conversion between different model formats and optimization of model storage and inference performance.

## Capabilities

### Model Format Conversion

Convert LLaMA PyTorch models to GGML format for use with pyllamacpp. This function replicates the functionality of llama.cpp's convert-pth-to-ggml.py script.

```python { .api }
def llama_to_ggml(dir_model: str, ftype: int = 1) -> str:
    """
    Convert LLaMA PyTorch models to GGML format.

    This function converts Facebook's original LLaMA model files
    from PyTorch format to GGML format compatible with llama.cpp.

    Parameters:
    - dir_model: str, path to directory containing LLaMA model files
      (should contain params.json and consolidated.0X.pth files)
    - ftype: int, precision format (0 for f32, 1 for f16; default: 1)

    Returns:
    str: Path to the converted GGML model file

    Raises:
    Exception: If model directory structure is invalid or conversion fails
    """
```

Example usage:

```python
from pyllamacpp import utils

# Convert LLaMA-7B model to f16 GGML format
ggml_path = utils.llama_to_ggml('/path/to/llama-7b/', ftype=1)
print(f"Converted model saved to: {ggml_path}")

# Convert to f32 format for higher precision
ggml_path_f32 = utils.llama_to_ggml('/path/to/llama-13b/', ftype=0)
print(f"F32 model saved to: {ggml_path_f32}")

# Use converted model
from pyllamacpp.model import Model
model = Model(model_path=ggml_path)
```
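The `ftype` choice trades storage for precision. As a rough, self-contained numpy illustration (independent of the converter itself, using a hypothetical stand-in tensor), f16 halves per-element storage relative to f32:

```python
import numpy as np

# Hypothetical weight tensor standing in for a model layer;
# the real converter streams tensors from the .pth checkpoint files.
weights_f32 = np.random.rand(1024, 1024).astype(np.float32)

# ftype=1 stores tensors as f16, halving storage per element
weights_f16 = weights_f32.astype(np.float16)

print(f"f32: {weights_f32.nbytes / 1024**2:.1f} MiB")  # 4.0 MiB
print(f"f16: {weights_f16.nbytes / 1024**2:.1f} MiB")  # 2.0 MiB

# The cost is precision: ~3 significant decimal digits for f16 vs ~7 for f32
max_err = np.abs(weights_f32 - weights_f16.astype(np.float32)).max()
print(f"max rounding error: {max_err:.2e}")
```

For LLM weights this precision loss is usually negligible, which is why f16 is the default.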

### Model Quantization

Quantize GGML models to reduce file size and memory usage while maintaining reasonable inference quality. Supports Q4_0 and Q4_1 quantization formats.

```python { .api }
def quantize(ggml_model_path: str, output_model_path: str = None, itype: int = 2) -> str:
    """
    Quantize GGML model to reduce size and memory usage.

    Applies quantization to reduce model precision, significantly
    decreasing file size and memory requirements with minimal
    quality loss for most applications.

    Parameters:
    - ggml_model_path: str, path to input GGML model file
    - output_model_path: str or None, output path for quantized model
      (default: input_path + '-q4_0.bin' or '-q4_1.bin')
    - itype: int, quantization type:
      - 2: Q4_0 quantization (4-bit, smaller file size)
      - 3: Q4_1 quantization (4-bit, slightly better quality)

    Returns:
    str: Path to the quantized model file

    Raises:
    Exception: If quantization process fails
    """
```
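The default-output note above implies a naming scheme along these lines; `default_output_path` is a hypothetical helper written for illustration, not part of the pyllamacpp API:

```python
def default_output_path(ggml_model_path: str, itype: int) -> str:
    """Illustrative sketch of the default naming described above."""
    suffix = {2: "-q4_0.bin", 3: "-q4_1.bin"}[itype]
    return ggml_model_path + suffix

print(default_output_path("/models/llama-7b.ggml", 2))
# /models/llama-7b.ggml-q4_0.bin
```

Passing an explicit `output_model_path` sidesteps this naming entirely.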

Example usage:

```python
from pyllamacpp import utils

# Quantize model using Q4_0 (default)
original_model = '/path/to/llama-7b.ggml'
quantized_path = utils.quantize(original_model)
print(f"Quantized model: {quantized_path}")

# Quantize with custom output path and Q4_1 format
quantized_custom = utils.quantize(
    ggml_model_path=original_model,
    output_model_path='/path/to/llama-7b-q4_1.ggml',
    itype=3
)

# Compare file sizes
import os
original_size = os.path.getsize(original_model) / (1024**3)   # GB
quantized_size = os.path.getsize(quantized_path) / (1024**3)  # GB
print(f"Original: {original_size:.2f} GB")
print(f"Quantized: {quantized_size:.2f} GB")
print(f"Size reduction: {(1 - quantized_size/original_size)*100:.1f}%")

# Use quantized model
from pyllamacpp.model import Model
model = Model(model_path=quantized_path)
```
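For intuition about why 4-bit quantization shrinks models so much, the core idea can be sketched in a few lines. This is a simplified symmetric scheme for illustration only; the exact Q4_0/Q4_1 bit layouts are defined in llama.cpp:

```python
import numpy as np

BLOCK = 32  # weights are quantized in small blocks, each with its own scale

def q4_sketch(x: np.ndarray):
    """Simplified symmetric 4-bit block quantization (illustration only)."""
    blocks = x.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0                      # avoid division by zero
    q = np.clip(np.round(blocks / scales), -8, 7)  # 16 levels -> 4 bits
    return q.astype(np.int8), scales

def dequantize(q, scales):
    return (q * scales).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=(4, BLOCK)).astype(np.float32)
q, s = q4_sketch(w)
w_hat = dequantize(q, s)

# ~4 bits per weight plus one scale per block, vs 32 bits per weight
print(f"mean abs reconstruction error: {np.abs(w - w_hat).mean():.4f}")
```

Q4_1 adds a per-block offset on top of the scale, which is why it preserves slightly more quality at a slightly larger file size.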

### GPT4All Conversion

Placeholder function for converting GPT4All models (currently not implemented).

```python { .api }
def convert_gpt4all() -> str:
    """
    Convert GPT4All models (placeholder implementation).

    Note: This function is currently not implemented and will
    pass without performing any operations.

    Returns:
    str: Conversion result (implementation pending)
    """
```

### Logging Configuration

Logger configuration functions for controlling PyLLaMACpp's internal logging behavior.

```python { .api }
def get_logger():
    """
    Get the package logger instance.

    Returns the configured logger instance used throughout
    the PyLLaMACpp package for debugging and information output.

    Returns:
    logging.Logger: Package logger instance
    """

def set_log_level(log_level):
    """
    Set the logging level for the PyLLaMACpp package.

    Controls the verbosity of logging output from the package.
    Use standard Python logging levels (DEBUG, INFO, WARNING, ERROR, CRITICAL).

    Parameters:
    - log_level: int or logging level constant, desired logging level

    Example:

        import logging
        from pyllamacpp._logger import set_log_level

        # Set to INFO level for detailed output
        set_log_level(logging.INFO)

        # Set to ERROR level for minimal output
        set_log_level(logging.ERROR)
    """
```

Example usage:

```python
import logging

from pyllamacpp._logger import get_logger, set_log_level
from pyllamacpp.model import Model

# Configure logging for debugging
set_log_level(logging.DEBUG)
logger = get_logger()

# Load model with debug logging
model = Model(model_path='/path/to/model.ggml')
logger.info("Model loaded successfully")

# Generate text with logging
response = model.cpp_generate("Test prompt", n_predict=50)
logger.info(f"Generated {len(response)} characters")
```

### Package Constants

Package-level constants for identification and configuration.

```python { .api }
PACKAGE_NAME = 'pyllamacpp'
"""Package name identifier constant."""

LOGGING_LEVEL = logging.INFO
"""Default logging level for the package."""
```

Example usage:

```python
import logging

from pyllamacpp.constants import PACKAGE_NAME, LOGGING_LEVEL

print(f"Using {PACKAGE_NAME} package")

# Use default logging level
logging.basicConfig(level=LOGGING_LEVEL)
```

## Complete Workflow Example

Here's a complete example showing the typical workflow from a PyTorch LLaMA model to an optimized quantized model:

```python
import os

from pyllamacpp import utils
from pyllamacpp.model import Model

# Step 1: Convert PyTorch LLaMA model to GGML
print("Converting PyTorch model to GGML...")
ggml_model = utils.llama_to_ggml(
    dir_model='/path/to/llama-7b-pytorch/',
    ftype=1  # f16 precision
)
print(f"GGML model created: {ggml_model}")

# Step 2: Quantize the GGML model
print("Quantizing model...")
quantized_model = utils.quantize(
    ggml_model_path=ggml_model,
    itype=2  # Q4_0 quantization
)
print(f"Quantized model created: {quantized_model}")

# Step 3: Compare sizes
original_size = os.path.getsize(ggml_model) / (1024**2)       # MB
quantized_size = os.path.getsize(quantized_model) / (1024**2)  # MB
print(f"Size reduction: {original_size:.1f} MB -> {quantized_size:.1f} MB")

# Step 4: Test the quantized model
print("Testing quantized model...")
model = Model(model_path=quantized_model)
response = model.cpp_generate("Hello, how are you?", n_predict=50)
print(f"Model response: {response}")
```

## Dependencies

The utility functions require additional dependencies:

```python
# Required for llama_to_ggml
import torch
import numpy as np
from sentencepiece import SentencePieceProcessor

# Built-in dependencies
import json
import struct
import sys
from pathlib import Path
```

Make sure these are installed:

```bash
pip install torch numpy sentencepiece
```
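Since `torch`, `numpy`, and `sentencepiece` are only needed for conversion, a quick stdlib check (illustrative, not part of the package) can confirm they are importable before starting a long-running conversion:

```python
import importlib.util

# Optional dependencies required by llama_to_ggml
required = ["torch", "numpy", "sentencepiece"]
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print(f"Missing dependencies: {', '.join(missing)}")
    print(f"Install with: pip install {' '.join(missing)}")
else:
    print("All conversion dependencies are available.")
```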