tessl/pypi-pyllamacpp

Python bindings for llama.cpp enabling efficient local language model inference without external API dependencies.

- Workspace: tessl
- Visibility: Public
- Describes: pypipkg:pypi/pyllamacpp@2.4.x

To install, run:

```bash
npx @tessl/cli install tessl/pypi-pyllamacpp@2.4.0
```

# PyLLaMACpp

Python bindings for llama.cpp that let developers run Meta's LLaMA language models and other compatible large language models directly in Python applications. PyLLaMACpp provides both a high-level Python API through the `Model` class for easy integration and low-level access to llama.cpp C-API functions for advanced users who need custom implementations.

## Package Information

- **Package Name**: pyllamacpp
- **Language**: Python
- **Installation**: `pip install pyllamacpp`
- **Dependencies**: CMake, pybind11 (for building from source); optional: numpy, torch, sentencepiece (for model conversion utilities)

## Core Imports

```python
from pyllamacpp.model import Model
```

For utility functions:

```python
from pyllamacpp import utils
```

For LangChain integration:

```python
from pyllamacpp.langchain_llm import PyllamacppLLM
```

For logging configuration:

```python
from pyllamacpp._logger import get_logger, set_log_level
```

For package constants:

```python
from pyllamacpp.constants import PACKAGE_NAME, LOGGING_LEVEL
```

For the web interface:

```python
from pyllamacpp.webui import webui, run
```
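
Because the package ships a compiled C++ extension, importing it can fail on platforms without a prebuilt wheel or a working build toolchain. A defensive import pattern (a generic sketch, not part of the pyllamacpp API):

```python
# Guarded import: degrade gracefully when the compiled
# _pyllamacpp extension is unavailable on this platform.
try:
    from pyllamacpp.model import Model
    HAVE_PYLLAMACPP = True
except ImportError:
    Model = None
    HAVE_PYLLAMACPP = False

def make_model(model_path: str):
    """Return a Model, or raise a clear error if the bindings are missing."""
    if not HAVE_PYLLAMACPP:
        raise RuntimeError("pyllamacpp is not installed; run `pip install pyllamacpp`")
    return Model(model_path=model_path)
```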

## Basic Usage

```python
from pyllamacpp.model import Model

# Load a GGML model
model = Model(model_path='/path/to/model.ggml')

# Generate text, streaming tokens as they are produced
for token in model.generate("Tell me a joke"):
    print(token, end='', flush=True)

# Or generate all at once using cpp_generate
response = model.cpp_generate("What is artificial intelligence?", n_predict=100)
print(response)
```

Interactive dialogue example:

```python
from pyllamacpp.model import Model

model = Model(model_path='/path/to/model.ggml')

while True:
    try:
        prompt = input("You: ")
        if prompt == '':
            continue
        print("AI: ", end='')
        for token in model.generate(prompt):
            print(token, end='', flush=True)
        print()
    except KeyboardInterrupt:
        break
```
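
Since `generate()` streams tokens, a dialogue loop often wants to cut the stream off at a sentinel such as a `"User:"` turn marker. A small helper that works with any iterable of token strings (shown with a stand-in stream, since running the real model requires a GGML file; the stop-string idea is an illustration, not a documented pyllamacpp feature):

```python
def collect_until(tokens, stop=None):
    """Accumulate streamed tokens, truncating at an optional stop string."""
    out = ""
    for token in tokens:
        out += token
        if stop is not None and stop in out:
            return out[:out.index(stop)]
    return out

# Stand-in for model.generate(prompt): any iterable of strings works.
fake_stream = ["Hello", ",", " world", "\nUser:", " ignored"]
print(collect_until(fake_stream, stop="\nUser:"))  # → Hello, world
```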

## Architecture

PyLLaMACpp operates as a bridge between Python and the high-performance llama.cpp C++ library:

- **Model Class**: High-level Python interface providing text generation, tokenization, and embedding capabilities
- **C++ Extension (`_pyllamacpp`)**: Direct bindings to llama.cpp functions, built with pybind11
- **Utility Functions**: Model format conversion and quantization tools
- **Integration Wrappers**: LangChain compatibility and web UI interfaces
- **CLI Interface**: Command-line tool for interactive model testing

This architecture combines the performance of llama.cpp's optimized C++ implementation with the ease of use of a Python interface. It suits chatbots, text generation, interactive AI applications, and any project requiring efficient local language model inference without external API dependencies.

## Capabilities

### Model Operations

Core functionality for loading models, generating text, and managing model state. Includes both streaming token generation and batch text generation methods with extensive parameter control.

```python { .api }
class Model:
    def __init__(self, model_path: str, prompt_context: str = '', prompt_prefix: str = '', prompt_suffix: str = '', log_level: int = logging.ERROR, n_ctx: int = 512, seed: int = 0, n_gpu_layers: int = 0, f16_kv: bool = False, logits_all: bool = False, vocab_only: bool = False, use_mlock: bool = False, embedding: bool = False): ...
    def generate(self, prompt: str, n_predict: Union[None, int] = None, n_threads: int = 4, **kwargs) -> Generator: ...
    def cpp_generate(self, prompt: str, n_predict: int = 128, **kwargs) -> str: ...
    def tokenize(self, text: str): ...
    def detokenize(self, tokens: list): ...
    def reset(self) -> None: ...
```

[Model Operations](./model-operations.md)
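
Both `generate()` and `cpp_generate()` forward extra keyword arguments to the underlying sampler (names such as `temp`, `top_k`, and `top_p` follow llama.cpp's vocabulary, but the exact accepted set is version-dependent, so treat it as an assumption). A thin wrapper that centralizes shared defaults, exercised here against a stand-in model since real generation needs a GGML file:

```python
# Assumed default kwargs; check your installed version's signature.
GENERATION_DEFAULTS = {"n_predict": 128, "n_threads": 4}

def generate_text(model, prompt, **overrides):
    """Run streaming generation with shared defaults and return the full text."""
    params = {**GENERATION_DEFAULTS, **overrides}
    return "".join(model.generate(prompt, **params))

# Stand-in model: echoes the prompt back one word at a time.
class EchoModel:
    def generate(self, prompt, **kwargs):
        for word in prompt.split():
            yield word + " "

print(generate_text(EchoModel(), "local inference demo").strip())
# → local inference demo
```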

### Utility Functions

Helper functions for model format conversion and quantization. Includes conversion from LLaMA PyTorch models to GGML format and quantization for reduced model sizes.

```python { .api }
def llama_to_ggml(dir_model: str, ftype: int = 1) -> str: ...
def quantize(ggml_model_path: str, output_model_path: str = None, itype: int = 2) -> str: ...
```

[Utility Functions](./utilities.md)

### LangChain Integration

LangChain-compatible wrapper class enabling seamless integration with LangChain workflows and chains. Provides the same interface as other LangChain LLM implementations.

```python { .api }
class PyllamacppLLM(LLM):
    model: str
    n_ctx: int = 512
    seed: int = 0
    n_threads: int = 4
    n_predict: int = 50
    temp: float = 0.8
    top_p: float = 0.95
    top_k: int = 40
```

[LangChain Integration](./langchain-integration.md)

### Embeddings

Vector embeddings functionality for semantic similarity and RAG applications. Supports generating embeddings for individual prompts or extracting embeddings from the current model context.

```python { .api }
def get_embeddings(self) -> List[float]: ...
def get_prompt_embeddings(self, prompt: str, n_threads: int = 4, n_batch: int = 512) -> List[float]: ...
```

[Embeddings](./embeddings.md)
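
Since `get_prompt_embeddings()` returns a plain `List[float]`, similarity search needs no extra dependencies. A minimal cosine-similarity helper (pure Python; the example vectors are made up, and presumably the `Model` must be constructed with `embedding=True` for embedding calls to work):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With a real model: emb = model.get_prompt_embeddings("hello world")
print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]), 4))  # → 0.7071
```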

### Web User Interface

Streamlit-based web interface for interactive model testing and development. Provides a browser-based chat interface with configurable parameters and real-time model interaction.

```python { .api }
def webui() -> None: ...
def run(): ...
```

[Web User Interface](./web-ui.md)

### Command Line Interface

Interactive command-line interface for model testing and development. Provides a configurable chat interface with extensive parameter control and debugging features.

```bash
pyllamacpp path/to/model.ggml
```

[Command Line Interface](./cli.md)