# Command Line Interface

Interactive command-line interface for model testing and development. The CLI provides a configurable chat interface with extensive parameter control, debugging features, and direct access to model capabilities for development and experimentation.

## Capabilities

### Basic CLI Usage

Launch the interactive chat interface with a model file:

```bash
pyllamacpp /path/to/model.ggml
```

This starts an interactive session where you can chat with the model:

```
██████╗ ██╗ ██╗██╗ ██╗ █████╗ ███╗ ███╗ █████╗ ██████╗██████╗ ██████╗
██╔══██╗╚██╗ ██╔╝██║ ██║ ██╔══██╗████╗ ████║██╔══██╗██╔════╝██╔══██╗██╔══██╗
██████╔╝ ╚████╔╝ ██║ ██║ ███████║██╔████╔██║███████║██║ ██████╔╝██████╔╝
██╔═══╝ ╚██╔╝ ██║ ██║ ██╔══██║██║╚██╔╝██║██╔══██║██║ ██╔═══╝ ██╔═══╝
██║ ██║ ███████╗███████╗██║ ██║██║ ╚═╝ ██║██║ ██║╚██████╗██║ ██║
╚═╝ ╚═╝ ╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝╚═╝ ╚═╝

PyLLaMACpp
A simple Command Line Interface to test the package
Version: 2.4.3

You: Hello, how are you?
AI: I'm doing well, thank you for asking! How can I help you today?

You:
```

### Command Line Arguments

The CLI supports extensive parameter customization:

```bash
pyllamacpp --help

usage: pyllamacpp [-h] [--n_ctx N_CTX] [--seed SEED] [--f16_kv F16_KV]
                  [--logits_all LOGITS_ALL] [--vocab_only VOCAB_ONLY]
                  [--use_mlock USE_MLOCK] [--embedding EMBEDDING]
                  [--n_predict N_PREDICT] [--n_threads N_THREADS]
                  [--repeat_last_n REPEAT_LAST_N] [--top_k TOP_K]
                  [--top_p TOP_P] [--temp TEMP] [--repeat_penalty REPEAT_PENALTY]
                  [--n_batch N_BATCH]
                  model

positional arguments:
  model                 The path of the model file

options:
  -h, --help            show this help message and exit

# Context Parameters
  --n_ctx N_CTX         text context (default: 512)
  --seed SEED           RNG seed (default: -1 for random)
  --f16_kv F16_KV       use fp16 for KV cache (default: False)
  --logits_all LOGITS_ALL
                        compute all logits, not just the last one (default: False)
  --vocab_only VOCAB_ONLY
                        only load vocabulary, no weights (default: False)
  --use_mlock USE_MLOCK
                        force system to keep model in RAM (default: False)
  --embedding EMBEDDING
                        embedding mode only (default: False)

# Generation Parameters
  --n_predict N_PREDICT
                        Number of tokens to predict (default: 256)
  --n_threads N_THREADS
                        Number of threads (default: 4)
  --repeat_last_n REPEAT_LAST_N
                        Last n tokens to penalize (default: 64)
  --top_k TOP_K         top_k sampling (default: 40)
  --top_p TOP_P         top_p sampling (default: 0.95)
  --temp TEMP           temperature (default: 0.8)
  --repeat_penalty REPEAT_PENALTY
                        repeat_penalty (default: 1.1)
  --n_batch N_BATCH     batch size for prompt processing (default: 512)
```

### CLI Parameter Examples

Configure the model for different use cases:

```bash
# High creativity configuration
pyllamacpp /path/to/model.ggml \
  --temp 1.2 \
  --top_p 0.9 \
  --top_k 50 \
  --n_predict 200

# Focused, deterministic responses
pyllamacpp /path/to/model.ggml \
  --temp 0.1 \
  --top_p 0.9 \
  --top_k 20 \
  --repeat_penalty 1.15

# Large context configuration
pyllamacpp /path/to/model.ggml \
  --n_ctx 2048 \
  --n_batch 1024 \
  --n_threads 8

# GPU acceleration (build-dependent; --n_gpu_layers is not listed in --help above)
pyllamacpp /path/to/model.ggml \
  --n_gpu_layers 32 \
  --f16_kv True

# Memory-optimized configuration
pyllamacpp /path/to/model.ggml \
  --use_mlock True \
  --n_batch 256
```
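Presets like these can also be kept as data and expanded into command lines, which is convenient for scripting. The preset names and helper below are illustrative, not part of the package:

```python
import shlex

# Illustrative presets mirroring the flag combinations above
PRESETS = {
    "creative": {"temp": 1.2, "top_p": 0.9, "top_k": 50, "n_predict": 200},
    "focused": {"temp": 0.1, "top_p": 0.9, "top_k": 20, "repeat_penalty": 1.15},
}

def build_command(model_path, preset):
    """Expand a named preset into a pyllamacpp command line string."""
    parts = ["pyllamacpp", model_path]
    for flag, value in PRESETS[preset].items():
        parts += [f"--{flag}", str(value)]
    return shlex.join(parts)

cmd = build_command("/path/to/model.ggml", "focused")
```

`shlex.join` (Python 3.8+) quotes any argument that needs it, so the resulting string is safe to paste into a shell.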

### Interactive Features

The CLI provides several interactive features:

1. **Multi-line Input**: Press Enter twice to send multi-line messages
2. **Exit Commands**: Type 'exit', 'quit', or press Ctrl+C to quit
3. **Context Persistence**: Conversation context is maintained across exchanges
4. **Real-time Generation**: See tokens generated in real time
5. **Color Output**: Colored output for better readability
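The multi-line input behavior (item 1) amounts to a small read loop that stops at an empty line. The helper below is an illustrative sketch, not the package's actual implementation:

```python
def read_multiline(input_fn=input):
    """Collect lines until an empty line (a second Enter) ends the message."""
    lines = []
    while True:
        line = input_fn()
        if line == "":  # second Enter: message is complete
            break
        lines.append(line)
    return "\n".join(lines)
```

Passing `input_fn` explicitly makes the loop testable with a canned sequence instead of stdin.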

### Instruction-Following Mode

The CLI includes built-in instruction-following templates:

```python { .api }
# Default prompt templates in CLI
PROMPT_CONTEXT = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_PREFIX = "\n\n##Instruction:\n"
PROMPT_SUFFIX = "\n\n##Response:\n"
```

Example interaction with the instruction format:

```
You: Explain how photosynthesis works

AI: ##Response:
Photosynthesis is the process by which plants convert light energy into chemical energy...
```
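Under these templates, each user message is wrapped before it reaches the model. A minimal sketch of that composition (the exact order in which the CLI assembles the final prompt is an assumption here):

```python
# Templates as documented above
PROMPT_CONTEXT = ("Below is an instruction that describes a task. "
                  "Write a response that appropriately completes the request.")
PROMPT_PREFIX = "\n\n##Instruction:\n"
PROMPT_SUFFIX = "\n\n##Response:\n"

def build_prompt(instruction):
    # Assumed order: context, prefix, user text, suffix
    return PROMPT_CONTEXT + PROMPT_PREFIX + instruction + PROMPT_SUFFIX

prompt = build_prompt("Explain how photosynthesis works")
```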

### Performance Monitoring

The CLI includes performance monitoring capabilities:

```
# Example CLI session with timing info
You: Tell me about machine learning
AI: Machine learning is a subset of artificial intelligence... (Generated in 2.3s, 45 tokens/s)

# System information display
Model: /path/to/llama-7b.ggml
Context size: 512 tokens
Threads: 4
Memory usage: 4.2 GB
```
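A tokens-per-second figure like the one above can be measured around any generation loop. The wrapper below is illustrative and uses a stub in place of `model.generate`:

```python
import time

def timed_generate(generate_fn, prompt):
    """Collect tokens from a generator and report elapsed time and tokens/s."""
    start = time.perf_counter()
    tokens = list(generate_fn(prompt))
    elapsed = time.perf_counter() - start
    rate = len(tokens) / elapsed if elapsed > 0 else float("inf")
    return "".join(tokens), elapsed, rate

# Stub generator standing in for model.generate
text, elapsed, rate = timed_generate(lambda p: iter(["Hello", ", ", "world"]), "Hi")
print(f"Generated in {elapsed:.2f}s, {rate:.0f} tokens/s")
```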

### Configuration Schema

The CLI uses structured parameter schemas for validation:

```python { .api }
# Context parameters schema
LLAMA_CONTEXT_PARAMS_SCHEMA = {
    'n_ctx': {
        'type': int,
        'description': "text context",
        'default': 512
    },
    'seed': {
        'type': int,
        'description': "RNG seed",
        'default': -1
    },
    'f16_kv': {
        'type': bool,
        'description': "use fp16 for KV cache",
        'default': False
    },
    # ... more parameters
}

# Generation parameters schema
GPT_PARAMS_SCHEMA = {
    'n_predict': {
        'type': int,
        'description': "Number of tokens to predict",
        'default': 256
    },
    'n_threads': {
        'type': int,
        'description': "Number of threads",
        'default': 4
    },
    # ... more parameters
}
```
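A schema in this shape can drive flag registration directly. The builder below is a sketch of that idea with an abbreviated schema, not the package's actual code (it sticks to numeric parameters, since `type=bool` interacts poorly with argparse — `bool("False")` is `True`):

```python
import argparse

# Abbreviated schema in the documented shape (entries are illustrative)
SCHEMA = {
    'n_ctx': {'type': int, 'description': "text context", 'default': 512},
    'temp': {'type': float, 'description': "temperature", 'default': 0.8},
}

def build_parser(schema):
    """Register one optional flag per schema entry, with type/default/help."""
    parser = argparse.ArgumentParser()
    for name, spec in schema.items():
        parser.add_argument(f"--{name}", type=spec['type'],
                            default=spec['default'], help=spec['description'])
    return parser

args = build_parser(SCHEMA).parse_args([])
```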

### Programmatic CLI Access

Access CLI functionality programmatically:

```python { .api }
def main():
    """Main entry point for the command line interface."""

def run(args):
    """
    Run an interactive chat session with parsed arguments.

    Parameters:
    - args: Parsed command line arguments
    """
```

Example programmatic usage:

```python
import argparse
from pyllamacpp.cli import run

# Create an argument parser mirroring the CLI's flags
parser = argparse.ArgumentParser()
parser.add_argument('model', help='Path to model file')
parser.add_argument('--temp', type=float, default=0.8)
parser.add_argument('--n_predict', type=int, default=128)

# Parse arguments and run
args = parser.parse_args(['/path/to/model.ggml', '--temp', '0.7'])
run(args)
```

### Custom CLI Applications

Build custom CLI applications using the CLI components:

```python
import argparse
from pyllamacpp.model import Model
from pyllamacpp.cli import bcolors

def custom_cli():
    parser = argparse.ArgumentParser(description="Custom PyLLaMACpp CLI")
    parser.add_argument('model', help='Model path')
    parser.add_argument('--system-prompt', default="You are a helpful assistant.")
    args = parser.parse_args()

    # Initialize model with custom configuration
    model = Model(
        model_path=args.model,
        prompt_context=args.system_prompt,
        prompt_prefix="\n\nUser: ",
        prompt_suffix="\n\nAssistant: "
    )

    print(f"{bcolors.HEADER}Custom PyLLaMACpp Chat{bcolors.ENDC}")
    print(f"Model: {args.model}")
    print(f"System: {args.system_prompt}")
    print("-" * 50)

    while True:
        try:
            user_input = input(f"{bcolors.OKBLUE}You: {bcolors.ENDC}")
            if user_input.lower() in ['exit', 'quit']:
                break

            print(f"{bcolors.OKGREEN}AI: {bcolors.ENDC}", end="")
            for token in model.generate(user_input, n_predict=150):
                print(token, end="", flush=True)
            print()

        except KeyboardInterrupt:
            print(f"\n{bcolors.WARNING}Goodbye!{bcolors.ENDC}")
            break

if __name__ == "__main__":
    custom_cli()
```

### Debugging and Development

The CLI includes debugging features for development:

```python
# Color codes for terminal output
class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

# Usage in CLI output
print(f"{bcolors.OKGREEN}Model loaded successfully{bcolors.ENDC}")
print(f"{bcolors.WARNING}Warning: Large context size{bcolors.ENDC}")
print(f"{bcolors.FAIL}Error: Model file not found{bcolors.ENDC}")
```

### Batch Processing Mode

Run the CLI in batch mode for automated testing:

```bash
# Process a single prompt from stdin
echo "Tell me a joke" | pyllamacpp /path/to/model.ggml --n_predict 50

# Multiple prompts from a file
cat prompts.txt | pyllamacpp /path/to/model.ggml --temp 0.5
```
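The same batch pattern can be driven from Python as a loop over prompts feeding a generate callable. The helper below is illustrative and uses a stub in place of `Model.generate`:

```python
def run_batch(generate_fn, prompts):
    """Run each prompt through a generate callable and collect the outputs."""
    results = []
    for prompt in prompts:
        results.append("".join(generate_fn(prompt)))
    return results

# Stub generator standing in for Model.generate
echo = lambda prompt: iter(["echo: ", prompt])
outputs = run_batch(echo, ["Tell me a joke", "What is AI?"])
```

Swapping the stub for a bound `model.generate` (with whatever keyword arguments you need) turns this into a real batch driver.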

### Integration with Development Workflow

Use the CLI for rapid prototyping and testing:

```bash
# Test different temperatures
for temp in 0.3 0.7 1.0; do
  echo "Temperature: $temp"
  echo "What is AI?" | pyllamacpp model.ggml --temp $temp --n_predict 50
  echo "---"
done

# Performance testing
time pyllamacpp model.ggml --n_predict 1000 < test_prompt.txt

# Memory usage monitoring (GNU time; the -v flag is Linux-specific)
/usr/bin/time -v pyllamacpp model.ggml --use_mlock True < test_prompt.txt
```