
# LLM Models

Simple text generation interface providing direct access to Google's Gemini models for completion-style tasks. This interface extends LangChain's `BaseLLM` and is designed for straightforward text generation without the complexity of conversational context management.

## Capabilities

### GoogleGenerativeAI

Text generation LLM that wraps Google's Gemini models in a simple completion interface.

```python { .api }
class GoogleGenerativeAI:
    def __init__(
        self,
        *,
        model: str,
        google_api_key: Optional[SecretStr] = None,
        credentials: Any = None,
        temperature: float = 0.7,
        top_p: Optional[float] = None,
        top_k: Optional[int] = None,
        max_output_tokens: Optional[int] = None,
        n: int = 1,
        max_retries: int = 6,
        timeout: Optional[float] = None,
        client_options: Optional[Dict] = None,
        transport: Optional[str] = None,
        additional_headers: Optional[Dict[str, str]] = None,
        response_modalities: Optional[List[Modality]] = None,
        thinking_budget: Optional[int] = None,
        include_thoughts: Optional[bool] = None,
        safety_settings: Optional[Dict[HarmCategory, HarmBlockThreshold]] = None
    )
```

**Parameters:**

- `model` (str): Model name (e.g., "gemini-2.5-pro", "gemini-2.0-flash")
- `google_api_key` (Optional[SecretStr]): Google API key (defaults to the `GOOGLE_API_KEY` environment variable)
- `credentials` (Any): Google authentication credentials object
- `temperature` (float): Generation temperature in [0.0, 2.0]; controls randomness
- `top_p` (Optional[float]): Nucleus sampling parameter in [0.0, 1.0]
- `top_k` (Optional[int]): Top-k sampling parameter for vocabulary selection
- `max_output_tokens` (Optional[int]): Maximum tokens in the response
- `n` (int): Number of completions to generate (default: 1)
- `max_retries` (int): Maximum retry attempts for failed requests (default: 6)
- `timeout` (Optional[float]): Request timeout in seconds
- `client_options` (Optional[Dict]): API client configuration options
- `transport` (Optional[str]): Transport method: `"rest"`, `"grpc"`, or `"grpc_asyncio"`
- `additional_headers` (Optional[Dict[str, str]]): Additional HTTP headers
- `response_modalities` (Optional[List[Modality]]): Response output modalities
- `thinking_budget` (Optional[int]): Thinking budget in tokens for reasoning
- `include_thoughts` (Optional[bool]): Include reasoning thoughts in the response
- `safety_settings` (Optional[Dict[HarmCategory, HarmBlockThreshold]]): Content safety configuration

### Core Methods

#### Text Generation

```python { .api }
def invoke(
    self,
    input: Union[str, List[BaseMessage]],
    config: Optional[RunnableConfig] = None,
    *,
    stop: Optional[List[str]] = None,
    **kwargs: Any
) -> str
```

Generate a text completion for the given input.

**Parameters:**

- `input`: Input text prompt or list of messages
- `config`: Optional run configuration
- `stop`: List of stop sequences to end generation
- `**kwargs`: Additional generation parameters

**Returns:** Generated text as a string

```python { .api }
async def ainvoke(
    self,
    input: Union[str, List[BaseMessage]],
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> str
```

Async version of `invoke()`.

#### Streaming

```python { .api }
def stream(
    self,
    input: Union[str, List[BaseMessage]],
    config: Optional[RunnableConfig] = None,
    *,
    stop: Optional[List[str]] = None,
    **kwargs: Any
) -> Iterator[str]
```

Stream generated text as chunks.

**Parameters:**

- `input`: Input text prompt or list of messages
- `config`: Optional run configuration
- `stop`: List of stop sequences
- `**kwargs`: Additional generation parameters

**Returns:** Iterator of text chunks

```python { .api }
async def astream(
    self,
    input: Union[str, List[BaseMessage]],
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> AsyncIterator[str]
```

Async version of `stream()`.
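Because `astream()` yields an `AsyncIterator[str]`, chunks can be accumulated with `async for`. The sketch below uses a stand-in generator in place of a live model call; `collect_stream` and `fake_chunks` are illustrative helpers, not part of the library:

```python
import asyncio
from typing import AsyncIterator

async def collect_stream(chunks: AsyncIterator[str]) -> str:
    """Accumulate streamed text chunks into a single string."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)

async def fake_chunks() -> AsyncIterator[str]:
    """Stand-in for the chunks a real llm.astream(prompt) would yield."""
    for piece in ["Hello", ", ", "world"]:
        yield piece

# With a real model: text = asyncio.run(collect_stream(llm.astream("prompt")))
print(asyncio.run(collect_stream(fake_chunks())))  # Hello, world
```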

### Utility Methods

```python { .api }
def get_num_tokens(self, text: str) -> int
```

Estimate token count for input text.

**Parameters:**

- `text` (str): Input text to count tokens for

**Returns:** Estimated token count
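When a network round-trip is undesirable, a crude offline estimate can pre-screen prompt length before calling `get_num_tokens()`. The helper below is an illustrative assumption; the roughly-four-characters-per-token ratio is a common English-text heuristic, not a property of the Gemini tokenizer:

```python
def rough_token_estimate(text: str) -> int:
    """Crude offline estimate: ~4 characters per token for English text.
    Use llm.get_num_tokens() when an accurate count matters."""
    return max(1, len(text) // 4)

print(rough_token_estimate("Explain quantum computing"))  # 6
```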

## Usage Examples

### Basic Text Generation

```python
from langchain_google_genai import GoogleGenerativeAI

# Initialize the LLM
llm = GoogleGenerativeAI(model="gemini-2.5-pro")

# Generate a text completion
result = llm.invoke("Once upon a time in a land of artificial intelligence")
print(result)
```

### Streaming Generation

```python
# Stream text as it's generated
for chunk in llm.stream("Write a short story about robots learning to paint"):
    print(chunk, end="", flush=True)
print()  # New line after streaming
```

### Temperature Control

```python
# Creative writing with higher temperature
creative_llm = GoogleGenerativeAI(
    model="gemini-2.5-pro",
    temperature=1.2  # More creative/random
)

creative_text = creative_llm.invoke("Describe a futuristic city")

# Factual content with lower temperature
factual_llm = GoogleGenerativeAI(
    model="gemini-2.5-pro",
    temperature=0.1  # More focused/deterministic
)

factual_text = factual_llm.invoke("Explain photosynthesis")
```

### Token Limits and Sampling

```python
# Configure generation parameters
llm = GoogleGenerativeAI(
    model="gemini-2.5-pro",
    max_output_tokens=500,  # Limit response length
    top_p=0.8,              # Nucleus sampling
    top_k=40                # Top-k sampling
)

result = llm.invoke("Write a summary of machine learning")
```

### Safety Settings

```python
from langchain_google_genai import HarmCategory, HarmBlockThreshold

# Configure content safety
safe_llm = GoogleGenerativeAI(
    model="gemini-2.5-pro",
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    }
)

result = safe_llm.invoke("Generate helpful and safe content")
```

### Async Usage

```python
import asyncio

from langchain_google_genai import GoogleGenerativeAI

async def generate_multiple():
    llm = GoogleGenerativeAI(model="gemini-2.5-pro")

    # Generate multiple completions concurrently
    tasks = [
        llm.ainvoke("Write about space exploration"),
        llm.ainvoke("Write about ocean conservation"),
        llm.ainvoke("Write about renewable energy")
    ]

    results = await asyncio.gather(*tasks)

    for i, result in enumerate(results, 1):
        print(f"Result {i}: {result[:100]}...")

# Run the async example
asyncio.run(generate_multiple())
```

### Stop Sequences

```python
# Use stop sequences to control generation
llm = GoogleGenerativeAI(model="gemini-2.5-pro")

result = llm.invoke(
    "List the planets in our solar system:\n1.",
    stop=["\n\n", "10."]  # Stop at double newline or item 10
)
print(result)
```

### Custom Client Configuration

```python
# Configure API client options
llm = GoogleGenerativeAI(
    model="gemini-2.5-pro",
    client_options={
        "api_endpoint": "https://generativelanguage.googleapis.com"
    },
    transport="rest",  # Use REST instead of gRPC
    additional_headers={
        "User-Agent": "MyApp/1.0"
    },
    timeout=30.0  # 30 second timeout
)

result = llm.invoke("Generate content with custom configuration")
```

### Integration with LangChain

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Create a simple chain
llm = GoogleGenerativeAI(model="gemini-2.5-pro")

prompt = PromptTemplate.from_template(
    "Write a {style} poem about {topic}"
)

output_parser = StrOutputParser()

# Build the chain
chain = prompt | llm | output_parser

# Use the chain
result = chain.invoke({
    "style": "haiku",
    "topic": "artificial intelligence"
})
print(result)
```

### Token Counting

```python
# Estimate tokens before generation
llm = GoogleGenerativeAI(model="gemini-2.5-pro")

prompt = "Explain quantum computing in detail"
token_count = llm.get_num_tokens(prompt)

print(f"Input tokens: {token_count}")

# Generate with awareness of token usage
if token_count < 1000:  # Stay within limits
    result = llm.invoke(prompt)
    print(f"Generated: {result[:100]}...")
else:
    print("Prompt too long, consider shortening")
```

## Error Handling

Handle errors appropriately for LLM operations:

```python
from langchain_google_genai import GoogleGenerativeAI

try:
    llm = GoogleGenerativeAI(model="gemini-2.5-pro")
    result = llm.invoke("Your prompt here")
    print(result)
except Exception as e:
    if "safety" in str(e).lower():
        print(f"Content blocked by safety filters: {e}")
    elif "model" in str(e).lower():
        print(f"Model error: {e}")
    else:
        print(f"Generation error: {e}")
```
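The constructor's `max_retries` already retries transient API failures internally. For application-level retry policies on top of that, a generic wrapper can be sketched; `invoke_with_retry` is a hypothetical helper, not part of the library:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def invoke_with_retry(fn: Callable[[], T], attempts: int = 3,
                      base_delay: float = 1.0) -> T:
    """Call fn(), retrying with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")

# With a real model:
# result = invoke_with_retry(lambda: llm.invoke("Your prompt here"))
```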

## Model Recommendations

- **gemini-2.5-pro**: Best for complex reasoning, creative writing, and detailed analysis
- **gemini-2.0-flash**: Faster inference for simpler tasks and real-time applications
- **gemini-pro**: General-purpose model for balanced performance and cost
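The recommendations above can be encoded as a small selection helper. The task categories and mapping below are illustrative assumptions, not an official taxonomy:

```python
def pick_model(task: str) -> str:
    """Suggest a Gemini model for a coarse task category,
    following the recommendations above."""
    mapping = {
        "reasoning": "gemini-2.5-pro",
        "creative": "gemini-2.5-pro",
        "realtime": "gemini-2.0-flash",
        "general": "gemini-pro",
    }
    return mapping.get(task, "gemini-pro")  # Default to the balanced model

print(pick_model("realtime"))  # gemini-2.0-flash
```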