# LangChain Groq

An integration package connecting Groq's Language Processing Unit (LPU) with LangChain for high-performance AI inference. This package provides seamless access to Groq's deterministic, single-core streaming architecture that delivers predictable and repeatable performance for GenAI inference workloads.

## Package Information

- **Package Name**: langchain-groq
- **Language**: Python
- **Installation**: `pip install langchain-groq`
- **Dependencies**: langchain-core, groq
- **Python Version**: >=3.9
## Core Imports

```python
from langchain_groq import ChatGroq
```

Import version information:

```python
from langchain_groq import __version__
```
## Basic Usage

```python
from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage, SystemMessage

# Basic initialization
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.0,
    api_key="your-groq-api-key"  # or set GROQ_API_KEY env var
)

# Simple conversation
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is the capital of France?")
]

response = llm.invoke(messages)
print(response.content)

# Streaming response
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
```
## Architecture

LangChain Groq integrates with the LangChain ecosystem through the standard `BaseChatModel` interface, providing:

- **LangChain Compatibility**: Full integration with LangChain's Runnable interface, supporting chaining, composition, and streaming
- **Groq LPU Integration**: Direct connection to Groq's deterministic Language Processing Units for consistent, high-performance inference
- **Tool Calling Support**: Native function calling capabilities using OpenAI-compatible tool schemas
- **Structured Output**: Built-in support for generating responses conforming to specific schemas via function calling or JSON mode
- **Async Support**: Full asynchronous operation support for high-throughput applications
- **Streaming**: Real-time token streaming with predictable performance characteristics

The package follows LangChain's standard patterns while leveraging Groq's deterministic architecture for reproducible performance across inference runs.
## Environment Variables

- **GROQ_API_KEY**: Required API key for the Groq service
- **GROQ_API_BASE**: Optional custom API base URL
- **GROQ_PROXY**: Optional proxy configuration
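For example, the required key can be supplied through the environment before the model is constructed; the optional variables are simply absent when unused (a minimal sketch using a placeholder key, not a real credential):

```python
import os

# Illustration only: in practice, use a real key from the Groq console
# rather than hard-coding a placeholder like this.
os.environ["GROQ_API_KEY"] = "gsk-example-key"

# ChatGroq falls back to these variables when the corresponding
# constructor arguments are omitted.
api_key = os.environ.get("GROQ_API_KEY")    # required
base_url = os.environ.get("GROQ_API_BASE")  # optional override, often unset
proxy = os.environ.get("GROQ_PROXY")        # optional proxy, often unset

print(api_key is not None)  # → True
```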
## Capabilities

### Chat Model Initialization

Initialize the ChatGroq model with comprehensive configuration options for performance, behavior, and API settings.

```python { .api }
class ChatGroq:
    def __init__(
        self,
        model: str,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        stop: Optional[Union[List[str], str]] = None,
        reasoning_format: Optional[Literal["parsed", "raw", "hidden"]] = None,
        reasoning_effort: Optional[str] = None,
        service_tier: Literal["on_demand", "flex", "auto"] = "on_demand",
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        timeout: Union[float, Tuple[float, float], Any, None] = None,
        max_retries: int = 2,
        streaming: bool = False,
        n: int = 1,
        model_kwargs: Optional[Dict[str, Any]] = None,
        default_headers: Union[Mapping[str, str], None] = None,
        default_query: Union[Mapping[str, object], None] = None,
        http_client: Union[Any, None] = None,
        http_async_client: Union[Any, None] = None,
        **kwargs: Any
    ) -> None:
        """
        Initialize ChatGroq model.

        Parameters:
        - model: Name of Groq model (e.g., "llama-3.1-8b-instant")
          Note: Aliased to internal field 'model_name'
        - temperature: Sampling temperature (0.0 to 1.0)
        - max_tokens: Maximum tokens to generate
        - stop: Stop sequences (string or list of strings)
          Note: Aliased to internal field 'stop_sequences'
        - reasoning_format: Format for reasoning output ("parsed", "raw", "hidden")
        - reasoning_effort: Level of reasoning effort
        - service_tier: Service tier ("on_demand", "flex", "auto")
        - api_key: Groq API key (defaults to GROQ_API_KEY env var)
          Note: Aliased to internal field 'groq_api_key'
        - base_url: Custom API base URL
          Note: Aliased to internal field 'groq_api_base'
        - timeout: Request timeout in seconds
          Note: Aliased to internal field 'request_timeout'
        - max_retries: Maximum retry attempts
        - streaming: Enable streaming responses
        - n: Number of completions to generate
        - model_kwargs: Additional model parameters
        - default_headers: Default HTTP headers
        - default_query: Default query parameters
        - http_client: Custom httpx client for sync requests
        - http_async_client: Custom httpx client for async requests
        """
```
### Synchronous Chat Operations

Generate responses using synchronous methods for immediate results and batch processing.

```python { .api }
def invoke(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> BaseMessage:
    """
    Generate a single response from input messages.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Returns:
        BaseMessage: Generated response message
    """

def batch(
    self,
    inputs: List[LanguageModelInput],
    config: Optional[Union[RunnableConfig, List[RunnableConfig]]] = None,
    **kwargs: Any
) -> List[BaseMessage]:
    """
    Process multiple inputs in batch.

    Parameters:
    - inputs: List of message sequences or strings
    - config: Runtime configuration(s)
    - **kwargs: Additional parameters

    Returns:
        List[BaseMessage]: List of generated responses
    """

def stream(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> Iterator[BaseMessageChunk]:
    """
    Stream response tokens as they're generated.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Yields:
        BaseMessageChunk: Individual response chunks
    """

def generate(
    self,
    messages: List[List[BaseMessage]],
    stop: Optional[List[str]] = None,
    callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None,
    **kwargs: Any
) -> LLMResult:
    """
    Legacy generate method returning detailed results.

    Parameters:
    - messages: List of message sequences
    - stop: Stop sequences
    - callbacks: Callback handlers
    - **kwargs: Additional parameters

    Returns:
        LLMResult: Detailed generation results with metadata
    """
```
### Asynchronous Chat Operations

Generate responses using asynchronous methods for concurrent processing and high-throughput applications.

```python { .api }
async def ainvoke(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> BaseMessage:
    """
    Asynchronously generate a single response.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Returns:
        BaseMessage: Generated response message
    """

async def abatch(
    self,
    inputs: List[LanguageModelInput],
    config: Optional[Union[RunnableConfig, List[RunnableConfig]]] = None,
    **kwargs: Any
) -> List[BaseMessage]:
    """
    Asynchronously process multiple inputs in batch.

    Parameters:
    - inputs: List of message sequences or strings
    - config: Runtime configuration(s)
    - **kwargs: Additional parameters

    Returns:
        List[BaseMessage]: List of generated responses
    """

async def astream(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> AsyncIterator[BaseMessageChunk]:
    """
    Asynchronously stream response tokens.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Yields:
        BaseMessageChunk: Individual response chunks
    """

async def agenerate(
    self,
    messages: List[List[BaseMessage]],
    stop: Optional[List[str]] = None,
    callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None,
    **kwargs: Any
) -> LLMResult:
    """
    Asynchronously generate with detailed results.

    Parameters:
    - messages: List of message sequences
    - stop: Stop sequences
    - callbacks: Callback handlers
    - **kwargs: Additional parameters

    Returns:
        LLMResult: Detailed generation results with metadata
    """
```
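The usual pattern for high throughput is to fan requests out with `asyncio.gather`. The sketch below shows only the concurrency structure, using a stand-in coroutine in place of `llm.ainvoke` so it runs without an API key; with the real model you would substitute `llm.ainvoke(p)` for the fake call:

```python
import asyncio

# Stand-in for llm.ainvoke (hypothetical): any awaitable returning a
# response-like value works for demonstrating the fan-out pattern.
async def fake_ainvoke(prompt: str) -> str:
    await asyncio.sleep(0)  # simulate network latency
    return f"echo: {prompt}"

async def main() -> list:
    prompts = ["What is 2+2?", "Name a prime number.", "Capital of France?"]
    # Run all requests concurrently; gather preserves input order.
    return await asyncio.gather(*(fake_ainvoke(p) for p in prompts))

results = asyncio.run(main())
print(len(results))  # → 3
```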
### Tool Integration

Bind tools and functions to enable function calling capabilities with the Groq model.

```python { .api }
def bind_tools(
    self,
    tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]],
    *,
    tool_choice: Optional[Union[Dict, str, Literal["auto", "any", "none"], bool]] = None,
    **kwargs: Any
) -> Runnable[LanguageModelInput, BaseMessage]:
    """
    Bind tools for function calling.

    Parameters:
    - tools: List of tool definitions (Pydantic models, functions, or dicts)
    - tool_choice: Tool selection strategy
      - "auto": Model chooses whether to call tools
      - "any"/"required": Model must call a tool
      - "none": Disable tool calling
      - str: Specific tool name to call
      - bool: True requires single tool call
      - dict: {"type": "function", "function": {"name": "tool_name"}}
    - **kwargs: Additional binding parameters

    Returns:
        Runnable: Model with bound tools
    """

def bind_functions(
    self,
    functions: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]],
    function_call: Optional[Union[Dict, str, Literal["auto", "none"]]] = None,
    **kwargs: Any
) -> Runnable[LanguageModelInput, BaseMessage]:
    """
    [DEPRECATED] Bind functions for function calling. Use bind_tools instead.

    This method is deprecated since version 0.2.1 and will be removed in 1.0.0.
    Use bind_tools() for new development.

    Parameters:
    - functions: List of function definitions (dicts, Pydantic models, callables, or tools)
    - function_call: Function call strategy
      - "auto": Model chooses whether to call function
      - "none": Disable function calling
      - str: Specific function name to call
      - dict: {"name": "function_name"}
    - **kwargs: Additional binding parameters

    Returns:
        Runnable: Model with bound functions
    """
```
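Under the hood, a Pydantic tool definition ends up as an OpenAI-compatible schema. The hypothetical converter below approximates that transformation (the real binding is done by langchain-core helpers, not this function) and shows the shape of the resulting dict:

```python
from pydantic import BaseModel, Field

class WeatherTool(BaseModel):
    """Get weather information for a location."""
    location: str = Field(description="City and state, e.g. 'San Francisco, CA'")

# Hypothetical minimal converter, for illustration only: bind_tools
# performs a similar Pydantic-to-schema conversion internally.
def to_openai_tool(model: type) -> dict:
    schema = model.model_json_schema()  # pydantic v2
    return {
        "type": "function",
        "function": {
            "name": model.__name__,
            "description": (model.__doc__ or "").strip(),
            "parameters": schema,
        },
    }

tool = to_openai_tool(WeatherTool)
print(tool["function"]["name"])  # → WeatherTool
```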
### Structured Output

Generate responses conforming to specific schemas using function calling or JSON mode.

```python { .api }
def with_structured_output(
    self,
    schema: Optional[Union[Dict, Type[BaseModel]]] = None,
    *,
    method: Literal["function_calling", "json_mode"] = "function_calling",
    include_raw: bool = False,
    **kwargs: Any
) -> Runnable[LanguageModelInput, Union[Dict, BaseModel]]:
    """
    Create model that outputs structured data.

    Parameters:
    - schema: Output schema (Pydantic model, TypedDict, or OpenAI function schema)
    - method: Generation method
      - "function_calling": Use function calling API
      - "json_mode": Use JSON mode (requires schema instructions in prompt)
    - include_raw: Include raw response alongside parsed output
    - **kwargs: Additional parameters

    Returns:
        Runnable: Model that returns structured output

        If include_raw=False:
        - Returns: Instance of schema type (if Pydantic) or dict
        If include_raw=True:
        - Returns: Dict with keys 'raw', 'parsed', 'parsing_error'
    """
```
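The parsing half of this pipeline is ordinary Pydantic validation of the model's JSON arguments. A minimal sketch, with the raw arguments hard-coded to stand in for a real function-calling response:

```python
from typing import Optional
from pydantic import BaseModel, Field

class PersonInfo(BaseModel):
    """Extract person information from text."""
    name: str = Field(description="Person's full name")
    age: Optional[int] = None

# Hypothetical raw tool-call arguments, as a function-calling response
# would return them; with_structured_output validates this step for you.
raw_arguments = '{"name": "John Smith", "age": 35}'

parsed = PersonInfo.model_validate_json(raw_arguments)  # pydantic v2
print(parsed.name)  # → John Smith
```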
### Model Properties

Access model configuration and type information.

```python { .api }
@property
def _llm_type(self) -> str:
    """
    Return model type identifier for LangChain integration.

    Returns:
        str: Always returns "groq-chat"
    """

@property
def lc_secrets(self) -> Dict[str, str]:
    """
    Return secret field mappings for serialization.

    Returns:
        Dict[str, str]: Mapping of secret fields to environment variables
        {"groq_api_key": "GROQ_API_KEY"}
    """

@classmethod
def is_lc_serializable(cls) -> bool:
    """
    Check if model supports LangChain serialization.

    Returns:
        bool: Always returns True
    """
```
## Usage Examples

### Tool Calling Example

```python
from langchain_groq import ChatGroq
from pydantic import BaseModel, Field

class WeatherTool(BaseModel):
    """Get weather information for a location."""
    location: str = Field(description="City and state, e.g. 'San Francisco, CA'")

llm = ChatGroq(model="llama-3.1-8b-instant")
llm_with_tools = llm.bind_tools([WeatherTool], tool_choice="auto")

response = llm_with_tools.invoke("What's the weather in New York?")
print(response.tool_calls)
```

### Structured Output Example

```python
from langchain_groq import ChatGroq
from pydantic import BaseModel, Field
from typing import Optional

class PersonInfo(BaseModel):
    """Extract person information from text."""
    name: str = Field(description="Person's full name")
    age: Optional[int] = Field(description="Person's age if mentioned")
    occupation: Optional[str] = Field(description="Person's job or profession")

llm = ChatGroq(model="llama-3.1-8b-instant")
structured_llm = llm.with_structured_output(PersonInfo)

result = structured_llm.invoke("John Smith is a 35-year-old software engineer.")
print(f"Name: {result.name}, Age: {result.age}, Job: {result.occupation}")
```

### Reasoning Model Example

```python
from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage, SystemMessage

# Use reasoning-capable model with parsed reasoning format
llm = ChatGroq(
    model="deepseek-r1-distill-llama-70b",
    reasoning_format="parsed"
)

messages = [
    SystemMessage(content="You are a math tutor. Show your reasoning."),
    HumanMessage(content="If a train travels 120 miles in 2 hours, what's its average speed?")
]

response = llm.invoke(messages)
print("Answer:", response.content)
print("Reasoning:", response.additional_kwargs.get("reasoning_content", "No reasoning available"))
```

### Streaming with Token Usage

```python
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.1-8b-instant")
messages = [{"role": "user", "content": "Write a short poem about coding."}]

full_response = None
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
    if full_response is None:
        full_response = chunk
    else:
        full_response += chunk

print("\n\nToken usage:", full_response.usage_metadata)
print("Response metadata:", full_response.response_metadata)
```
## Response Metadata

ChatGroq responses include comprehensive metadata for monitoring and optimization:

```python { .api }
# Response metadata structure
{
    "token_usage": {
        "completion_tokens": int,       # Output tokens used
        "prompt_tokens": int,           # Input tokens used
        "total_tokens": int,            # Total tokens used
        "completion_time": float,       # Time for completion
        "prompt_time": float,           # Time for prompt processing
        "queue_time": Optional[float],  # Time spent in queue
        "total_time": float             # Total processing time
    },
    "model_name": str,                  # Model used for generation
    "system_fingerprint": str,          # System configuration fingerprint
    "finish_reason": str,               # Completion reason ("stop", "length", etc.)
    "service_tier": str,                # Service tier used
    "reasoning_effort": Optional[str]   # Reasoning effort level (if applicable)
}
```
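The timing fields make throughput easy to derive. A small helper over the `token_usage` structure above, with illustrative sample values (not real measurements):

```python
# Compute output throughput from the token_usage dict described above.
def completion_tokens_per_second(token_usage: dict) -> float:
    tokens = token_usage["completion_tokens"]
    seconds = token_usage["completion_time"]
    return tokens / seconds if seconds else 0.0

# Illustrative values only, shaped like a real token_usage payload.
sample_usage = {
    "completion_tokens": 120,
    "prompt_tokens": 30,
    "total_tokens": 150,
    "completion_time": 0.25,
    "prompt_time": 0.01,
    "queue_time": None,
    "total_time": 0.26,
}

print(completion_tokens_per_second(sample_usage))  # → 480.0
```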
## Error Handling

The package handles various error conditions and provides clear error messages:

```python
from langchain_groq import ChatGroq
from groq import BadRequestError

try:
    llm = ChatGroq(model="invalid-model")
    response = llm.invoke("Hello")
except BadRequestError as e:
    print(f"API Error: {e}")
except ValueError as e:
    print(f"Configuration Error: {e}")
```

Common validation errors:

- `n` must be >= 1
- `n` must be 1 when streaming is enabled
- Missing API key when the GROQ_API_KEY environment variable is not set
- Invalid model name or unavailable model
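The validation rules above can be sketched as a standalone check. This is a hypothetical helper for illustration; ChatGroq enforces equivalent rules itself at construction time:

```python
from typing import List, Optional

# Hypothetical pre-flight validation mirroring the rules listed above.
def validate_options(n: int, streaming: bool, api_key: Optional[str]) -> List[str]:
    errors = []
    if n < 1:
        errors.append("n must be >= 1")
    if streaming and n != 1:
        errors.append("n must be 1 when streaming is enabled")
    if not api_key:
        errors.append("missing API key (set GROQ_API_KEY)")
    return errors

# Two rules violated at once: streaming with n > 1, and no key supplied.
print(validate_options(n=2, streaming=True, api_key=None))
```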
## Types

```python { .api }
# Core types used throughout the API
from typing import Any, Callable, Dict, List, Literal, Optional, Sequence, Tuple, Union
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.outputs import ChatResult, LLMResult
from langchain_core.language_models import LanguageModelInput
from langchain_core.runnables import Runnable, RunnableConfig
from langchain_core.callbacks import BaseCallbackHandler, BaseCallbackManager
from langchain_core.tools import BaseTool
from pydantic import BaseModel, SecretStr
from collections.abc import AsyncIterator, Iterator, Mapping

# Message types for input
LanguageModelInput = Union[
    str,                # Simple string input
    List[BaseMessage],  # List of messages
    # ... other LangChain input types
]

# Service tier options
ServiceTier = Literal["on_demand", "flex", "auto"]

# Reasoning format options
ReasoningFormat = Literal["parsed", "raw", "hidden"]

# Tool choice options
ToolChoice = Union[
    Dict,  # {"type": "function", "function": {"name": "tool_name"}}
    str,   # Tool name or "auto"/"any"/"none"
    Literal["auto", "any", "none"],
    bool   # True for single tool requirement
]
```