
tessl/pypi-langchain-groq

An integration package connecting Groq's Language Processing Unit (LPU) with LangChain for high-performance AI inference

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/langchain-groq@0.3.x

To install, run

npx @tessl/cli install tessl/pypi-langchain-groq@0.3.0


LangChain Groq

An integration package connecting Groq's Language Processing Unit (LPU) with LangChain for high-performance AI inference. This package provides seamless access to Groq's deterministic, single-core streaming architecture that delivers predictable and repeatable performance for GenAI inference workloads.

Package Information

  • Package Name: langchain-groq
  • Language: Python
  • Installation: pip install langchain-groq
  • Dependencies: langchain-core, groq
  • Python Version: >=3.9

Core Imports

from langchain_groq import ChatGroq

Import version information:

from langchain_groq import __version__

Basic Usage

from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage, SystemMessage

# Basic initialization
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.0,
    api_key="your-groq-api-key"  # or set GROQ_API_KEY env var
)

# Simple conversation
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is the capital of France?")
]

response = llm.invoke(messages)
print(response.content)

# Streaming response
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

Architecture

LangChain Groq integrates with the LangChain ecosystem through the standard BaseChatModel interface, providing:

  • LangChain Compatibility: Full integration with LangChain's Runnable interface, supporting chaining, composition, and streaming
  • Groq LPU Integration: Direct connection to Groq's deterministic Language Processing Units for consistent, high-performance inference
  • Tool Calling Support: Native function calling capabilities using OpenAI-compatible tool schemas
  • Structured Output: Built-in support for generating responses conforming to specific schemas via function calling or JSON mode
  • Async Support: Full asynchronous operation support for high-throughput applications
  • Streaming: Real-time token streaming with predictable performance characteristics

The package follows LangChain's standard patterns while leveraging Groq's unique deterministic architecture for reproducible results across inference runs.
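
For illustration, here is a minimal sketch of composing ChatGroq with other Runnables via the pipe operator; the prompt wording and model name are arbitrary examples:

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Compose a prompt, the model, and an output parser into a single Runnable chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "Summarize this in one sentence: {text}"),
])
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0.0)
chain = prompt | llm | StrOutputParser()

summary = chain.invoke({"text": "Groq LPUs provide deterministic, low-latency inference."})
print(summary)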

Environment Variables

  • GROQ_API_KEY: Required API key for Groq service
  • GROQ_API_BASE: Optional custom API base URL
  • GROQ_PROXY: Optional proxy configuration
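
A small sketch of setting these variables from Python before constructing the model; all values below are placeholders:

import os

# Placeholder values; supply your own credentials and endpoints
os.environ["GROQ_API_KEY"] = "gsk-your-key"
# os.environ["GROQ_API_BASE"] = "https://your-custom-endpoint"  # optional override
# os.environ["GROQ_PROXY"] = "http://proxy.internal:8080"       # optional proxy

from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.1-8b-instant")  # picks up GROQ_API_KEY automatically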

Capabilities

Chat Model Initialization

Initialize the ChatGroq model with comprehensive configuration options for performance, behavior, and API settings.

class ChatGroq:
    def __init__(
        self,
        model: str,
        temperature: float = 0.7,
        max_tokens: Optional[int] = None,
        stop: Optional[Union[List[str], str]] = None,
        reasoning_format: Optional[Literal["parsed", "raw", "hidden"]] = None,
        reasoning_effort: Optional[str] = None,
        service_tier: Literal["on_demand", "flex", "auto"] = "on_demand",
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        timeout: Union[float, Tuple[float, float], Any, None] = None,
        max_retries: int = 2,
        streaming: bool = False,
        n: int = 1,
        model_kwargs: Optional[Dict[str, Any]] = None,
        default_headers: Union[Mapping[str, str], None] = None,
        default_query: Union[Mapping[str, object], None] = None,
        http_client: Union[Any, None] = None,
        http_async_client: Union[Any, None] = None,
        **kwargs: Any
    ) -> None:
        """
        Initialize ChatGroq model.

        Parameters:
        - model: Name of Groq model (e.g., "llama-3.1-8b-instant")
                 Note: Aliased to internal field 'model_name'
        - temperature: Sampling temperature (0.0 to 1.0)
        - max_tokens: Maximum tokens to generate
        - stop: Stop sequences (string or list of strings)
                Note: Aliased to internal field 'stop_sequences'
        - reasoning_format: Format for reasoning output ("parsed", "raw", "hidden")
        - reasoning_effort: Level of reasoning effort
        - service_tier: Service tier ("on_demand", "flex", "auto")
        - api_key: Groq API key (defaults to GROQ_API_KEY env var)
                   Note: Aliased to internal field 'groq_api_key'
        - base_url: Custom API base URL
                    Note: Aliased to internal field 'groq_api_base'
        - timeout: Request timeout in seconds
                   Note: Aliased to internal field 'request_timeout'
        - max_retries: Maximum retry attempts
        - streaming: Enable streaming responses
        - n: Number of completions to generate
        - model_kwargs: Additional model parameters
        - default_headers: Default HTTP headers
        - default_query: Default query parameters
        - http_client: Custom httpx client for sync requests
        - http_async_client: Custom httpx client for async requests
        """

Synchronous Chat Operations

Generate responses using synchronous methods for immediate results and batch processing.

def invoke(
    self, 
    input: LanguageModelInput, 
    config: Optional[RunnableConfig] = None, 
    **kwargs: Any
) -> BaseMessage:
    """
    Generate a single response from input messages.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Returns:
    BaseMessage: Generated response message
    """

def batch(
    self,
    inputs: List[LanguageModelInput],
    config: Optional[Union[RunnableConfig, List[RunnableConfig]]] = None,
    **kwargs: Any
) -> List[BaseMessage]:
    """
    Process multiple inputs in batch.

    Parameters:
    - inputs: List of message sequences or strings
    - config: Runtime configuration(s)
    - **kwargs: Additional parameters

    Returns:
    List[BaseMessage]: List of generated responses
    """

def stream(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> Iterator[BaseMessageChunk]:
    """
    Stream response tokens as they're generated.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Yields:
    BaseMessageChunk: Individual response chunks
    """

def generate(
    self,
    messages: List[List[BaseMessage]],
    stop: Optional[List[str]] = None,
    callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None,
    **kwargs: Any
) -> LLMResult:
    """
    Legacy generate method returning detailed results.

    Parameters:
    - messages: List of message sequences
    - stop: Stop sequences
    - callbacks: Callback handlers
    - **kwargs: Additional parameters

    Returns:
    LLMResult: Detailed generation results with metadata
    """

Asynchronous Chat Operations

Generate responses using asynchronous methods for concurrent processing and high-throughput applications.

async def ainvoke(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> BaseMessage:
    """
    Asynchronously generate a single response.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Returns:
    BaseMessage: Generated response message
    """

async def abatch(
    self,
    inputs: List[LanguageModelInput],
    config: Optional[Union[RunnableConfig, List[RunnableConfig]]] = None,
    **kwargs: Any
) -> List[BaseMessage]:
    """
    Asynchronously process multiple inputs in batch.

    Parameters:
    - inputs: List of message sequences or strings
    - config: Runtime configuration(s)
    - **kwargs: Additional parameters

    Returns:
    List[BaseMessage]: List of generated responses
    """

async def astream(
    self,
    input: LanguageModelInput,
    config: Optional[RunnableConfig] = None,
    **kwargs: Any
) -> AsyncIterator[BaseMessageChunk]:
    """
    Asynchronously stream response tokens.

    Parameters:
    - input: Messages (list of BaseMessage) or string
    - config: Runtime configuration
    - **kwargs: Additional parameters

    Yields:
    BaseMessageChunk: Individual response chunks
    """

async def agenerate(
    self,
    messages: List[List[BaseMessage]],
    stop: Optional[List[str]] = None,
    callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None,
    **kwargs: Any
) -> LLMResult:
    """
    Asynchronously generate with detailed results.

    Parameters:
    - messages: List of message sequences
    - stop: Stop sequences
    - callbacks: Callback handlers
    - **kwargs: Additional parameters

    Returns:
    LLMResult: Detailed generation results with metadata
    """

Tool Integration

Bind tools and functions to enable function calling capabilities with the Groq model.

def bind_tools(
    self,
    tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]],
    *,
    tool_choice: Optional[Union[Dict, str, Literal["auto", "any", "none"], bool]] = None,
    **kwargs: Any
) -> Runnable[LanguageModelInput, BaseMessage]:
    """
    Bind tools for function calling.

    Parameters:
    - tools: List of tool definitions (Pydantic models, functions, or dicts)
    - tool_choice: Tool selection strategy
      - "auto": Model chooses whether to call tools
      - "any"/"required": Model must call a tool
      - "none": Disable tool calling
      - str: Specific tool name to call
      - bool: True requires single tool call
      - dict: {"type": "function", "function": {"name": "tool_name"}}
    - **kwargs: Additional binding parameters

    Returns:
    Runnable: Model with bound tools
    """

def bind_functions(
    self,
    functions: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]],
    function_call: Optional[Union[Dict, str, Literal["auto", "none"]]] = None,
    **kwargs: Any
) -> Runnable[LanguageModelInput, BaseMessage]:
    """
    [DEPRECATED] Bind functions for function calling. Use bind_tools instead.
    
    This method is deprecated since version 0.2.1 and will be removed in 1.0.0.
    Use bind_tools() for new development.

    Parameters:
    - functions: List of function definitions (dicts, Pydantic models, callables, or tools)
    - function_call: Function call strategy
      - "auto": Model chooses whether to call function
      - "none": Disable function calling
      - str: Specific function name to call
      - dict: {"name": "function_name"}
    - **kwargs: Additional binding parameters

    Returns:
    Runnable: Model with bound functions
    """

Structured Output

Generate responses conforming to specific schemas using function calling or JSON mode.

def with_structured_output(
    self,
    schema: Optional[Union[Dict, Type[BaseModel]]] = None,
    *,
    method: Literal["function_calling", "json_mode"] = "function_calling",
    include_raw: bool = False,
    **kwargs: Any
) -> Runnable[LanguageModelInput, Union[Dict, BaseModel]]:
    """
    Create model that outputs structured data.

    Parameters:
    - schema: Output schema (Pydantic model, TypedDict, or OpenAI function schema)
    - method: Generation method
      - "function_calling": Use function calling API
      - "json_mode": Use JSON mode (requires schema instructions in prompt)
    - include_raw: Include raw response alongside parsed output
    - **kwargs: Additional parameters

    Returns:
    Runnable: Model that returns structured output

    If include_raw=False:
      - Returns: Instance of schema type (if Pydantic) or dict
    If include_raw=True:
      - Returns: Dict with keys 'raw', 'parsed', 'parsing_error'
    """

Model Properties

Access model configuration and type information.

@property
def _llm_type(self) -> str:
    """
    Return model type identifier for LangChain integration.
    
    Returns:
        str: Always returns "groq-chat"
    """

@property
def lc_secrets(self) -> Dict[str, str]:
    """
    Return secret field mappings for serialization.
    
    Returns:
        Dict[str, str]: Mapping of secret fields to environment variables
                       {"groq_api_key": "GROQ_API_KEY"}
    """

@classmethod
def is_lc_serializable(cls) -> bool:
    """
    Check if model supports LangChain serialization.
    
    Returns:
        bool: Always returns True
    """

Usage Examples

Tool Calling Example

from langchain_groq import ChatGroq
from pydantic import BaseModel, Field

class WeatherTool(BaseModel):
    """Get weather information for a location."""
    location: str = Field(description="City and state, e.g. 'San Francisco, CA'")

llm = ChatGroq(model="llama-3.1-8b-instant")
llm_with_tools = llm.bind_tools([WeatherTool], tool_choice="auto")

response = llm_with_tools.invoke("What's the weather in New York?")
print(response.tool_calls)

Structured Output Example

from langchain_groq import ChatGroq
from pydantic import BaseModel, Field
from typing import Optional

class PersonInfo(BaseModel):
    """Extract person information from text."""
    name: str = Field(description="Person's full name")
    age: Optional[int] = Field(description="Person's age if mentioned")
    occupation: Optional[str] = Field(description="Person's job or profession")

llm = ChatGroq(model="llama-3.1-8b-instant")
structured_llm = llm.with_structured_output(PersonInfo)

result = structured_llm.invoke("John Smith is a 35-year-old software engineer.")
print(f"Name: {result.name}, Age: {result.age}, Job: {result.occupation}")

Reasoning Model Example

from langchain_groq import ChatGroq
from langchain_core.messages import HumanMessage, SystemMessage

# Use reasoning-capable model with parsed reasoning format
llm = ChatGroq(
    model="deepseek-r1-distill-llama-70b",
    reasoning_format="parsed"
)

messages = [
    SystemMessage(content="You are a math tutor. Show your reasoning."),
    HumanMessage(content="If a train travels 120 miles in 2 hours, what's its average speed?")
]

response = llm.invoke(messages)
print("Answer:", response.content)
print("Reasoning:", response.additional_kwargs.get("reasoning_content", "No reasoning available"))

Streaming with Token Usage

from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.1-8b-instant")
messages = [{"role": "user", "content": "Write a short poem about coding."}]

full_response = None
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)
    if full_response is None:
        full_response = chunk
    else:
        full_response += chunk

print("\n\nToken usage:", full_response.usage_metadata)
print("Response metadata:", full_response.response_metadata)

Response Metadata

ChatGroq responses include comprehensive metadata for monitoring and optimization:

# Response metadata structure
{
    "token_usage": {
        "completion_tokens": int,      # Output tokens used
        "prompt_tokens": int,          # Input tokens used
        "total_tokens": int,           # Total tokens used
        "completion_time": float,      # Time for completion
        "prompt_time": float,          # Time for prompt processing
        "queue_time": Optional[float], # Time spent in queue
        "total_time": float           # Total processing time
    },
    "model_name": str,                # Model used for generation
    "system_fingerprint": str,        # System configuration fingerprint
    "finish_reason": str,             # Completion reason ("stop", "length", etc.)
    "service_tier": str,             # Service tier used
    "reasoning_effort": Optional[str] # Reasoning effort level (if applicable)
}
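
A short sketch reading these fields from a response; the model name and prompt are arbitrary:

from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.1-8b-instant")
response = llm.invoke("Say hello in one word.")

usage = response.response_metadata["token_usage"]
print("Total tokens:", usage["total_tokens"])
print("Finish reason:", response.response_metadata["finish_reason"])
print("Model:", response.response_metadata["model_name"])

# usage_metadata offers a provider-agnostic view of the same token counts
print(response.usage_metadata)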

Error Handling

The package handles various error conditions and provides clear error messages:

from langchain_groq import ChatGroq
from groq import BadRequestError

try:
    llm = ChatGroq(model="invalid-model")
    response = llm.invoke("Hello")
except BadRequestError as e:
    print(f"API Error: {e}")
except ValueError as e:
    print(f"Configuration Error: {e}")

Common validation errors (a short sketch follows this list):

  • n must be >= 1
  • n must be 1 when streaming is enabled
  • Missing API key when GROQ_API_KEY environment variable not set
  • Invalid model name or unavailable model
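
A minimal sketch triggering one of these checks; depending on the pydantic version the error may surface as a pydantic ValidationError, which is a ValueError subclass:

from langchain_groq import ChatGroq

# n > 1 cannot be combined with streaming; ChatGroq rejects this at construction time
try:
    ChatGroq(model="llama-3.1-8b-instant", n=2, streaming=True)
except ValueError as e:
    print(f"Configuration Error: {e}")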

Types

# Core types used throughout the API
from typing import Any, Callable, Dict, List, Literal, Optional, Sequence, Tuple, Union
from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage, BaseMessageChunk
from langchain_core.outputs import ChatResult, LLMResult
from langchain_core.language_models import LanguageModelInput
from langchain_core.runnables import Runnable, RunnableConfig
from langchain_core.callbacks import BaseCallbackHandler, BaseCallbackManager
from langchain_core.tools import BaseTool
from pydantic import BaseModel, SecretStr
from collections.abc import AsyncIterator, Iterator, Mapping

# Message types for input
LanguageModelInput = Union[
    str,                          # Simple string input
    List[BaseMessage],           # List of messages
    # ... other LangChain input types
]

# Service tier options
ServiceTier = Literal["on_demand", "flex", "auto"]

# Reasoning format options  
ReasoningFormat = Literal["parsed", "raw", "hidden"]

# Tool choice options
ToolChoice = Union[
    Dict,                        # {"type": "function", "function": {"name": "tool_name"}}
    str,                        # Tool name or "auto"/"any"/"none"  
    Literal["auto", "any", "none"],
    bool                        # True for single tool requirement
]