
# Integration Testing

The integration test classes verify complete feature sets and real-world usage patterns against external services. They cover real API calls, streaming, tool calling, structured output, and multimodal inputs.

## Capabilities

### Chat Model Integration Tests

Comprehensive integration testing for chat models, with 40+ test methods covering invocation, streaming, batching, tool calling, structured output, and multimodal inputs.

```python { .api }
from langchain_tests.integration_tests import ChatModelIntegrationTests

class ChatModelIntegrationTests(ChatModelTests):
    """Integration tests for chat models with comprehensive functionality testing."""

    # Inherits all configuration from ChatModelTests

    # Basic invocation tests
    def test_invoke(self) -> None:
        """Test basic model invocation with simple prompts."""

    def test_ainvoke(self) -> None:
        """Test asynchronous model invocation."""

    # Streaming tests
    def test_stream(self) -> None:
        """Test streaming responses from the model."""

    def test_astream(self) -> None:
        """Test asynchronous streaming responses."""

    # Batch processing tests
    def test_batch(self) -> None:
        """Test batch processing of multiple prompts."""

    def test_abatch(self) -> None:
        """Test asynchronous batch processing."""

    # Conversation tests
    def test_conversation(self) -> None:
        """Test multi-turn conversation handling."""

    def test_double_messages_conversation(self) -> None:
        """Test sequential message handling in conversations."""

    # Usage metadata tests
    def test_usage_metadata(self) -> None:
        """Test usage metadata tracking and validation."""

    def test_usage_metadata_streaming(self) -> None:
        """Test usage metadata in streaming responses."""

    # Stop sequence tests
    def test_stop_sequence(self) -> None:
        """Test stop sequence functionality."""

    # Tool calling tests (if has_tool_calling=True)
    def test_tool_calling(self) -> None:
        """Test tool calling functionality."""

    def test_tool_calling_async(self) -> None:
        """Test asynchronous tool calling."""

    def test_bind_runnables_as_tools(self) -> None:
        """Test binding runnable objects as tools."""

    def test_tool_message_histories_string_content(self) -> None:
        """Test tool message histories with string content."""

    def test_tool_message_histories_list_content(self) -> None:
        """Test tool message histories with complex list content."""

    def test_tool_choice(self) -> None:
        """Test tool choice functionality."""

    def test_tool_calling_with_no_arguments(self) -> None:
        """Test tool calling with tools that take no arguments."""

    def test_tool_message_error_status(self) -> None:
        """Test error handling in tool messages."""

    # Structured output tests (if has_structured_output=True)
    def test_structured_few_shot_examples(self) -> None:
        """Test structured output with few-shot examples."""

    def test_structured_output(self) -> None:
        """Test structured output generation."""

    def test_structured_output_async(self) -> None:
        """Test asynchronous structured output generation."""

    def test_structured_output_pydantic_2_v1(self) -> None:
        """Test Pydantic V1 compatibility in structured output."""

    def test_structured_output_optional_param(self) -> None:
        """Test structured output with optional parameters."""

    # JSON mode tests (if supports_json_mode=True)
    def test_json_mode(self) -> None:
        """Test JSON mode functionality."""

    # Multimodal input tests (if corresponding support flags=True)
    def test_pdf_inputs(self) -> None:
        """Test PDF input handling."""

    def test_audio_inputs(self) -> None:
        """Test audio input handling."""

    def test_image_inputs(self) -> None:
        """Test image input handling."""

    def test_image_tool_message(self) -> None:
        """Test image content in tool messages."""

    def test_anthropic_inputs(self) -> None:
        """Test Anthropic-style input format handling."""

    # Message handling tests
    def test_message_with_name(self) -> None:
        """Test messages with name attributes."""

    # Advanced functionality tests
    def test_agent_loop(self) -> None:
        """Test agent loop functionality with tool calling."""

    def test_unicode_tool_call_integration(self) -> None:
        """Test Unicode handling in tool calls."""

    # Performance tests
    def test_stream_time(self) -> None:
        """Benchmark streaming performance."""
```

#### Usage Example

```python
from langchain_tests.integration_tests import ChatModelIntegrationTests
from my_integration import MyChatModel

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self):
        return MyChatModel

    @property
    def chat_model_params(self):
        return {
            "api_key": "real-api-key",  # Use real credentials for integration tests
            "model": "gpt-4",
            "temperature": 0.1
        }

    # Configure model capabilities
    @property
    def has_tool_calling(self):
        return True

    @property
    def has_structured_output(self):
        return True

    @property
    def supports_image_inputs(self):
        return True

    @property
    def returns_usage_metadata(self):
        return True
```
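The capability properties in the example above act as switches for the conditional test groups (those marked "if has_tool_calling=True" and similar in the API listing). The sketch below illustrates that gating pattern using only the standard library. The names `SkipTest`, `StandardTests`, `ToolCallingModelTests`, and `run` are hypothetical and are not part of `langchain_tests`; the real suite's skip mechanism may differ.

```python
class SkipTest(Exception):
    """Raised by a test to signal that it does not apply to this model."""


class StandardTests:
    """Hypothetical stand-in for a standard test base class."""

    @property
    def has_tool_calling(self) -> bool:
        # Capability is off by default in this sketch; subclasses opt in.
        return False

    def test_tool_calling(self) -> None:
        if not self.has_tool_calling:
            raise SkipTest("Test requires tool calling.")
        # A real test would invoke the model with bound tools here.


class ToolCallingModelTests(StandardTests):
    """Subclass that declares tool calling support."""

    @property
    def has_tool_calling(self) -> bool:
        return True


def run(test_case: StandardTests) -> str:
    """Run the gated test and report whether it ran or was skipped."""
    try:
        test_case.test_tool_calling()
    except SkipTest:
        return "skipped"
    return "passed"


print(run(StandardTests()))
print(run(ToolCallingModelTests()))
```

Keeping the flags as overridable properties means a test subclass only has to declare what its model supports; the shared test bodies stay unchanged.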

### Embeddings Integration Tests

Integration testing for embeddings models with synchronous and asynchronous operations.

```python { .api }
from langchain_tests.integration_tests import EmbeddingsIntegrationTests

class EmbeddingsIntegrationTests(EmbeddingsTests):
    """Integration tests for embeddings models."""

    def test_embed_query(self) -> None:
        """Test embedding a single query string."""

    def test_embed_documents(self) -> None:
        """Test embedding a list of documents."""

    def test_aembed_query(self) -> None:
        """Test asynchronous embedding of a single query."""

    def test_aembed_documents(self) -> None:
        """Test asynchronous embedding of document lists."""
```

#### Usage Example

```python
from langchain_tests.integration_tests import EmbeddingsIntegrationTests
from my_integration import MyEmbeddings

class TestMyEmbeddingsIntegration(EmbeddingsIntegrationTests):
    @property
    def embeddings_class(self):
        return MyEmbeddings

    @property
    def embedding_model_params(self):
        return {
            "api_key": "real-api-key",
            "model": "text-embedding-3-large"
        }
```
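To give a feel for what a document-embedding test might assert, here is a minimal sketch of shape checks on the output of `embed_documents`. `FakeEmbeddings` and `check_document_embeddings` are hypothetical names for illustration, and the specific checks are an assumption; the assertions in the actual suite may differ.

```python
from typing import List


class FakeEmbeddings:
    """Hypothetical stand-in for a real embeddings model (not part of any library)."""

    def embed_query(self, text: str) -> List[float]:
        # Derive a tiny deterministic 3-dimensional vector from the text.
        return [float(len(text)), float(text.count(" ")), 1.0]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_query(text) for text in texts]


def check_document_embeddings(vectors: List[List[float]], num_documents: int) -> None:
    """Assumed checks: one vector per document, floats only, equal lengths."""
    assert len(vectors) == num_documents
    assert all(isinstance(value, float) for vector in vectors for value in vector)
    assert len(vectors[0]) > 0
    assert all(len(vector) == len(vectors[0]) for vector in vectors)


documents = ["foo", "bar baz", "a longer document"]
vectors = FakeEmbeddings().embed_documents(documents)
check_document_embeddings(vectors, len(documents))
```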

### Tools Integration Tests

Integration testing for tools with schema validation and invocation verification.

```python { .api }
from langchain_tests.integration_tests import ToolsIntegrationTests

class ToolsIntegrationTests(ToolsTests):
    """Integration tests for tools."""

    def test_invoke_matches_output_schema(self) -> None:
        """Test that tool output matches its declared schema."""

    def test_async_invoke_matches_output_schema(self) -> None:
        """Test that async tool output matches its declared schema."""

    def test_invoke_no_tool_call(self) -> None:
        """Test direct tool invocation without tool call wrapper."""

    def test_async_invoke_no_tool_call(self) -> None:
        """Test direct async tool invocation."""
```

#### Usage Example

```python
from langchain_tests.integration_tests import ToolsIntegrationTests
from my_integration import MySearchTool

class TestMySearchToolIntegration(ToolsIntegrationTests):
    @property
    def tool_constructor(self):
        return MySearchTool

    @property
    def tool_constructor_params(self):
        return {
            "api_key": "real-search-api-key",
            "base_url": "https://api.search-service.com"
        }

    @property
    def tool_invoke_params_example(self):
        return {
            "query": "LangChain framework",
            "num_results": 5
        }
```

### Retrievers Integration Tests

Integration testing for retriever implementations with document retrieval and parameter validation.

```python { .api }
from langchain_tests.integration_tests import RetrieversIntegrationTests

class RetrieversIntegrationTests(BaseStandardTests):
    """Integration tests for retrievers."""

    # Required abstract properties
    @property
    def retriever_constructor(self):
        """Retriever class to test."""

    @property
    def retriever_constructor_params(self) -> dict:
        """Constructor parameters for the retriever."""

    @property
    def retriever_query_example(self) -> str:
        """Example query string for testing."""

    @property
    def num_results_arg_name(self) -> str:
        """Name of the parameter that controls number of results. Default: 'k'."""

    # Fixtures
    @pytest.fixture
    def retriever(self):
        """Retriever fixture for testing."""

    def test_k_constructor_param(self) -> None:
        """Test the number of results constructor parameter."""

    def test_invoke_with_k_kwarg(self) -> None:
        """Test runtime parameter for number of results."""

    def test_invoke_returns_documents(self) -> None:
        """Test that retriever returns Document objects."""

    def test_ainvoke_returns_documents(self) -> None:
        """Test that async retriever returns Document objects."""
```

#### Usage Example

```python
from langchain_tests.integration_tests import RetrieversIntegrationTests
from my_integration import MyRetriever

class TestMyRetrieverIntegration(RetrieversIntegrationTests):
    @property
    def retriever_constructor(self):
        return MyRetriever

    @property
    def retriever_constructor_params(self):
        return {
            "index_name": "test-index",
            "api_key": "real-api-key"
        }

    @property
    def retriever_query_example(self):
        return "machine learning algorithms"

    @property
    def num_results_arg_name(self):
        return "top_k"  # If your retriever uses 'top_k' instead of 'k'
```
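The role of `num_results_arg_name` is easier to see with a toy retriever. The sketch below assumes the tests pass the result count under that name both at construction time and at invocation time, mirroring `test_k_constructor_param` and `test_invoke_with_k_kwarg`. `FakeRetriever` is hypothetical and returns plain strings rather than `Document` objects.

```python
from typing import List


class FakeRetriever:
    """Hypothetical retriever whose result-count parameter is named `top_k`."""

    def __init__(self, top_k: int = 4) -> None:
        self.top_k = top_k

    def invoke(self, query: str, **kwargs: int) -> List[str]:
        # A runtime keyword argument overrides the constructor default.
        limit = kwargs.get("top_k", self.top_k)
        return [f"{query} #{i}" for i in range(limit)]


num_results_arg_name = "top_k"
query = "machine learning algorithms"

# Result count supplied through the constructor.
retriever = FakeRetriever(**{num_results_arg_name: 3})
from_constructor = retriever.invoke(query)

# Result count supplied at invocation time.
from_kwarg = retriever.invoke(query, **{num_results_arg_name: 1})

print(len(from_constructor), len(from_kwarg))
```

Passing the name through dictionary unpacking is what lets one shared test body work for retrievers that call the parameter `k`, `top_k`, or something else.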

## Pre-defined Test Tools

The integration test framework includes several pre-built tools for testing tool calling functionality:

```python { .api }
# Pre-defined tools for testing
def magic_function(input: int) -> int:
    """Magic function tool with input validation."""

def magic_function_no_args() -> str:
    """No-argument magic function tool."""

def unicode_customer(customer_name: str, description: str) -> str:
    """Unicode handling tool for internationalization testing."""

def current_weather_tool():
    """Weather tool fixture for testing tool calling."""
```

## Test Callback Handlers

Integration tests include callback handlers for capturing and validating model behavior:

```python { .api }
class _TestCallbackHandler:
    """Callback handler for capturing chat model options and events."""

    def on_chat_model_start(self, serialized, messages, **kwargs):
        """Called when chat model starts processing."""

    def on_llm_end(self, response, **kwargs):
        """Called when chat model completes processing."""
```

## Schema Generation Utilities

Utilities for generating test schemas for structured output testing:

```python { .api }
def _get_joke_class(schema_type: str):
    """Generate joke schema for different output formats."""
```

## VCR Integration

Integration tests can use VCR to record HTTP calls and play them back, enabling:

- **Consistent Testing**: Record real API responses once, replay for subsequent test runs
- **Offline Testing**: Run tests without network connectivity
- **Cost Reduction**: Avoid repeated API calls during test development
- **Deterministic Results**: Same responses every time for reliable testing

VCR integration is controlled by the `enable_vcr_tests` property in the base test class.
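The record-and-replay idea behind these benefits can be illustrated with a toy cassette built on the standard library. This is a conceptual sketch only, not how the VCR tooling is implemented or configured; `replayable`, `fake_api`, and the JSON cassette layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def replayable(cassette: Path, fetch: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `fetch` so responses are recorded to `cassette` and replayed later."""

    def wrapper(request: str) -> str:
        recorded = json.loads(cassette.read_text()) if cassette.exists() else {}
        if request not in recorded:
            # First time this request is seen: call the real function and record it.
            recorded[request] = fetch(request)
            cassette.write_text(json.dumps(recorded))
        # Always answer from the recording, so repeat calls never hit `fetch`.
        return recorded[request]

    return wrapper


calls: List[str] = []


def fake_api(prompt: str) -> str:
    """Stand-in for a billable network call; counts how often it is hit."""
    calls.append(prompt)
    return prompt.upper()


with tempfile.TemporaryDirectory() as tmp:
    cached = replayable(Path(tmp) / "cassette.json", fake_api)
    first = cached("hello")   # recorded: calls fake_api
    second = cached("hello")  # replayed: fake_api is not called again

print(first, second, len(calls))
```

The second call is served from the cassette file, which is what makes replayed runs deterministic, offline-capable, and free of API cost.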

## Performance Benchmarking

Integration tests include performance benchmarking capabilities:

- **Stream Performance**: `test_stream_time()` benchmarks streaming response times
- **Batch Performance**: Timing analysis for batch operations
- **Tool Calling Performance**: Benchmarking for tool calling overhead

Performance tests use pytest-benchmark for detailed statistical analysis and regression detection.
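As a rough illustration of what timing a stream involves, the sketch below measures time to first chunk and total time with `time.perf_counter`. It is a standard-library stand-in rather than pytest-benchmark usage, and `fake_stream` and `time_stream` are hypothetical helpers.

```python
import time
from typing import Dict, Iterable, Iterator, Optional


def fake_stream(chunks: Iterable[str], delay: float = 0.001) -> Iterator[str]:
    """Hypothetical streaming response that yields chunks with a small delay."""
    for chunk in chunks:
        time.sleep(delay)
        yield chunk


def time_stream(stream: Iterator[str]) -> Dict[str, Optional[float]]:
    """Consume a stream and report chunk count, time to first chunk, and total time."""
    start = time.perf_counter()
    first_chunk_at: Optional[float] = None
    count = 0
    for _ in stream:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return {"chunks": count, "time_to_first_chunk": first_chunk_at, "total": total}


stats = time_stream(fake_stream(["Hel", "lo", "!"]))
print(stats)
```

A single measurement like this is noisy; a benchmarking plugin repeats the call many times and reports statistics, which is what makes regression detection practical.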

## Multimodal Input Testing

For models that support multimodal inputs, the framework provides comprehensive testing:

- **Image Inputs**: Base64 encoded images and image URLs
- **PDF Inputs**: Document processing capabilities
- **Audio Inputs**: Speech and audio file processing
- **Video Inputs**: Video content analysis

Each multimodal capability is controlled by feature flags in the test class configuration.
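For the base64 image case, a message's content is typically a list of blocks mixing text and image data. The sketch below builds one such block in the OpenAI-style `image_url` format with a data URL. The format choice is an assumption for illustration (the suite may exercise other block formats as well), and `image_block` is a hypothetical helper.

```python
import base64
from typing import Dict, List


def image_block(image_bytes: bytes, mime_type: str = "image/png") -> Dict:
    """Build an OpenAI-style image content block from raw bytes (assumed format)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# The 8-byte PNG signature stands in for a real image file here.
png_signature = b"\x89PNG\r\n\x1a\n"

message_content: List[Dict] = [
    {"type": "text", "text": "Describe this image."},
    image_block(png_signature),
]

print(message_content[1]["image_url"]["url"])
```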

## Error Handling Validation

Integration tests verify proper error handling for common failure scenarios:

- **API Key Errors**: Invalid or missing authentication
- **Rate Limiting**: Handling of rate limit responses
- **Network Errors**: Connection timeouts and failures
- **Invalid Parameters**: Malformed requests and responses
- **Tool Errors**: Tool execution failures and error propagation

The framework ensures that implementations handle these errors gracefully and provide meaningful error messages to developers.