# Synthesizer

Synthetic test data generation using various evolution strategies (reasoning, multi-context, concretizing, etc.) to create diverse and challenging test cases. Generate goldens from documents, contexts, or from scratch.

## Imports

```python
from deepeval.synthesizer import (
    Synthesizer,
    Evolution,
    PromptEvolution,
    FiltrationConfig,
    EvolutionConfig,
    StylingConfig,
    ContextConstructionConfig
)
```

## Capabilities

### Synthesizer Class

Main class for generating synthetic test data.

```python { .api }
class Synthesizer:
    """
    Generates synthetic test data and goldens.

    Parameters:
    - model (Union[str, DeepEvalBaseLLM], optional): Model for generation
    - async_mode (bool): Async mode (default: True)
    - max_concurrent (int): Max concurrent tasks (default: 100)
    - filtration_config (FiltrationConfig, optional): Filtration configuration
    - evolution_config (EvolutionConfig, optional): Evolution configuration
    - styling_config (StylingConfig, optional): Styling configuration
    - cost_tracking (bool): Track API costs (default: False)

    Methods:
    - generate_goldens_from_docs(document_paths, **kwargs) -> List[Golden]
    - a_generate_goldens_from_docs(document_paths, **kwargs) -> List[Golden]
    - generate_goldens_from_contexts(contexts, **kwargs) -> List[Golden]
    - a_generate_goldens_from_contexts(contexts, **kwargs) -> List[Golden]
    - generate_goldens_from_scratch(num_goldens, **kwargs) -> List[Golden]
    - a_generate_goldens_from_scratch(num_goldens, **kwargs) -> List[Golden]
    - generate_goldens_from_goldens(goldens, **kwargs) -> List[Golden]
    - a_generate_goldens_from_goldens(goldens, **kwargs) -> List[Golden]
    - save_as(file_type, directory, file_name=None): Save synthetic goldens
    - to_pandas() -> pd.DataFrame: Convert to pandas DataFrame
    """
```

### Evolution Types

Input evolution strategies for creating diverse test cases.

```python { .api }
class Evolution:
    """
    Enum of input evolution strategies.

    Values:
    - REASONING: Add reasoning complexity
    - MULTICONTEXT: Require multiple contexts
    - CONCRETIZING: Make more concrete/specific
    - CONSTRAINED: Add constraints
    - COMPARATIVE: Add comparisons
    - HYPOTHETICAL: Make hypothetical
    - IN_BREADTH: Broaden scope
    """

class PromptEvolution:
    """
    Enum of prompt evolution strategies (for scratch generation).

    Values:
    - REASONING
    - CONCRETIZING
    - CONSTRAINED
    - COMPARATIVE
    - HYPOTHETICAL
    - IN_BREADTH
    """
```
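The weights supplied for these strategies (via `EvolutionConfig.evolutions`) act as a distribution over which strategy gets applied in each evolution round. As a rough plain-Python sketch of how weighted selection could work (an illustration only, not DeepEval's internal implementation; the strategy names here stand in for the enum values):

```python
import random

# Hypothetical weights, mirroring an EvolutionConfig.evolutions mapping
WEIGHTS = {
    "REASONING": 0.3,
    "MULTICONTEXT": 0.2,
    "CONCRETIZING": 0.2,
    "CONSTRAINED": 0.15,
    "COMPARATIVE": 0.15,
}

def pick_evolutions(weights: dict, num_evolutions: int) -> list:
    """Sample one strategy per evolution round, proportional to its weight."""
    names = list(weights)
    return random.choices(names, weights=list(weights.values()), k=num_evolutions)

random.seed(0)
rounds = pick_evolutions(WEIGHTS, num_evolutions=2)
print(rounds)  # two strategy names drawn from WEIGHTS
```

With `num_evolutions=2`, each input would be rewritten twice, each time under an independently sampled strategy.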

### Configuration Classes

```python { .api }
class FiltrationConfig:
    """
    Configuration for synthetic data filtration.

    Parameters:
    - synthetic_input_quality_threshold (float): Quality threshold (default: 0.5)
    - max_quality_retries (int): Max retries for quality (default: 3)
    - critic_model (Union[str, DeepEvalBaseLLM], optional): Critic model for quality assessment
    """

class EvolutionConfig:
    """
    Configuration for input evolution.

    Parameters:
    - num_evolutions (int): Number of evolution iterations (default: 1)
    - evolutions (Dict[Evolution, float]): Evolution types and weights (default: equal distribution)
    """

class StylingConfig:
    """
    Configuration for output styling.

    Parameters:
    - scenario (str, optional): Scenario description
    - task (str, optional): Task description
    - input_format (str, optional): Input format specification
    - expected_output_format (str, optional): Expected output format
    """

class ContextConstructionConfig:
    """
    Configuration for context construction from documents.

    Parameters:
    - embedder (Union[str, DeepEvalBaseEmbeddingModel], optional): Embedding model
    - critic_model (Union[str, DeepEvalBaseLLM], optional): Critic model
    - encoding (str, optional): Text encoding
    - max_contexts_per_document (int): Max contexts per doc (default: 3)
    - min_contexts_per_document (int): Min contexts per doc (default: 1)
    - max_context_length (int): Max context length in chunks (default: 3)
    - min_context_length (int): Min context length in chunks (default: 1)
    - chunk_size (int): Chunk size in characters (default: 1024)
    - chunk_overlap (int): Chunk overlap in characters (default: 0)
    - context_quality_threshold (float): Quality threshold (default: 0.5)
    - context_similarity_threshold (float): Similarity threshold (default: 0.0)
    - max_retries (int): Max retries (default: 3)
    """
```
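To make the `chunk_size` and `chunk_overlap` arithmetic concrete, here is a minimal character-based chunker in plain Python. It only illustrates how overlapping windows partition a document; it is not DeepEval's actual chunking implementation:

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 0) -> list:
    """Split text into chunks of up to chunk_size characters,
    with consecutive chunks sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 2500
chunks = chunk_text(doc, chunk_size=1024, chunk_overlap=0)
print(len(chunks))      # 3 (1024 + 1024 + 452 characters)
print(len(chunks[-1]))  # 452
```

Contexts are then assembled from between `min_context_length` and `max_context_length` of these chunks, so with the defaults each context spans 1-3 chunks of 1024 characters.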

## Usage Examples

### Generate from Documents

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer(model="gpt-4")

# Generate goldens from documents
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=[
        "./docs/product_manual.pdf",
        "./docs/faq.txt",
        "./docs/user_guide.docx"
    ],
    max_goldens_per_context=2,
    include_expected_output=True
)

print(f"Generated {len(goldens)} goldens")
for golden in goldens[:3]:
    print(f"Input: {golden.input}")
    print(f"Expected: {golden.expected_output}\n")

# Save to file
synthesizer.save_as(
    file_type="json",
    directory="./synthetic_data",
    file_name="doc_goldens"
)
```

### Generate from Contexts

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()

# Generate from predefined contexts
contexts = [
    ["Our return policy allows 30-day full refunds"],
    ["Shipping takes 3-5 business days for US orders"],
    ["Premium members get free expedited shipping"]
]

goldens = synthesizer.generate_goldens_from_contexts(
    contexts=contexts,
    max_goldens_per_context=3,
    include_expected_output=True
)
```

### Generate from Scratch

```python
from deepeval.synthesizer import Synthesizer, StylingConfig

synthesizer = Synthesizer(
    styling_config=StylingConfig(
        scenario="Customer support for an e-commerce platform",
        task="Answer customer questions about products, shipping, and returns",
        input_format="Natural language questions",
        expected_output_format="Helpful, concise answers"
    )
)

# Generate from scratch using the styling config
goldens = synthesizer.generate_goldens_from_scratch(
    num_goldens=50
)

print(f"Generated {len(goldens)} synthetic goldens")
```

### Apply Evolution Strategies

```python
from deepeval.synthesizer import Synthesizer, EvolutionConfig, Evolution

# Configure evolution strategies
evolution_config = EvolutionConfig(
    num_evolutions=2,  # Apply 2 rounds of evolution
    evolutions={
        Evolution.REASONING: 0.3,     # 30% reasoning
        Evolution.MULTICONTEXT: 0.2,  # 20% multi-context
        Evolution.CONCRETIZING: 0.2,  # 20% concretizing
        Evolution.CONSTRAINED: 0.15,  # 15% constrained
        Evolution.COMPARATIVE: 0.15   # 15% comparative
    }
)

synthesizer = Synthesizer(evolution_config=evolution_config)

goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["./docs/guide.pdf"],
    max_goldens_per_context=3
)
```

### Quality Filtration

```python
from deepeval.synthesizer import Synthesizer, FiltrationConfig

# Configure quality filtration
filtration_config = FiltrationConfig(
    synthetic_input_quality_threshold=0.7,  # Higher quality threshold
    max_quality_retries=5,                  # More retry attempts
    critic_model="gpt-4"                    # Use GPT-4 as quality critic
)

synthesizer = Synthesizer(
    filtration_config=filtration_config,
    cost_tracking=True  # Track API costs
)

goldens = synthesizer.generate_goldens_from_contexts(
    contexts=[["High-quality context about AI"]],
    max_goldens_per_context=5
)
# Only high-quality goldens will be generated
```
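Conceptually, filtration is a generate-score-retry loop: a critic model scores each candidate input, and generation is retried up to `max_quality_retries` times until a candidate clears `synthetic_input_quality_threshold`. A schematic plain-Python version, with hypothetical `generate` and `score` callables standing in for the LLM and the critic (this sketches the idea, not DeepEval's actual code):

```python
def filtered_generate(generate, score, threshold=0.7, max_quality_retries=5):
    """Regenerate until the critic score clears the threshold;
    return the best-scoring candidate seen, even if none clears it."""
    best, best_score = None, float("-inf")
    for _ in range(max_quality_retries):
        candidate = generate()
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
        if s >= threshold:
            break  # good enough, stop retrying
    return best, best_score

# Toy demo: critic scores improve on each attempt
attempts = iter([0.4, 0.6, 0.8])
golden, quality = filtered_generate(
    generate=lambda: "candidate input",
    score=lambda _: next(attempts),
    threshold=0.7,
)
print(quality)  # 0.8 -- the loop stopped on the third attempt
```

Raising the threshold or the retry count trades more critic/generation calls (hence cost) for higher average input quality, which is why `cost_tracking=True` pairs naturally with aggressive filtration.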

### Custom Context Construction

```python
from deepeval.synthesizer import Synthesizer, ContextConstructionConfig
from deepeval.models import OpenAIEmbeddingModel

# Configure context construction
context_config = ContextConstructionConfig(
    embedder=OpenAIEmbeddingModel(model="text-embedding-3-large"),
    chunk_size=512,                   # Smaller chunks
    chunk_overlap=50,                 # Some overlap
    max_contexts_per_document=5,
    min_context_length=2,             # At least 2 chunks per context
    max_context_length=4,             # At most 4 chunks per context
    context_quality_threshold=0.6,
    context_similarity_threshold=0.3  # Avoid very similar contexts
)

synthesizer = Synthesizer()

goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["./large_document.pdf"],
    context_construction_config=context_config,
    max_goldens_per_context=3
)
```

### Evolve Existing Goldens

```python
from deepeval.synthesizer import Synthesizer
from deepeval.dataset import Golden

# Existing goldens
existing_goldens = [
    Golden(input="What is Python?", expected_output="Python is a programming language"),
    Golden(input="What is Java?", expected_output="Java is a programming language")
]

synthesizer = Synthesizer()

# Generate more goldens based on existing ones
new_goldens = synthesizer.generate_goldens_from_goldens(
    goldens=existing_goldens,
    max_goldens_per_golden=3,  # Generate 3 variations per golden
    include_expected_output=True
)

print(f"Generated {len(new_goldens)} new goldens from {len(existing_goldens)} existing")
```

### Async Generation

```python
import asyncio

from deepeval.synthesizer import Synthesizer

async def generate_data():
    synthesizer = Synthesizer(
        async_mode=True,
        max_concurrent=50  # Higher concurrency
    )

    # Async generation
    goldens = await synthesizer.a_generate_goldens_from_docs(
        document_paths=["./doc1.pdf", "./doc2.pdf"],
        max_goldens_per_context=5
    )

    return goldens

# Run async
goldens = asyncio.run(generate_data())
```

### Save and Export

```python
from deepeval.synthesizer import Synthesizer

synthesizer = Synthesizer()
goldens = synthesizer.generate_goldens_from_scratch(num_goldens=100)

# Save as JSON
synthesizer.save_as(
    file_type="json",
    directory="./data",
    file_name="synthetic_goldens"
)

# Save as CSV
synthesizer.save_as(
    file_type="csv",
    directory="./data",
    file_name="synthetic_goldens"
)

# Convert to a pandas DataFrame for analysis
df = synthesizer.to_pandas()
print(df.head())
print(df.describe())
```

### Complete Example

```python
from deepeval.synthesizer import (
    Synthesizer,
    EvolutionConfig,
    Evolution,
    FiltrationConfig,
    StylingConfig,
    ContextConstructionConfig
)
from deepeval.models import GPTModel, OpenAIEmbeddingModel

# Configure synthesizer with all options
synthesizer = Synthesizer(
    model=GPTModel(model="gpt-4"),
    async_mode=True,
    max_concurrent=20,
    evolution_config=EvolutionConfig(
        num_evolutions=2,
        evolutions={
            Evolution.REASONING: 0.4,
            Evolution.MULTICONTEXT: 0.3,
            Evolution.CONCRETIZING: 0.3
        }
    ),
    filtration_config=FiltrationConfig(
        synthetic_input_quality_threshold=0.7,
        max_quality_retries=3,
        critic_model="gpt-4"
    ),
    styling_config=StylingConfig(
        scenario="Technical support for software products",
        task="Help users troubleshoot issues",
        input_format="User problem descriptions",
        expected_output_format="Step-by-step troubleshooting guides"
    ),
    cost_tracking=True
)

# Generate high-quality synthetic data
goldens = synthesizer.generate_goldens_from_docs(
    document_paths=["./technical_docs.pdf"],
    context_construction_config=ContextConstructionConfig(
        embedder=OpenAIEmbeddingModel(),
        chunk_size=1024,
        max_contexts_per_document=10
    ),
    max_goldens_per_context=2,
    include_expected_output=True
)

# Save results
synthesizer.save_as(
    file_type="json",
    directory="./synthetic_data",
    file_name="technical_support_goldens"
)

print(f"Generated {len(goldens)} high-quality synthetic goldens")
```