# External Data Integration

PySD's external data system lets models read time series, lookup tables, constants, and subscripts from external files. It supports Excel, CSV, and netCDF, with automatic caching and encoding handling.

## Capabilities

### Base External Data Class

Foundation class for all external data components, with common functionality for file handling and data management.

```python { .api }
class External:
    """
    Base class for external data objects.

    Provides common functionality for loading, caching, and accessing
    external data sources. Handles file path resolution, encoding detection,
    and error management.

    Methods:
    - __init__(file_name, root, sheet=None, time_row_or_col=None, cell=None)
    - initialize() - Load and prepare external data
    - __call__(time) - Get data value at specified time
    """
```

### Time Series Data

Handle time-varying data from external files with interpolation and extrapolation capabilities.

```python { .api }
class ExtData(External):
    """
    Time series data from external files.

    Loads time series data from CSV, Excel, or other supported formats.
    Supports interpolation, extrapolation, and missing value handling.

    Parameters:
    - file_name: str - Path to data file
    - root: str - Root directory for relative paths
    - sheet: str or int or None - Excel sheet name/index
    - time_row_or_col: str or int - Time column/row identifier
    - cell: str or tuple - Specific cell range for data
    - interp: str - Interpolation method ('linear', 'nearest', 'cubic')
    - py_name: str - Python variable name

    Methods:
    - __call__(time) - Get interpolated value at specified time
    - get_series_data() - Get original pandas Series
    """
```

#### Usage Examples

```python
from pysd.py_backend.external import ExtData

# Load time series from CSV
population_data = ExtData(
    file_name='demographics.csv',
    root='/data',
    time_row_or_col='year',
    py_name='historical_population'
)

# Load from Excel with specific sheet
economic_data = ExtData(
    file_name='economic_indicators.xlsx',
    root='/data',
    sheet='GDP_Data',
    time_row_or_col='time',
    interp='linear'
)

# Access data during simulation
pop_at_time_15 = population_data(15.0)
gdp_at_time_20 = economic_data(20.0)

# Get original data series
original_pop_data = population_data.get_series_data()
```
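
To make the interpolation step concrete, the following is a minimal sketch of linear interpolation over a time series. It assumes that values are held constant outside the data range. The function `interpolate_linear` and the sample data are hypothetical and are not part of PySD, whose own interpolation and extrapolation behavior may differ.

```python
from bisect import bisect_right


def interpolate_linear(times, values, t):
    """Linearly interpolate at time t; hold the end values outside the range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    # Index of the first sample strictly after t
    i = bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical sample data: time points (sorted) and matching values
years = [0, 10, 20]
population = [1000.0, 1500.0, 1800.0]

pop_at_15 = interpolate_linear(years, population, 15.0)
```

The `times` list must be sorted in ascending order for the binary search to work.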

### Lookup Tables

Access lookup tables and reference data from external files with support for multi-dimensional lookups.

```python { .api }
class ExtLookup(External):
    """
    Lookup tables from external files.

    Loads lookup tables for interpolation-based relationships between variables.
    Supports 1D and multi-dimensional lookups with various interpolation methods.

    Parameters:
    - file_name: str - Path to lookup file
    - root: str - Root directory
    - sheet: str or int or None - Excel sheet
    - x_row_or_col: str or int - X-axis data column/row
    - cell: str or tuple - Data cell range
    - interp: str - Interpolation method
    - py_name: str - Variable name

    Methods:
    - __call__(x_value) - Get interpolated lookup value
    - get_series_data() - Get original lookup table
    """
```

#### Usage Examples

```python
from pysd.py_backend.external import ExtLookup

# Load price-demand lookup table
price_lookup = ExtLookup(
    file_name='market_data.xlsx',
    root='/data',
    sheet='price_elasticity',
    x_row_or_col='price',
    py_name='demand_lookup'
)

# Load multi-dimensional efficiency table
efficiency_lookup = ExtLookup(
    file_name='efficiency_curves.csv',
    root='/data',
    x_row_or_col='temperature',
    interp='cubic'
)

# Use during simulation
demand_for_price_50 = price_lookup(50.0)
efficiency_at_temp_25 = efficiency_lookup(25.0)
```

### External Constants

Load constant values from external files for model parameterization.

```python { .api }
class ExtConstant(External):
    """
    Constants from external files.

    Loads scalar constant values from external data sources.
    Useful for model parameterization and configuration management.

    Parameters:
    - file_name: str - Path to constants file
    - root: str - Root directory
    - sheet: str or int or None - Excel sheet
    - cell: str or tuple - Specific cell containing constant
    - py_name: str - Variable name

    Methods:
    - __call__() - Get constant value
    - get_constant_value() - Get the stored constant
    """
```

#### Usage Examples

```python
from pysd.py_backend.external import ExtConstant

# Load model parameters from configuration file
birth_rate_constant = ExtConstant(
    file_name='model_config.xlsx',
    root='/config',
    sheet='parameters',
    cell='B5',  # Specific cell
    py_name='base_birth_rate'
)

# Load from CSV
area_constant = ExtConstant(
    file_name='geographic_data.csv',
    root='/data',
    cell='total_area',
    py_name='country_area'
)

# Access constant values
birth_rate = birth_rate_constant()
total_area = area_constant()
```

### External Subscripts

Load subscript definitions and ranges from external files for multi-dimensional variables.

```python { .api }
class ExtSubscript(External):
    """
    Subscripts from external files.

    Loads subscript definitions (dimension ranges) from external sources.
    Enables dynamic model structure based on external configuration.

    Parameters:
    - file_name: str - Path to subscript definition file
    - root: str - Root directory
    - sheet: str or int or None - Excel sheet
    - py_name: str - Subscript name

    Methods:
    - __call__() - Get subscript range/definition
    - get_subscript_elements() - Get list of subscript elements
    """
```

#### Usage Examples

```python
from pysd.py_backend.external import ExtSubscript

# Load region definitions
regions_subscript = ExtSubscript(
    file_name='geographic_structure.xlsx',
    root='/config',
    sheet='regions',
    py_name='model_regions'
)

# Load age group definitions
age_groups_subscript = ExtSubscript(
    file_name='demographic_structure.csv',
    root='/config',
    py_name='age_categories'
)

# Get subscript elements
available_regions = regions_subscript.get_subscript_elements()
age_categories = age_groups_subscript.get_subscript_elements()
```

### Excel File Caching

Utility class for efficient Excel file handling with caching and shared access.

```python { .api }
class ExtSubscript(External):
    """
    External subscript data from Excel files implementing Vensim's GET XLS SUBSCRIPT and GET DIRECT SUBSCRIPT functions.

    Loads subscript values from Excel files to define model dimensions and array indices.
    Supports cell ranges and named ranges with optional prefix for subscript names.

    Methods:
    - __init__(file_name, tab, firstcell, lastcell, prefix, root) - Initialize subscript data source
    - get_subscripts_cell(col, row, lastcell) - Extract subscripts from cell range
    - get_subscripts_name(name) - Extract subscripts from named range
    """

class Excels:
    """
    Excel file caching utility.

    Manages Excel file loading and caching for efficient access to multiple
    sheets and ranges within the same file. Prevents repeated file loading.

    Methods:
    - __init__() - Initialize cache
    - get_sheet(file_path, sheet_name) - Get cached Excel sheet
    - clear_cache() - Clear all cached Excel data
    - get_file_info(file_path) - Get file metadata
    """
```

#### Usage Examples

```python
from pysd.py_backend.external import Excels, ExtData

# Create Excel cache manager
excel_cache = Excels()

# Multiple ExtData objects using the same Excel file benefit from caching
# (remaining arguments are elided)
data1 = ExtData('large_dataset.xlsx', sheet='Sheet1', ...)
data2 = ExtData('large_dataset.xlsx', sheet='Sheet2', ...)
data3 = ExtData('large_dataset.xlsx', sheet='Sheet3', ...)

# The file is loaded only once and cached for reuse

# Clear the cache when memory needs to be freed
excel_cache.clear_cache()
```
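
The caching idea described here can be illustrated with a small stand-alone sketch. The `SheetCache` class and `fake_loader` function below are hypothetical and only mimic the documented `get_sheet` and `clear_cache` behavior; they are not PySD's `Excels` implementation.

```python
class SheetCache:
    """Hypothetical cache keyed by (file path, sheet name); not PySD's Excels class."""

    def __init__(self, loader):
        self._loader = loader  # function that actually reads a sheet
        self._sheets = {}      # (file_path, sheet_name) -> loaded data

    def get_sheet(self, file_path, sheet_name):
        key = (file_path, sheet_name)
        if key not in self._sheets:
            # First request for this sheet: load it and remember the result
            self._sheets[key] = self._loader(file_path, sheet_name)
        return self._sheets[key]

    def clear_cache(self):
        self._sheets.clear()


load_calls = []


def fake_loader(file_path, sheet_name):
    # Stand-in for an expensive Excel read; records each call
    load_calls.append((file_path, sheet_name))
    return {"file": file_path, "sheet": sheet_name}


cache = SheetCache(fake_loader)
first = cache.get_sheet("large_dataset.xlsx", "Sheet1")
second = cache.get_sheet("large_dataset.xlsx", "Sheet1")  # served from the cache
```

Passing the loader in as a function keeps the sketch independent of any Excel library and makes it easy to count how often the file is actually read.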

### Data File Format Support

PySD supports various external data formats:

#### CSV Files

A CSV file with a time column:

```csv
time,population,gdp
0,1000,5000
1,1050,5250
2,1100,5500
```
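
As a rough illustration of how a file in this layout could be read into per-column series, here is a sketch using only the Python standard library. The helper `read_columns` is hypothetical and does not reflect how PySD parses CSV files internally.

```python
import csv
import io

# The sample table from above, held in memory instead of a file
csv_text = """time,population,gdp
0,1000,5000
1,1050,5250
2,1100,5500
"""


def read_columns(text):
    """Return a dict mapping each column name to a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, cell in row.items():
            columns[name].append(float(cell))
    return columns


columns = read_columns(csv_text)
```

Here `columns["time"]` holds the time axis and every other key holds one variable sampled at those times.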

#### Excel Files

- Multiple sheets are supported
- Sheets can be specified by name or by index
- Cell ranges can be given as `'A1:C10'` or `(1,1,3,10)`
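
The two cell-range notations mentioned above can be related by a small conversion sketch. The tuple order `(first_col, first_row, last_col, last_row)` is an assumption inferred from the pairing of `'A1:C10'` with `(1,1,3,10)`, and the helpers `column_to_index` and `parse_cell_range` are hypothetical, not PySD functions.

```python
import re


def column_to_index(letters):
    """Convert Excel column letters to a 1-based index ('A' -> 1, 'AA' -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell_range(cell_range):
    """Parse 'A1:C10' into (first_col, first_row, last_col, last_row)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", cell_range)
    if match is None:
        raise ValueError(f"Not a cell range: {cell_range!r}")
    col1, row1, col2, row2 = match.groups()
    return (column_to_index(col1), int(row1), column_to_index(col2), int(row2))


bounds = parse_cell_range("A1:C10")
```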

#### NetCDF Files

- Intended for large datasets and model output
- Supports multi-dimensional arrays
- Handles coordinates automatically

### Integration with Model Loading

External data is typically integrated during model loading:

```python
import pysd

# Load model with external data files
model = pysd.read_vensim(
    'population_model.mdl',
    data_files={
        'demographics.csv': ['birth_rate', 'death_rate'],
        'economic.xlsx': ['gdp_growth', 'unemployment']
    },
    data_files_encoding='utf-8'
)

# External data automatically available in model
results = model.run()
```

### Advanced Data Handling

#### Missing Value Strategies

```python
import pysd

# Configure missing value handling during model loading
model = pysd.read_vensim(
    'model.mdl',
    data_files=['incomplete_data.csv'],
    missing_values='warning'  # other options: 'error', 'ignore', 'keep'
)
```
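
The four option names come from the example above, but what each one does is not spelled out here. The sketch below shows one plausible interpretation, assuming that `'warning'` and `'ignore'` both drop the gaps (so that later interpolation bridges them), `'error'` raises, and `'keep'` leaves the NaN values in place. The function `handle_missing` is hypothetical and PySD's actual behavior may differ.

```python
import math
import warnings


def handle_missing(values, policy="warning"):
    """Apply an assumed missing-value policy to a list of floats (NaN marks a gap)."""
    has_gaps = any(math.isnan(v) for v in values)
    if not has_gaps or policy == "keep":
        return list(values)
    if policy == "error":
        raise ValueError("Data contains missing values")
    if policy == "warning":
        warnings.warn("Data contains missing values; dropping them")
    elif policy != "ignore":
        raise ValueError(f"Unknown policy: {policy!r}")
    # 'warning' and 'ignore' both drop the gaps in this sketch
    return [v for v in values if not math.isnan(v)]


cleaned = handle_missing([1.0, float("nan"), 3.0], policy="ignore")
```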

#### Encoding Management

```python
import pysd

# Handle different file encodings
model = pysd.read_vensim(
    'model.mdl',
    data_files=['international_data.csv'],
    data_files_encoding={
        'international_data.csv': 'utf-8'
    }
)
```
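
When the encoding of a data file is not known in advance, one common approach is to try a short list of candidates before passing the chosen one on. The sketch below shows this with the standard library only; `decode_with_fallback` is a hypothetical helper and is not part of PySD.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Try each encoding in order and return (text, encoding_used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"None of {encodings} could decode the data")


# A Latin-1 encoded header line; the byte 0xF3 is not valid UTF-8 here
raw_bytes = b"regi\xf3n,poblaci\xf3n\n"
text, used = decode_with_fallback(raw_bytes)
```

Because Latin-1 can decode any byte sequence, it works as a last resort but may produce wrong characters for files in other encodings, so the candidate order matters.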

#### Data Serialization

Export external data to netCDF format for efficient storage and access:

```python
# Export model's external data
model.serialize_externals(
    export_path='model_externals.nc',
    time_coords={'time': range(0, 101)},
    compression_level=4
)

# Load model with serialized externals
model_with_nc = pysd.load(
    'model.py',
    data_files='model_externals.nc'
)
```

### Error Handling

External data components raise the following errors:

- **FileNotFoundError**: Missing data files
- **KeyError**: Missing columns or sheets
- **ValueError**: Invalid data formats or ranges
- **UnicodeDecodeError**: Encoding issues
- **InterpolationError**: Problems with data interpolation

```python
from pysd.py_backend.external import ExtData

time_point = 10.0  # example simulation time

try:
    data = ExtData('missing_file.csv', root='/data')
    data.initialize()
except FileNotFoundError:
    data = None
    print("Data file not found, using default values")

# Only query the data object if it was loaded successfully
if data is not None:
    try:
        value = data(time_point)
    except ValueError as e:
        print(f"Interpolation error: {e}")
```

### Performance Optimization

For efficient external data usage:

- Cache frequently accessed files using the `Excels` class
- Choose an interpolation method that suits the characteristics of the data
- Consider preprocessing very large datasets
- Use the netCDF format for complex multi-dimensional data

```python
from pysd.py_backend.external import Excels, ExtData

# Efficient pattern for multiple data sources
excel_manager = Excels()

# All data objects share the cached Excel file
population_data = ExtData('master_data.xlsx', sheet='population')
economic_data = ExtData('master_data.xlsx', sheet='economy')
social_data = ExtData('master_data.xlsx', sheet='social')
```