# External Data Integration

PySD's external data system lets models read time series, lookup tables, constants, and subscripts from external files. It supports Excel, CSV, and netCDF formats, with automatic caching and encoding handling.

## Capabilities

### Base External Data Class

Foundation class for all external data components, providing common functionality for file handling and data management.

```python { .api }
class External:
    """
    Base class for external data objects.

    Provides common functionality for loading, caching, and accessing
    external data sources. Handles file path resolution, encoding detection,
    and error management.

    Methods:
    - __init__(file_name, root, sheet=None, time_row_or_col=None, cell=None)
    - initialize() - Load and prepare external data
    - __call__(time) - Get data value at specified time
    """
```
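
To make the path-resolution and load-once behavior described above concrete, here is a minimal sketch. `ExternalSketch`, `resolved_path`, and the `loader` callback are hypothetical names introduced for illustration; this is not PySD's implementation, only one way such a base class could be arranged.

```python
from pathlib import Path


class ExternalSketch:
    """Hypothetical stand-in for a base external-data object (not PySD's class)."""

    def __init__(self, file_name, root, sheet=None):
        self.file_name = file_name
        self.root = Path(root)
        self.sheet = sheet
        self._data = None  # filled lazily by initialize()

    def resolved_path(self):
        """Return file_name unchanged if absolute, otherwise joined to root."""
        path = Path(self.file_name)
        return path if path.is_absolute() else self.root / path

    def initialize(self, loader):
        """Load the data once with a caller-supplied loader(path, sheet)."""
        if self._data is None:
            self._data = loader(self.resolved_path(), self.sheet)
        return self._data
```

Passing the loader in as an argument keeps the sketch independent of any file format; a real component would choose the reader from the file extension.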

### Time Series Data

Handle time-varying data from external files with interpolation and extrapolation capabilities.

```python { .api }
class ExtData(External):
    """
    Time series data from external files.

    Loads time series data from CSV, Excel, or other supported formats.
    Supports interpolation, extrapolation, and missing value handling.

    Parameters:
    - file_name: str - Path to data file
    - root: str - Root directory for relative paths
    - sheet: str or int or None - Excel sheet name/index
    - time_row_or_col: str or int - Time column/row identifier
    - cell: str or tuple - Specific cell range for data
    - interp: str - Interpolation method ('linear', 'nearest', 'cubic')
    - py_name: str - Python variable name

    Methods:
    - __call__(time) - Get interpolated value at specified time
    - get_series_data() - Get original pandas Series
    """
```
52
53
#### Usage Examples
54
55
```python
56
from pysd.py_backend.external import ExtData
57
58
# Load time series from CSV
59
population_data = ExtData(
60
file_name='demographics.csv',
61
root='/data',
62
time_row_or_col='year',
63
py_name='historical_population'
64
)
65
66
# Load from Excel with specific sheet
67
economic_data = ExtData(
68
file_name='economic_indicators.xlsx',
69
root='/data',
70
sheet='GDP_Data',
71
time_row_or_col='time',
72
interp='linear'
73
)
74
75
# Access data during simulation
76
pop_at_time_15 = population_data(15.0)
77
gdp_at_time_20 = economic_data(20.0)
78
79
# Get original data series
80
original_pop_data = population_data.get_series_data()
81
```
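
As a rough illustration of what evaluating a time series at an arbitrary time involves, the following sketch implements 'linear' and 'nearest' interpolation in plain Python. The function name `interpolate` is hypothetical, and holding the first and last values outside the data range is an assumption about extrapolation, not a statement of PySD's behavior.

```python
from bisect import bisect_left


def interpolate(times, values, t, method="linear"):
    """Evaluate a series at time t; times must be sorted in ascending order."""
    if method not in ("linear", "nearest"):
        raise ValueError(f"unsupported method: {method!r}")
    # Assumption: outside the data range, hold the nearest end value.
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    hi = bisect_left(times, t)  # first index with times[hi] >= t
    if times[hi] == t:
        return values[hi]
    lo = hi - 1
    if method == "nearest":
        return values[lo] if t - times[lo] <= times[hi] - t else values[hi]
    frac = (t - times[lo]) / (times[hi] - times[lo])
    return values[lo] + frac * (values[hi] - values[lo])


# Halfway between the samples at t=0 and t=1.
midpoint = interpolate([0, 1, 2], [1000, 1050, 1100], 0.5)
```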
82
83
### Lookup Tables
84
85
Access lookup tables and reference data from external files with support for multi-dimensional lookups.
86
87
```python { .api }
88
class ExtLookup(External):
89
"""
90
Lookup tables from external files.
91
92
Loads lookup tables for interpolation-based relationships between variables.
93
Supports 1D and multi-dimensional lookups with various interpolation methods.
94
95
Parameters:
96
- file_name: str - Path to lookup file
97
- root: str - Root directory
98
- sheet: str or int or None - Excel sheet
99
- x_row_or_col: str or int - X-axis data column/row
100
- cell: str or tuple - Data cell range
101
- interp: str - Interpolation method
102
- py_name: str - Variable name
103
104
Methods:
105
- __call__(x_value) - Get interpolated lookup value
106
- get_series_data() - Get original lookup table
107
"""
108
```
109
110
#### Usage Examples
111
112
```python
113
from pysd.py_backend.external import ExtLookup
114
115
# Load price-demand lookup table
116
price_lookup = ExtLookup(
117
file_name='market_data.xlsx',
118
root='/data',
119
sheet='price_elasticity',
120
x_row_or_col='price',
121
py_name='demand_lookup'
122
)
123
124
# Load multi-dimensional efficiency table
125
efficiency_lookup = ExtLookup(
126
file_name='efficiency_curves.csv',
127
root='/data',
128
x_row_or_col='temperature',
129
interp='cubic'
130
)
131
132
# Use during simulation
133
demand_for_price_50 = price_lookup(50.0)
134
efficiency_at_temp_25 = efficiency_lookup(25.0)
135
```
136
137
### External Constants
138
139
Load constant values from external files for model parameterization.
140
141
```python { .api }
142
class ExtConstant(External):
143
"""
144
Constants from external files.
145
146
Loads scalar constant values from external data sources.
147
Useful for model parameterization and configuration management.
148
149
Parameters:
150
- file_name: str - Path to constants file
151
- root: str - Root directory
152
- sheet: str or int or None - Excel sheet
153
- cell: str or tuple - Specific cell containing constant
154
- py_name: str - Variable name
155
156
Methods:
157
- __call__() - Get constant value
158
- get_constant_value() - Get the stored constant
159
"""
160
```
161
162
#### Usage Examples
163
164
```python
165
from pysd.py_backend.external import ExtConstant
166
167
# Load model parameters from configuration file
168
birth_rate_constant = ExtConstant(
169
file_name='model_config.xlsx',
170
root='/config',
171
sheet='parameters',
172
cell='B5', # Specific cell
173
py_name='base_birth_rate'
174
)
175
176
# Load from CSV
177
area_constant = ExtConstant(
178
file_name='geographic_data.csv',
179
root='/data',
180
cell='total_area',
181
py_name='country_area'
182
)
183
184
# Access constant values
185
birth_rate = birth_rate_constant()
186
total_area = area_constant()
187
```
188
189
### External Subscripts
190
191
Load subscript definitions and ranges from external files for multi-dimensional variables.
192
193
```python { .api }
194
class ExtSubscript(External):
195
"""
196
Subscripts from external files.
197
198
Loads subscript definitions (dimension ranges) from external sources.
199
Enables dynamic model structure based on external configuration.
200
201
Parameters:
202
- file_name: str - Path to subscript definition file
203
- root: str - Root directory
204
- sheet: str or int or None - Excel sheet
205
- py_name: str - Subscript name
206
207
Methods:
208
- __call__() - Get subscript range/definition
209
- get_subscript_elements() - Get list of subscript elements
210
"""
211
```
212
213
#### Usage Examples
214
215
```python
216
from pysd.py_backend.external import ExtSubscript
217
218
# Load region definitions
219
regions_subscript = ExtSubscript(
220
file_name='geographic_structure.xlsx',
221
root='/config',
222
sheet='regions',
223
py_name='model_regions'
224
)
225
226
# Load age group definitions
227
age_groups_subscript = ExtSubscript(
228
file_name='demographic_structure.csv',
229
root='/config',
230
py_name='age_categories'
231
)
232
233
# Get subscript elements
234
available_regions = regions_subscript.get_subscript_elements()
235
age_categories = age_groups_subscript.get_subscript_elements()
236
```
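
Once the raw cell values have been read, turning them into subscript names is a small normalization step. The sketch below shows one plausible version with an optional prefix, as Vensim's GET XLS SUBSCRIPT allows. `build_subscripts` is a hypothetical helper, and skipping empty cells is an assumption rather than documented PySD behavior.

```python
def build_subscripts(cell_values, prefix=""):
    """Turn raw cell values into subscript names, skipping empty cells."""
    names = []
    for value in cell_values:
        text = "" if value is None else str(value).strip()
        if text:
            names.append(f"{prefix}{text}")
    return names


# Cells read from a column, including padding and blanks.
regions = build_subscripts(["North", " South ", None, ""], prefix="R_")
```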
237
238
### Excel File Caching
239
240
Utility class for efficient Excel file handling with caching and shared access.
241
242
```python { .api }
243
class ExtSubscript(External):
244
"""
245
External subscript data from Excel files implementing Vensim's GET XLS SUBSCRIPT and GET DIRECT SUBSCRIPT functions.
246
247
Loads subscript values from Excel files to define model dimensions and array indices.
248
Supports cell ranges and named ranges with optional prefix for subscript names.
249
250
Methods:
251
- __init__(file_name, tab, firstcell, lastcell, prefix, root) - Initialize subscript data source
252
- get_subscripts_cell(col, row, lastcell) - Extract subscripts from cell range
253
- get_subscripts_name(name) - Extract subscripts from named range
254
"""
255
256
class Excels:
257
"""
258
Excel file caching utility.
259
260
Manages Excel file loading and caching for efficient access to multiple
261
sheets and ranges within the same file. Prevents repeated file loading.
262
263
Methods:
264
- __init__() - Initialize cache
265
- get_sheet(file_path, sheet_name) - Get cached Excel sheet
266
- clear_cache() - Clear all cached Excel data
267
- get_file_info(file_path) - Get file metadata
268
"""
269
```
270
271
#### Usage Examples
272
273
```python
274
from pysd.py_backend.external import Excels
275
276
# Create Excel cache manager
277
excel_cache = Excels()
278
279
# Multiple ExtData objects using same Excel file benefit from caching
280
data1 = ExtData('large_dataset.xlsx', sheet='Sheet1', ...)
281
data2 = ExtData('large_dataset.xlsx', sheet='Sheet2', ...)
282
data3 = ExtData('large_dataset.xlsx', sheet='Sheet3', ...)
283
284
# File is loaded only once and cached for reuse
285
# Clear cache when memory management needed
286
excel_cache.clear_cache()
287
```
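
The load-once idea behind such a cache can be sketched with a class-level dictionary keyed by file path and sheet name. `SheetCache` and its `read` and `clear` methods are hypothetical names for illustration, and the `loader` callback stands in for whatever actually opens the workbook.

```python
class SheetCache:
    """Hypothetical class-level cache keyed by (file path, sheet name)."""

    _sheets = {}

    @classmethod
    def read(cls, file_path, sheet_name, loader):
        """Return the cached sheet, calling loader only on the first request."""
        key = (str(file_path), sheet_name)
        if key not in cls._sheets:
            cls._sheets[key] = loader(file_path, sheet_name)
        return cls._sheets[key]

    @classmethod
    def clear(cls):
        """Drop every cached sheet so memory can be reclaimed."""
        cls._sheets.clear()
```

Keeping the dictionary on the class rather than on an instance means every data object shares the same cache without having to pass a manager around.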
288
289
### Data File Format Support
290
291
PySD supports various external data formats:
292
293
#### CSV Files
294
```python
295
# CSV with time column
296
time,population,gdp
297
0,1000,5000
298
1,1050,5250
299
2,1100,5500
300
```
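
For reference, a file laid out like the sample above can be split into a time axis and one list per variable using only the standard library. This is a sketch of the data layout, not of PySD's reader; `read_columns` is a hypothetical helper.

```python
import csv
import io

CSV_TEXT = """time,population,gdp
0,1000,5000
1,1050,5250
2,1100,5500
"""


def read_columns(text, time_column="time"):
    """Return (times, {column name: values}) with every cell parsed as float."""
    reader = csv.DictReader(io.StringIO(text))
    times, series = [], {}
    for row in reader:
        times.append(float(row[time_column]))
        for name, raw in row.items():
            if name != time_column:
                series.setdefault(name, []).append(float(raw))
    return times, series


times, series = read_columns(CSV_TEXT)
```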
301
302
#### Excel Files
303
```python
304
# Multiple sheets supported
305
# Sheet names or indices can be specified
306
# Cell ranges: 'A1:C10' or (1,1,3,10)
307
```
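
The two cell-range notations above are related by a simple conversion. The sketch below maps 'A1:C10' to (1, 1, 3, 10); the tuple order (first column, first row, last column, last row) is inferred from that single example and should be treated as an assumption. All function names are hypothetical.

```python
import re

_CELL = re.compile(r"^([A-Za-z]+)([0-9]+)$")


def column_index(letters):
    """Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index


def parse_cell(ref):
    """Split a reference such as 'B5' into (column, row), both 1-based."""
    match = _CELL.match(ref.strip())
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return column_index(match.group(1)), int(match.group(2))


def parse_range(ref):
    """Convert 'A1:C10' to (col1, row1, col2, row2); a single cell spans itself."""
    first, _, last = ref.partition(":")
    col1, row1 = parse_cell(first)
    col2, row2 = parse_cell(last or first)
    return col1, row1, col2, row2
```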
308
309
#### NetCDF Files
310
```python
311
# For large datasets and model output
312
# Supports multi-dimensional arrays
313
# Automatic coordinate handling
314
```
315
316
### Integration with Model Loading
317
318
External data is typically integrated during model loading:
319
320
```python
321
import pysd
322
323
# Load model with external data files
324
model = pysd.read_vensim(
325
'population_model.mdl',
326
data_files={
327
'demographics.csv': ['birth_rate', 'death_rate'],
328
'economic.xlsx': ['gdp_growth', 'unemployment']
329
},
330
data_files_encoding='utf-8'
331
)
332
333
# External data automatically available in model
334
results = model.run()
335
```
336
337
### Advanced Data Handling
338
339
#### Missing Value Strategies
340
341
```python
342
# Configure missing value handling during model loading
343
model = pysd.read_vensim(
344
'model.mdl',
345
data_files=['incomplete_data.csv'],
346
missing_values='warning' # 'error', 'ignore', 'keep'
347
)
348
```
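
The four option names suggest a policy along the following lines. This sketch is an interpretation, not PySD's code: it assumes 'warning' and 'ignore' drop missing points so that later interpolation bridges the gap, 'error' raises, and 'keep' leaves the data untouched. `handle_missing` is a hypothetical helper.

```python
import math
import warnings

STRATEGIES = ("warning", "error", "ignore", "keep")


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def handle_missing(times, values, strategy="warning"):
    """Apply an assumed missing-value policy to paired time/value lists."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    missing = [_is_missing(v) for v in values]
    count = sum(missing)
    if count == 0 or strategy == "keep":
        return list(times), list(values)
    if strategy == "error":
        raise ValueError(f"{count} missing value(s) in data")
    if strategy == "warning":
        warnings.warn(f"{count} missing value(s) dropped before interpolation")
    kept = [(t, v) for t, v, gone in zip(times, values, missing) if not gone]
    return [t for t, _ in kept], [v for _, v in kept]
```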
349
350
#### Encoding Management
351
352
```python
353
# Handle different file encodings
354
model = pysd.read_vensim(
355
'model.mdl',
356
data_files=['international_data.csv'],
357
data_files_encoding={
358
'international_data.csv': 'utf-8'
359
}
360
)
361
```
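
When the encoding of a data file is not known in advance, one common approach is to try a short list of candidates in order. The sketch below illustrates that idea; it is not a description of how PySD detects encodings, and `decode_with_fallback` and its default candidate list are assumptions.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable with a custom list, since latin-1 accepts any byte.
    raise ValueError(f"none of {encodings} could decode the data")


# Bytes written as latin-1 are not valid UTF-8, so the second candidate is used.
text, used = decode_with_fallback("a\u00f1o".encode("latin-1"))
```

The detected name can then be passed on explicitly, for example as the value in a `data_files_encoding` mapping.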
362
363
#### Data Serialization
364
365
Export external data to netCDF format for efficient storage and access:
366
367
```python
368
# Export model's external data
369
model.serialize_externals(
370
export_path='model_externals.nc',
371
time_coords={'time': range(0, 101)},
372
compression_level=4
373
)
374
375
# Load model with serialized externals
376
model_with_nc = pysd.load(
377
'model.py',
378
data_files='model_externals.nc'
379
)
380
```
381
382
### Error Handling
383
384
External data components provide comprehensive error handling:
385
386
- **FileNotFoundError**: Missing data files
387
- **KeyError**: Missing columns or sheets
388
- **ValueError**: Invalid data formats or ranges
389
- **UnicodeDecodeError**: Encoding issues
390
- **InterpolationError**: Problems with data interpolation
391
392
```python
393
try:
394
data = ExtData('missing_file.csv', root='/data')
395
data.initialize()
396
except FileNotFoundError:
397
print("Data file not found, using default values")
398
399
try:
400
value = data(time_point)
401
except ValueError as e:
402
print(f"Interpolation error: {e}")
403
```
404
405
### Performance Optimization
406
407
For efficient external data usage:
408
409
- Cache frequently accessed files using Excels class
410
- Use appropriate interpolation methods for data characteristics
411
- Consider data preprocessing for very large datasets
412
- Utilize netCDF format for complex multi-dimensional data
413
414
```python
415
# Efficient pattern for multiple data sources
416
excel_manager = Excels()
417
418
# All data objects share cached Excel file
419
population_data = ExtData('master_data.xlsx', sheet='population')
420
economic_data = ExtData('master_data.xlsx', sheet='economy')
421
social_data = ExtData('master_data.xlsx', sheet='social')
422
```