# User-Defined Types

NetCDF4 supports user-defined data types including enumeration types, variable-length types, and compound (structured) types. These enable complex data structures beyond basic numeric and string types.

## Capabilities

### Base User Type

Common functionality for all user-defined types.

```python { .api }
class UserType(BaseObject):
    @property
    def name(self) -> str:
        """Type name."""
        ...

    @property
    def dtype(self) -> np.dtype:
        """NumPy dtype representation."""
        ...
```

### Enumeration Types

Define discrete sets of named values, useful for categorical data and flags.

```python { .api }
class EnumType(UserType):
    @property
    def enum_dict(self) -> dict:
        """Dictionary mapping enum names to values."""
        ...

def create_enumtype(self, datatype, datatype_name: str, enum_dict: dict) -> EnumType:
    """
    Create an enumeration type.

    Args:
        datatype: Base integer type (e.g., 'i1', 'i2', 'i4')
        datatype_name (str): Name for the enumeration type
        enum_dict (dict): Mapping of enum names to integer values

    Returns:
        EnumType: The created enumeration type
    """
    ...
```
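Because `enum_dict` must map names to integers that fit the chosen base type, it can help to check a mapping before creating the type. The sketch below is a hypothetical helper, not part of h5netcdf; it uses only the standard library, and its rejection of duplicate values is an assumption of this sketch rather than documented library behavior.

```python
# Hypothetical helper, not part of h5netcdf: sanity-check an enum mapping
# before handing it to create_enumtype.

# Inclusive value ranges of the signed integer base types named above.
INT_RANGES = {
    "i1": (-2**7, 2**7 - 1),
    "i2": (-2**15, 2**15 - 1),
    "i4": (-2**31, 2**31 - 1),
}


def check_enum_dict(enum_dict, base_type):
    """Return a list of problems found in enum_dict (empty if none)."""
    low, high = INT_RANGES[base_type]
    problems = []
    seen = {}  # value -> first name that used it
    for name, value in enum_dict.items():
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name!r}: value {value!r} is not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{name!r}: {value} does not fit in {base_type}")
        if value in seen:
            problems.append(f"{name!r}: value {value} already used by {seen[value]!r}")
        else:
            seen[value] = name
    return problems


quality = {"good": 0, "questionable": 1, "bad": 2, "missing": 3}
print(check_enum_dict(quality, "i1"))  # []
print(check_enum_dict({"ok": 0, "huge": 300}, "i1"))  # one range problem
```

Running a check like this first reports every problem in the mapping at once, before anything is written to the file.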

### Variable-Length Types

Store arrays of varying lengths, useful for ragged arrays and string data.

```python { .api }
class VLType(UserType):
    pass

def create_vltype(self, datatype, datatype_name: str) -> VLType:
    """
    Create a variable-length type.

    Args:
        datatype: Base data type for array elements
        datatype_name (str): Name for the variable-length type

    Returns:
        VLType: The created variable-length type
    """
    ...
```

### Compound Types

Define structured types with multiple named fields, similar to C structs.

```python { .api }
class CompoundType(UserType):
    @property
    def dtype_view(self) -> np.dtype:
        """Alternative dtype view for string handling."""
        ...

def create_cmptype(self, datatype, datatype_name: str) -> CompoundType:
    """
    Create a compound type.

    Args:
        datatype: NumPy structured dtype defining the compound type
        datatype_name (str): Name for the compound type

    Returns:
        CompoundType: The created compound type
    """
    ...
```

### Type Access

Access user-defined types within groups.

```python { .api }
@property
def enumtypes(self) -> Frozen:
    """Dictionary-like access to enumeration types."""
    ...

@property
def vltypes(self) -> Frozen:
    """Dictionary-like access to variable-length types."""
    ...

@property
def cmptypes(self) -> Frozen:
    """Dictionary-like access to compound types."""
    ...
```

## Usage Examples

### Enumeration Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('enum_types.nc', 'w') as f:
    # Create enumeration for quality flags
    quality_enum = f.create_enumtype(
        'i1',  # Base type: signed 8-bit integer
        'quality_flag',
        {
            'good': 0,
            'questionable': 1,
            'bad': 2,
            'missing': 3
        }
    )

    # Create enumeration for weather conditions
    weather_enum = f.create_enumtype(
        'i2',  # Base type: signed 16-bit integer
        'weather_type',
        {
            'clear': 0,
            'partly_cloudy': 1,
            'cloudy': 2,
            'rain': 3,
            'snow': 4,
            'storm': 5
        }
    )

    # Create dimensions and variables using enum types
    f.dimensions['time'] = 100
    f.dimensions['station'] = 50

    quality = f.create_variable('quality', ('time', 'station'),
                                dtype=quality_enum)
    weather = f.create_variable('weather', ('time', 'station'),
                                dtype=weather_enum)

    # Write enum values using integer codes
    quality[0, :] = np.random.choice([0, 1, 2, 3], size=50)
    weather[0, :] = np.random.choice([0, 1, 2, 3, 4, 5], size=50)

    # Access enum information
    print(f"Quality enum values: {quality_enum.enum_dict}")
    print(f"Weather enum values: {weather_enum.enum_dict}")
```
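The example above writes integer codes, so values read back need translating into names. A minimal sketch of that step follows, using a plain dictionary with the same contents as the `quality_flag` mapping. The `decode` helper and the `<undefined>` placeholder are hypothetical names introduced here, not h5netcdf API.

```python
# Same mapping as the quality_flag enum above (name -> integer code).
quality_codes = {"good": 0, "questionable": 1, "bad": 2, "missing": 3}

# Invert it once so stored integer codes can be looked up by value.
code_to_name = {code: name for name, code in quality_codes.items()}


def decode(codes, unknown="<undefined>"):
    """Translate integer codes into enum names; unmapped codes get a placeholder."""
    return [code_to_name.get(int(code), unknown) for code in codes]


stored = [0, 2, 3, 1, 7]  # stand-in for values read from the variable
print(decode(stored))
# ['good', 'bad', 'missing', 'questionable', '<undefined>']
```

In practice the mapping would come from the type's `enum_dict` property rather than being retyped by hand.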

### Variable-Length Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('vlen_types.nc', 'w') as f:
    # Create variable-length string type
    vlen_str = f.create_vltype(str, 'vlen_string')

    # Create variable-length integer array type
    vlen_int = f.create_vltype('i4', 'vlen_int_array')

    # Create variables using VL types
    f.dimensions['record'] = 10

    # Variable-length strings (for varying-length text)
    comments = f.create_variable('comments', ('record',), dtype=vlen_str)

    # Variable-length integer arrays (for ragged arrays)
    measurements = f.create_variable('measurements', ('record',), dtype=vlen_int)

    # Write variable-length data
    comment_data = [
        "Short comment",
        "This is a much longer comment with more detail",
        "Medium length",
        "",  # Empty string
        "Another comment"
    ]

    measurement_data = [
        [1, 2, 3],        # 3 values
        [4, 5, 6, 7, 8],  # 5 values
        [9],              # 1 value
        [],               # No values
        [10, 11]          # 2 values
    ]

    # Note: Writing VL data depends on h5py version and backend
    # This is conceptual - actual syntax may vary
    # Only the first 5 of the 10 records are written here
    for i, (comment, values) in enumerate(zip(comment_data, measurement_data)):
        comments[i] = comment
        # Convert each ragged row to an array of the VL base type before writing
        measurements[i] = np.array(values, dtype='i4')
```

### Compound Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('compound_types.nc', 'w') as f:
    # Define compound type for weather observations
    weather_dtype = np.dtype([
        ('temperature', 'f4'),     # 32-bit float
        ('humidity', 'f4'),        # 32-bit float
        ('pressure', 'f8'),        # 64-bit float
        ('wind_speed', 'f4'),      # 32-bit float
        ('wind_direction', 'i2'),  # 16-bit integer
        ('station_id', 'i4'),      # 32-bit integer
        ('timestamp', 'i8')        # 64-bit integer
    ])

    weather_compound = f.create_cmptype(weather_dtype, 'weather_obs')

    # Create variable using compound type
    f.dimensions['observation'] = 1000

    obs = f.create_variable('observations', ('observation',),
                            dtype=weather_compound)

    # Create structured array data
    data = np.zeros(1000, dtype=weather_dtype)
    data['temperature'] = np.random.normal(20, 10, 1000)
    data['humidity'] = np.random.uniform(30, 90, 1000)
    data['pressure'] = np.random.normal(1013.25, 20, 1000)
    data['wind_speed'] = np.random.exponential(5, 1000)
    data['wind_direction'] = np.random.randint(0, 360, 1000)
    data['station_id'] = np.random.randint(1000, 9999, 1000)
    data['timestamp'] = np.arange(1000) + 1640000000  # Unix timestamps

    # Write compound data
    obs[:] = data

    # Access compound type information
    print(f"Compound type fields: {weather_compound.dtype.names}")
    print(f"Field types: {[weather_compound.dtype.fields[name][0] for name in weather_compound.dtype.names]}")
```

### Complex Nested Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('nested_types.nc', 'w') as f:
    # Create enumeration for data source
    source_enum = f.create_enumtype('i1', 'data_source', {
        'satellite': 0,
        'ground_station': 1,
        'aircraft': 2,
        'ship': 3
    })

    # Create compound type whose 'source' field is a plain 'i1'
    # that stores the integer codes defined by source_enum
    measurement_dtype = np.dtype([
        ('value', 'f4'),
        ('uncertainty', 'f4'),
        ('source', 'i1'),  # Holds source_enum integer codes
        ('quality_code', 'i1')
    ])

    measurement_compound = f.create_cmptype(measurement_dtype, 'measurement')

    # Create variable using the compound type
    f.dimensions['sample'] = 500

    data_var = f.create_variable('data', ('sample',), dtype=measurement_compound)

    # Create data with enum codes in the compound 'source' field
    sample_data = np.zeros(500, dtype=measurement_dtype)
    sample_data['value'] = np.random.normal(0, 1, 500)
    sample_data['uncertainty'] = np.random.exponential(0.1, 500)
    sample_data['source'] = np.random.choice([0, 1, 2, 3], 500)  # Enum values
    sample_data['quality_code'] = np.random.choice([0, 1, 2], 500)

    data_var[:] = sample_data
```

### Reading User-Defined Types

```python
import h5netcdf

with h5netcdf.File('read_types.nc', 'r') as f:
    # List all user-defined types
    print("Enumeration types:")
    for name, enum_type in f.enumtypes.items():
        print(f"  {name}: {enum_type.enum_dict}")

    print("\nVariable-length types:")
    for name, vl_type in f.vltypes.items():
        print(f"  {name}: {vl_type.dtype}")

    print("\nCompound types:")
    for name, cmp_type in f.cmptypes.items():
        print(f"  {name}: {cmp_type.dtype}")

    # Read data with user-defined types
    if 'observations' in f.variables:
        obs = f.variables['observations']
        data = obs[:]

        # Access individual fields of compound data
        temperatures = data['temperature']
        pressures = data['pressure']

        print(f"Temperature range: {temperatures.min():.1f} to {temperatures.max():.1f}")
        print(f"Pressure range: {pressures.min():.1f} to {pressures.max():.1f}")
```

### Type Inheritance in Groups

```python
import h5netcdf

with h5netcdf.File('type_inheritance.nc', 'w') as f:
    # Create types in root group
    status_enum = f.create_enumtype('i1', 'status', {
        'active': 1,
        'inactive': 0,
        'maintenance': 2
    })

    # Create child group
    sensors = f.create_group('sensors')

    # Child groups inherit parent types
    sensors.dimensions['sensor_id'] = 100

    # Use parent's enum type in child group
    sensor_status = sensors.create_variable('status', ('sensor_id',),
                                            dtype=status_enum)

    # Create group-specific type
    sensor_type_enum = sensors.create_enumtype('i1', 'sensor_type', {
        'temperature': 0,
        'humidity': 1,
        'pressure': 2,
        'wind': 3
    })

    sensor_type_var = sensors.create_variable('type', ('sensor_id',),
                                              dtype=sensor_type_enum)
```

### Legacy API Compatibility

```python
import h5netcdf.legacyapi as netCDF4
import numpy as np

with netCDF4.Dataset('legacy_types.nc', 'w') as f:
    # Legacy API methods (aliases to core methods)
    quality_enum = f.createEnumType('i1', 'quality', {
        'good': 0,
        'bad': 1,
        'missing': 2
    })

    vlen_str = f.createVLType(str, 'vlen_string')

    compound_dtype = np.dtype([('x', 'f4'), ('y', 'f4')])
    point_type = f.createCompoundType(compound_dtype, 'point')

    # Create variables using these types
    f.createDimension('n', 10)

    quality_var = f.createVariable('quality', quality_enum, ('n',))
    text_var = f.createVariable('text', vlen_str, ('n',))
    points_var = f.createVariable('points', point_type, ('n',))
```

## Type Validation and Best Practices

### Enumeration Guidelines
- Use meaningful names for enum values
- Keep integer values small and sequential
- Document enum meanings in variable attributes
- Consider using flags for multiple boolean properties
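The last guideline can be sketched with bit masks: each boolean property gets its own bit, so one integer can carry several properties at once. The `QcFlag` class and its member names below are hypothetical. The attribute names `flag_masks` and `flag_meanings` follow the CF conventions for flag variables. Storing the combined values in a plain integer variable, rather than an enum type, is an assumption of this sketch, since a combined value such as 5 is not itself a named member.

```python
from enum import IntFlag


class QcFlag(IntFlag):
    """Hypothetical quality-control bits; one bit per independent property."""
    SPIKE = 1          # 0b0001
    OUT_OF_RANGE = 2   # 0b0010
    GAP_FILLED = 4     # 0b0100
    MANUAL_EDIT = 8    # 0b1000


# Several properties can hold at once, so they combine into a single integer.
code = int(QcFlag.SPIKE | QcFlag.GAP_FILLED)
print(code)  # 5

# Test individual bits on a stored value.
stored = QcFlag(code)
print(QcFlag.SPIKE in stored)         # True
print(QcFlag.OUT_OF_RANGE in stored)  # False

# Values for CF-style attributes on a plain integer variable.
flag_masks = [int(flag) for flag in QcFlag]
flag_meanings = " ".join(flag.name.lower() for flag in QcFlag)
print(flag_masks)     # [1, 2, 4, 8]
print(flag_meanings)  # spike out_of_range gap_filled manual_edit
```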

### Variable-Length Considerations
- VL types can impact performance with large datasets
- Consider fixed-size alternatives when possible
- Be aware of memory usage with large VL arrays
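One fixed-size alternative is to pad every row to the longest length and store the true lengths alongside. The sketch below shows the idea with plain lists. The helper names and the fill value of -1 are hypothetical, and whether padding beats a VL type depends on how uneven the rows are.

```python
FILL = -1  # hypothetical fill value; pick one that cannot occur in real data


def pad_rows(rows, fill=FILL):
    """Pad ragged rows to a rectangle and return (padded_rows, lengths)."""
    width = max((len(row) for row in rows), default=0)
    padded = [list(row) + [fill] * (width - len(row)) for row in rows]
    lengths = [len(row) for row in rows]
    return padded, lengths


def unpad_rows(padded, lengths):
    """Recover the original ragged rows from the padded form."""
    return [row[:n] for row, n in zip(padded, lengths)]


ragged = [[1, 2, 3], [4, 5, 6, 7, 8], [9], [], [10, 11]]
padded, lengths = pad_rows(ragged)
print(lengths)    # [3, 5, 1, 0, 2]
print(padded[2])  # [9, -1, -1, -1, -1]
assert unpad_rows(padded, lengths) == ragged
```

Here the padded form holds 25 cells for 11 real values, which is the trade-off to weigh against the overhead of a VL type.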

### Compound Type Design
- Use descriptive field names
- Group related fields logically
- Consider alignment and padding for performance
- Document field meanings and units
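To make the alignment point concrete, the sketch below uses the standard-library `struct` module as a stand-in to compare packed and natively aligned sizes for the field widths of the `weather_obs` example. It illustrates only the size arithmetic and makes no claim about how h5netcdf or HDF5 lays out compound fields on disk. NumPy structured dtypes are packed unless created with `align=True`.

```python
import struct

# Field order from the weather_obs example: f4, f4, f8, f4, i2, i4, i8.
fields = "ffdfhiq"

packed = struct.calcsize("=" + fields)   # standard sizes, no padding
aligned = struct.calcsize("@" + fields)  # native alignment, platform dependent

# Same fields ordered from widest to narrowest, which tends to need less padding.
reordered = "qdfffih"
aligned_reordered = struct.calcsize("@" + reordered)

print(packed)             # 34
print(aligned)            # platform dependent
print(aligned_reordered)  # platform dependent
```

On a platform with natural alignment, the original field order needs padding before the 4-byte and 8-byte integers, while the reordered layout needs none.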

### Compatibility Notes
- User-defined types are netCDF-4 features; the netCDF classic (netCDF-3) formats do not support them
- Not all tools support all user-defined types
- Test compatibility with target applications
- Provide fallback variables for critical data
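A fallback for an enum variable can be a plain integer variable that carries the name-to-code mapping in its attributes. The sketch below builds CF-style `flag_values` and `flag_meanings` from a mapping shaped like `enum_dict`. The helper name is hypothetical, and attaching the result to a variable is left out. Since `flag_meanings` is a blank-separated list, the enum names must not contain spaces.

```python
def cf_flag_attributes(enum_dict):
    """Build CF-style flag attributes from an enum mapping (name -> code)."""
    # Sort by code so values and meanings stay in the same order.
    items = sorted(enum_dict.items(), key=lambda item: item[1])
    return {
        "flag_values": [code for _, code in items],
        "flag_meanings": " ".join(name for name, _ in items),
    }


status = {"active": 1, "inactive": 0, "maintenance": 2}
attrs = cf_flag_attributes(status)
print(attrs["flag_values"])    # [0, 1, 2]
print(attrs["flag_meanings"])  # inactive active maintenance
```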