Tessl Tile for pypi/h5netcdf@1.6.0

# User-Defined Types

NetCDF-4 supports user-defined data types: enumeration types, variable-length types, and compound (structured) types. They cover data that the basic numeric and string types cannot express directly, such as categorical codes, ragged arrays, and multi-field records.

## Capabilities

The `create_*` methods and the type-access properties below are defined on `Group`, so they are also available on `File`.

### Base User Type

Common functionality shared by all user-defined types.

```python { .api }
class UserType(BaseObject):
    @property
    def name(self) -> str:
        """Type name."""
        ...

    @property
    def dtype(self) -> np.dtype:
        """NumPy dtype representation."""
        ...
```

### Enumeration Types

Define discrete sets of named values, useful for categorical data and flags.

```python { .api }
class EnumType(UserType):
    @property
    def enum_dict(self) -> dict:
        """Dictionary mapping enum names to values."""
        ...

def create_enumtype(self, datatype, datatype_name: str, enum_dict: dict) -> EnumType:
    """
    Create an enumeration type.

    Args:
        datatype: Base integer type (e.g., 'i1', 'i2', 'i4')
        datatype_name (str): Name for the enumeration type
        enum_dict (dict): Mapping of enum names to integer values

    Returns:
        EnumType: The created enumeration type
    """
    ...
```
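
An enum variable stores integer codes, so code that reads one usually needs `enum_dict` in reverse. The sketch below uses only NumPy and is not an h5netcdf API. `fits_base_type` and `decode_enum` are hypothetical helpers, and the dictionary mirrors the `quality_flag` example further down.

```python
import numpy as np

def fits_base_type(enum_dict, base_type):
    """Check that every enum value is representable in the integer base type."""
    info = np.iinfo(np.dtype(base_type))
    return all(info.min <= value <= info.max for value in enum_dict.values())

def decode_enum(codes, enum_dict):
    """Translate integer codes back to enum names (hypothetical helper; output is flattened)."""
    # Invert {name: value} into {value: name}
    value_to_name = {value: name for name, value in enum_dict.items()}
    return [value_to_name[int(code)] for code in np.ravel(codes)]

quality_flags = {"good": 0, "questionable": 1, "bad": 2, "missing": 3}

print(fits_base_type(quality_flags, "i1"))  # True
print(fits_base_type({"big": 300}, "i1"))   # False: int8 tops out at 127
print(decode_enum(np.array([0, 2, 3, 1], dtype="i1"), quality_flags))
# ['good', 'bad', 'missing', 'questionable']
```

Codes that are not enum members, such as a fill value, raise `KeyError` in this sketch. Real code may prefer to map them to a sentinel name.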

### Variable-Length Types

Store sequences whose length varies from element to element, as in ragged arrays. Variable-length strings do not need a user-defined type. Create the variable with h5py's variable-length string dtype (`h5py.string_dtype()`) instead.

```python { .api }
class VLType(UserType):
    pass

def create_vltype(self, datatype, datatype_name: str) -> VLType:
    """
    Create a variable-length type.

    Args:
        datatype: Base data type for array elements
        datatype_name (str): Name for the variable-length type

    Returns:
        VLType: The created variable-length type
    """
    ...
```
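
When reading, h5py represents variable-length data as a NumPy object array whose elements are 1-D arrays of the base type. The same layout is a convenient way to prepare ragged data before writing it. The sketch below builds that layout from Python lists using NumPy only. `to_ragged` is a hypothetical helper, not part of h5netcdf.

```python
import numpy as np

def to_ragged(rows, base_dtype="i4"):
    """Pack a sequence of sequences into a 1-D object array of typed 1-D arrays."""
    ragged = np.empty(len(rows), dtype=object)
    for i, row in enumerate(rows):
        # Element-wise assignment stops NumPy from trying to build a 2-D array
        ragged[i] = np.asarray(row, dtype=base_dtype)
    return ragged

ragged = to_ragged([[1, 2, 3], [4, 5, 6, 7, 8], [9], [], [10, 11]])
print([len(row) for row in ragged])  # [3, 5, 1, 0, 2]
print(ragged[1].dtype)               # int32
```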

### Compound Types

Define structured types with multiple named fields, similar to C structs.

```python { .api }
class CompoundType(UserType):
    @property
    def dtype_view(self) -> np.dtype:
        """Alternative dtype view for string handling."""
        ...

def create_cmptype(self, datatype, datatype_name: str) -> CompoundType:
    """
    Create a compound type.

    Args:
        datatype: NumPy structured dtype defining the compound type
        datatype_name (str): Name for the compound type

    Returns:
        CompoundType: The created compound type
    """
    ...
```
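
`create_cmptype` takes an ordinary NumPy structured dtype, so the data written to a compound variable is a structured array. The sketch below uses NumPy only, and the `point_dtype` fields are illustrative. It shows how to build records from tuples and how to access a whole field or a single record.

```python
import numpy as np

# Illustrative structured dtype of the kind passed to create_cmptype
point_dtype = np.dtype([("x", "f4"), ("y", "f4"), ("label", "i2")])

# Each tuple is one record, with values in field order
points = np.array([(0.5, 1.5, 1), (2.0, -1.0, 2)], dtype=point_dtype)

print(point_dtype.names)     # ('x', 'y', 'label')
print(points["label"])       # one field across all records: [1 2]
print(points[1]["y"])        # one field of one record: -1.0
print(point_dtype.itemsize)  # 10 bytes per record (packed layout)
```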

### Type Access

Look up the user-defined types stored in a group. Each property returns a read-only, dictionary-like mapping from type name to type object.

```python { .api }
@property
def enumtypes(self) -> Frozen:
    """Dictionary-like access to enumeration types."""
    ...

@property
def vltypes(self) -> Frozen:
    """Dictionary-like access to variable-length types."""
    ...

@property
def cmptypes(self) -> Frozen:
    """Dictionary-like access to compound types."""
    ...
```

## Usage Examples

### Enumeration Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('enum_types.nc', 'w') as f:
    # Create enumeration for quality flags
    quality_enum = f.create_enumtype(
        'i1',  # Base type: signed 8-bit integer
        'quality_flag',
        {
            'good': 0,
            'questionable': 1,
            'bad': 2,
            'missing': 3,
        },
    )

    # Create enumeration for weather conditions
    weather_enum = f.create_enumtype(
        'i2',  # Base type: signed 16-bit integer
        'weather_type',
        {
            'clear': 0,
            'partly_cloudy': 1,
            'cloudy': 2,
            'rain': 3,
            'snow': 4,
            'storm': 5,
        },
    )

    # Create dimensions and variables using enum types
    f.dimensions['time'] = 100
    f.dimensions['station'] = 50

    quality = f.create_variable('quality', ('time', 'station'),
                                dtype=quality_enum)
    weather = f.create_variable('weather', ('time', 'station'),
                                dtype=weather_enum)

    # Write enum values using integer codes
    quality[0, :] = np.random.choice([0, 1, 2, 3], size=50)
    weather[0, :] = np.random.choice([0, 1, 2, 3, 4, 5], size=50)

    # Access enum information
    print(f"Quality enum values: {quality_enum.enum_dict}")
    print(f"Weather enum values: {weather_enum.enum_dict}")
```
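
The example above writes random codes. When the source data holds enum names instead, translate them to codes before writing. The sketch below assumes the names are plain Python strings and uses NumPy only. `encode_enum` is a hypothetical helper.

```python
import numpy as np

def encode_enum(names, enum_dict, base_type="i1"):
    """Convert enum names to integer codes of the enum's base type (hypothetical helper)."""
    unknown = set(names) - set(enum_dict)
    if unknown:
        raise ValueError(f"not enum members: {sorted(unknown)}")
    return np.array([enum_dict[name] for name in names], dtype=base_type)

quality_flags = {"good": 0, "questionable": 1, "bad": 2, "missing": 3}
codes = encode_enum(["good", "bad", "good", "missing"], quality_flags)
print(codes, codes.dtype)  # [0 2 0 3] int8
```

In the file-based example, `quality_flags` corresponds to `quality_enum.enum_dict`. The resulting array could then be assigned to a slice such as `quality[1, :4]`.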

### Variable-Length Types

```python
import h5netcdf
import h5py
import numpy as np

with h5netcdf.File('vlen_types.nc', 'w') as f:
    # Variable-length integer array type (for ragged arrays)
    vlen_int = f.create_vltype('i4', 'vlen_int_array')

    f.dimensions['record'] = 10

    # Variable-length strings use h5py's string dtype, not a VL user type
    comments = f.create_variable('comments', ('record',), dtype=h5py.string_dtype())

    # Variable-length integer arrays (ragged rows)
    measurements = f.create_variable('measurements', ('record',), dtype=vlen_int)

    comment_data = [
        "Short comment",
        "This is a much longer comment with more detail",
        "Medium length",
        "",  # Empty string
        "Another comment",
    ]

    measurement_data = [
        [1, 2, 3],        # 3 values
        [4, 5, 6, 7, 8],  # 5 values
        [9],              # 1 value
        [],               # No values
        [10, 11],         # 2 values
    ]

    # Write the first five records element by element; the rest stay unwritten.
    # VL write behavior can vary with the h5py version.
    for i, (comment, values) in enumerate(zip(comment_data, measurement_data)):
        comments[i] = comment
        measurements[i] = np.asarray(values, dtype='i4')
```

### Compound Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('compound_types.nc', 'w') as f:
    # Define compound type for weather observations
    weather_dtype = np.dtype([
        ('temperature', 'f4'),     # 32-bit float
        ('humidity', 'f4'),        # 32-bit float
        ('pressure', 'f8'),        # 64-bit float
        ('wind_speed', 'f4'),      # 32-bit float
        ('wind_direction', 'i2'),  # 16-bit integer
        ('station_id', 'i4'),      # 32-bit integer
        ('timestamp', 'i8'),       # 64-bit integer
    ])

    weather_compound = f.create_cmptype(weather_dtype, 'weather_obs')

    # Create variable using compound type
    f.dimensions['observation'] = 1000

    obs = f.create_variable('observations', ('observation',),
                            dtype=weather_compound)

    # Create structured array data
    data = np.zeros(1000, dtype=weather_dtype)
    data['temperature'] = np.random.normal(20, 10, 1000)
    data['humidity'] = np.random.uniform(30, 90, 1000)
    data['pressure'] = np.random.normal(1013.25, 20, 1000)
    data['wind_speed'] = np.random.exponential(5, 1000)
    data['wind_direction'] = np.random.randint(0, 360, 1000)
    data['station_id'] = np.random.randint(1000, 9999, 1000)
    data['timestamp'] = np.arange(1000) + 1640000000  # Unix timestamps

    # Write compound data
    obs[:] = data

    # Access compound type information
    print(f"Compound type fields: {weather_compound.dtype.names}")
    print(f"Field types: {[weather_compound.dtype.fields[name][0] for name in weather_compound.dtype.names]}")
```

### Combining Enum and Compound Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('nested_types.nc', 'w') as f:
    # Create enumeration for data source
    source_enum = f.create_enumtype('i1', 'data_source', {
        'satellite': 0,
        'ground_station': 1,
        'aircraft': 2,
        'ship': 3,
    })

    # Compound type whose 'source' field holds data_source codes.
    # The field is a plain int8; it does not reference the enum type itself.
    measurement_dtype = np.dtype([
        ('value', 'f4'),
        ('uncertainty', 'f4'),
        ('source', 'i1'),  # data_source enum code
        ('quality_code', 'i1'),
    ])

    measurement_compound = f.create_cmptype(measurement_dtype, 'measurement')

    f.dimensions['sample'] = 500

    data_var = f.create_variable('data', ('sample',), dtype=measurement_compound)

    # Fill records; 'source' draws from the data_source codes
    sample_data = np.zeros(500, dtype=measurement_dtype)
    sample_data['value'] = np.random.normal(0, 1, 500)
    sample_data['uncertainty'] = np.random.exponential(0.1, 500)
    sample_data['source'] = np.random.choice([0, 1, 2, 3], 500)
    sample_data['quality_code'] = np.random.choice([0, 1, 2], 500)

    data_var[:] = sample_data
```
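
Because the `source` field holds plain `int8` codes rather than a reference to the `data_source` enum, selecting records by category means comparing against the enum's integer value. The sketch below uses NumPy only, with a small hand-written array whose records are made up for illustration. In the file-based example, `source_codes` would be `source_enum.enum_dict`.

```python
import numpy as np

# Stand-in for source_enum.enum_dict
source_codes = {"satellite": 0, "ground_station": 1, "aircraft": 2, "ship": 3}

measurement_dtype = np.dtype([
    ("value", "f4"),
    ("uncertainty", "f4"),
    ("source", "i1"),
    ("quality_code", "i1"),
])

records = np.array(
    [(1.0, 0.1, 0, 0), (2.0, 0.2, 3, 1), (3.0, 0.3, 3, 0)],
    dtype=measurement_dtype,
)

# A boolean mask over one field selects whole records
ship_only = records[records["source"] == source_codes["ship"]]
print(ship_only["value"].tolist())  # [2.0, 3.0]
```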

### Reading User-Defined Types

```python
import h5netcdf

# Reads the file written by the compound-type example above
with h5netcdf.File('compound_types.nc', 'r') as f:
    # List all user-defined types in the root group
    print("Enumeration types:")
    for name, enum_type in f.enumtypes.items():
        print(f"  {name}: {enum_type.enum_dict}")

    print("\nVariable-length types:")
    for name, vl_type in f.vltypes.items():
        print(f"  {name}: {vl_type.dtype}")

    print("\nCompound types:")
    for name, cmp_type in f.cmptypes.items():
        print(f"  {name}: {cmp_type.dtype}")

    # Compound data is read back as a NumPy structured array
    if 'observations' in f.variables:
        obs = f.variables['observations']
        data = obs[:]

        # Access individual fields of compound data
        temperatures = data['temperature']
        pressures = data['pressure']

        print(f"Temperature range: {temperatures.min():.1f} to {temperatures.max():.1f}")
        print(f"Pressure range: {pressures.min():.1f} to {pressures.max():.1f}")
```

### Type Inheritance in Groups

```python
import h5netcdf

with h5netcdf.File('type_inheritance.nc', 'w') as f:
    # Create types in root group
    status_enum = f.create_enumtype('i1', 'status', {
        'active': 1,
        'inactive': 0,
        'maintenance': 2,
    })

    # Create child group
    sensors = f.create_group('sensors')
    sensors.dimensions['sensor_id'] = 100

    # Types defined in an ancestor group can be used in its child groups
    sensor_status = sensors.create_variable('status', ('sensor_id',),
                                            dtype=status_enum)

    # Create group-specific type
    sensor_type_enum = sensors.create_enumtype('i1', 'sensor_type', {
        'temperature': 0,
        'humidity': 1,
        'pressure': 2,
        'wind': 3,
    })

    sensor_type_var = sensors.create_variable('type', ('sensor_id',),
                                              dtype=sensor_type_enum)
```

### Legacy API Compatibility

```python
import h5py
import numpy as np
import h5netcdf.legacyapi as netCDF4

with netCDF4.Dataset('legacy_types.nc', 'w') as f:
    # Legacy API methods (aliases to core methods)
    quality_enum = f.createEnumType('i1', 'quality', {
        'good': 0,
        'bad': 1,
        'missing': 2,
    })

    vlen_int = f.createVLType('i4', 'vlen_int')

    compound_dtype = np.dtype([('x', 'f4'), ('y', 'f4')])
    point_type = f.createCompoundType(compound_dtype, 'point')

    # Create variables using these types
    f.createDimension('n', 10)

    quality_var = f.createVariable('quality', quality_enum, ('n',))
    ragged_var = f.createVariable('ragged', vlen_int, ('n',))
    points_var = f.createVariable('points', point_type, ('n',))

    # Variable-length strings use h5py's string dtype rather than a VL type
    text_var = f.createVariable('text', h5py.string_dtype(), ('n',))
```

## Type Validation and Best Practices

### Enumeration Guidelines
- Use meaningful names for enum values
- Keep integer values small and sequential
- Document enum meanings in variable attributes
- For several independent yes/no properties, consider a bit-flag variable (CF `flag_masks`) instead of an enum
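
One common way to document enum meanings in attributes is the CF conventions' `flag_values`/`flag_meanings` pair. CF's `flag_masks` attribute covers the bit-flag case. The sketch below uses NumPy only. `cf_flag_attributes` and `has_flag` are hypothetical helpers, and the bit assignments are illustrative. The resulting dictionary is meant to be copied into a variable's `attrs`.

```python
import numpy as np

def cf_flag_attributes(enum_dict, base_type="i1"):
    """Build CF-style flag_values / flag_meanings attributes from an enum mapping."""
    items = sorted(enum_dict.items(), key=lambda item: item[1])
    return {
        "flag_values": np.array([value for _, value in items], dtype=base_type),
        "flag_meanings": " ".join(name for name, _ in items),  # space-separated names
    }

def has_flag(flags, mask):
    """Test one bit (or bit group) in an integer bit-flag array."""
    return (np.asarray(flags) & mask) == mask

attrs = cf_flag_attributes({"inactive": 0, "active": 1, "maintenance": 2})
print(attrs["flag_values"].tolist(), attrs["flag_meanings"])
# [0, 1, 2] inactive active maintenance

# Bit flags (illustrative): bit 0 = sensor_fault, bit 1 = calibrated
flags = np.array([0b00, 0b01, 0b10, 0b11], dtype="u1")
print(has_flag(flags, 0b10).tolist())  # [False, False, True, True]
```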

### Variable-Length Considerations
- VL data usually reads and writes more slowly than fixed-size data, which matters most for large datasets
- Consider fixed-size alternatives when possible
- Be aware of memory usage: when read, each VL element becomes a separate array object
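
A common fixed-size alternative for ragged data is the CF "contiguous ragged array" layout. All rows are concatenated into one 1-D values array, and the per-row lengths are stored in a separate count array. The sketch below packs and unpacks that layout with NumPy only. `pack_ragged` and `unpack_ragged` are hypothetical helpers.

```python
import numpy as np

def pack_ragged(rows, dtype="i4"):
    """Flatten rows into (values, counts) for a contiguous ragged layout."""
    counts = np.array([len(row) for row in rows], dtype="i4")
    values = np.array([x for row in rows for x in row], dtype=dtype)
    return values, counts

def unpack_ragged(values, counts):
    """Recover the individual rows as a list of 1-D arrays."""
    if len(counts) == 0:
        return []
    # Split points are the running totals, excluding the final one
    return np.split(values, np.cumsum(counts)[:-1])

values, counts = pack_ragged([[1, 2, 3], [4, 5, 6, 7, 8], [9], [], [10, 11]])
print(values.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(counts.tolist())  # [3, 5, 1, 0, 2]
print([row.tolist() for row in unpack_ragged(values, counts)])
# [[1, 2, 3], [4, 5, 6, 7, 8], [9], [], [10, 11]]
```

In a file, `values` and `counts` would become two ordinary variables on separate dimensions. CF links them with a `sample_dimension` attribute on the count variable.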

### Compound Type Design
- Use descriptive field names
- Group related fields logically
- Consider field order, alignment, and padding; NumPy structured dtypes are packed unless created with `align=True`
- Document field meanings and units
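
To see what alignment does to a record, compare byte offsets and record sizes for the same fields. The sketch below uses NumPy only, and the field names are illustrative. The offsets shown assume a typical 64-bit platform where `f8` aligns to 8 bytes. Whether padding helps in practice depends on the tools reading the file, so treat this as a way to inspect layouts rather than a recommendation.

```python
import numpy as np

fields = [("flag", "i1"), ("value", "f8"), ("count", "i2")]
largest_first = [("value", "f8"), ("count", "i2"), ("flag", "i1")]

layouts = {
    "packed": np.dtype(fields),
    "aligned": np.dtype(fields, align=True),
    "aligned, largest first": np.dtype(largest_first, align=True),
}

for label, dt in layouts.items():
    # dt.fields[name] is (dtype, byte offset); collect offsets in field order
    offsets = [dt.fields[name][1] for name in dt.names]
    print(f"{label}: offsets={offsets}, itemsize={dt.itemsize}")
# packed: offsets=[0, 1, 9], itemsize=11
# aligned: offsets=[0, 8, 16], itemsize=24
# aligned, largest first: offsets=[0, 8, 10], itemsize=16
```

Ordering fields from largest to smallest keeps most of the alignment benefit while wasting less space on padding.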

### Compatibility Notes
- User-defined types require the netCDF-4 (HDF5-based) format; netCDF-3 classic files cannot store them
- Not all tools support all user-defined types
- Test compatibility with target applications
- Provide fallback variables for critical data
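
For the last point, one option is to also write each field of a compound record as its own plain numeric variable, which tools without user-type support can still read. The sketch below splits a structured array into per-field arrays using NumPy only. `split_fields` is a hypothetical helper, and the record layout is illustrative.

```python
import numpy as np

def split_fields(records):
    """Return {field_name: contiguous 1-D array} for a structured array."""
    # np.ascontiguousarray copies each strided field view into its own buffer
    return {name: np.ascontiguousarray(records[name]) for name in records.dtype.names}

obs_dtype = np.dtype([("temperature", "f4"), ("pressure", "f8"), ("station_id", "i4")])
obs = np.array([(21.5, 1013.2, 1001), (19.0, 1008.7, 1002)], dtype=obs_dtype)

columns = split_fields(obs)
for name, column in columns.items():
    print(name, column.dtype, column.tolist())
```

Each array could then be written with a call along the lines of `f.create_variable(name, ('observation',), data=column)`.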
