
# User-Defined Types


NetCDF4 supports user-defined data types including enumeration types, variable-length types, and compound (structured) types. These enable complex data structures beyond basic numeric and string types.

## Capabilities

### Base User Type

Common functionality for all user-defined types.

```python { .api }
class UserType(BaseObject):
    @property
    def name(self) -> str:
        """Type name."""
        ...

    @property
    def dtype(self) -> np.dtype:
        """NumPy dtype representation."""
        ...
```

### Enumeration Types

Define discrete sets of named values, useful for categorical data and flags.

```python { .api }
class EnumType(UserType):
    @property
    def enum_dict(self) -> dict:
        """Dictionary mapping enum names to values."""
        ...

def create_enumtype(self, datatype, datatype_name: str, enum_dict: dict) -> EnumType:
    """
    Create an enumeration type.

    Args:
        datatype: Base integer type (e.g., 'i1', 'i2', 'i4')
        datatype_name (str): Name for the enumeration type
        enum_dict (dict): Mapping of enum names to integer values

    Returns:
        EnumType: The created enumeration type
    """
    ...
```
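
Because `enum_dict` values must fit the chosen base integer type, it can help to check a mapping before calling `create_enumtype`. The following is a minimal sketch of such a check in plain Python. `check_enum_dict` is a hypothetical helper, not part of h5netcdf, and the uniqueness rule is an assumption made here so that reverse lookups stay unambiguous.

```python
def check_enum_dict(base_type: str, enum_dict: dict) -> None:
    """Raise ValueError if enum_dict does not fit the integer type code.

    Hypothetical helper: base_type is a code such as 'i1', 'i2', 'i4' or 'u1'.
    """
    kind, nbytes = base_type[0], int(base_type[1:])
    if kind not in ("i", "u"):
        raise ValueError(f"{base_type!r} is not an integer type code")
    bits = 8 * nbytes
    if kind == "i":
        low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        low, high = 0, 2 ** bits - 1

    for name, value in enum_dict.items():
        if not isinstance(value, int):
            raise ValueError(f"{name!r}: value {value!r} is not an integer")
        if not low <= value <= high:
            raise ValueError(f"{name!r}: {value} is outside {low}..{high} for {base_type!r}")

    # Duplicate values would make code -> name lookups ambiguous.
    if len(set(enum_dict.values())) != len(enum_dict):
        raise ValueError("enum values must be unique")


# Fits comfortably in a signed 8-bit integer, so no exception is raised.
check_enum_dict("i1", {"good": 0, "questionable": 1, "bad": 2, "missing": 3})

try:
    # 200 does not fit in a signed 8-bit integer.
    check_enum_dict("i1", {"saturated": 200})
except ValueError as err:
    print(f"rejected: {err}")
```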

### Variable-Length Types

Store arrays of varying lengths, useful for ragged arrays and string data.

```python { .api }
class VLType(UserType):
    pass

def create_vltype(self, datatype, datatype_name: str) -> VLType:
    """
    Create a variable-length type.

    Args:
        datatype: Base data type for array elements
        datatype_name (str): Name for the variable-length type

    Returns:
        VLType: The created variable-length type
    """
    ...
```

### Compound Types

Define structured types with multiple named fields, similar to C structs.

```python { .api }
class CompoundType(UserType):
    @property
    def dtype_view(self) -> np.dtype:
        """Alternative dtype view for string handling."""
        ...

def create_cmptype(self, datatype, datatype_name: str) -> CompoundType:
    """
    Create a compound type.

    Args:
        datatype: NumPy structured dtype defining the compound type
        datatype_name (str): Name for the compound type

    Returns:
        CompoundType: The created compound type
    """
    ...
```

### Type Access

Access user-defined types within groups.

```python { .api }
@property
def enumtypes(self) -> Frozen:
    """Dictionary-like access to enumeration types."""
    ...

@property
def vltypes(self) -> Frozen:
    """Dictionary-like access to variable-length types."""
    ...

@property
def cmptypes(self) -> Frozen:
    """Dictionary-like access to compound types."""
    ...
```
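
To illustrate what "dictionary-like access" means in practice, here is a small analogue built on the standard library's `MappingProxyType`. It is only a stand-in, on the assumption that `Frozen` behaves as a read-only mapping: lookups, iteration and membership tests work, while item assignment is rejected. The registry contents are made up for the example.

```python
from types import MappingProxyType

# Hypothetical stand-in for the enum types registered on a group.
_registered = {
    "quality_flag": {"good": 0, "bad": 2},
    "weather_type": {"clear": 0, "rain": 3},
}
enumtypes = MappingProxyType(_registered)

names = list(enumtypes)                   # iterate over type names
has_quality = "quality_flag" in enumtypes  # membership test
good_code = enumtypes["quality_flag"]["good"]  # lookup by name

try:
    enumtypes["new_type"] = {}            # a read-only view rejects assignment
    blocked = False
except TypeError:
    blocked = True

print(names, has_quality, good_code, blocked)
```

With this reading, new types are added through the `create_*` methods rather than by assigning into the mapping.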

## Usage Examples

### Enumeration Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('enum_types.nc', 'w') as f:
    # Create enumeration for quality flags
    quality_enum = f.create_enumtype(
        'i1',  # Base type: signed 8-bit integer
        'quality_flag',
        {
            'good': 0,
            'questionable': 1,
            'bad': 2,
            'missing': 3
        }
    )

    # Create enumeration for weather conditions
    weather_enum = f.create_enumtype(
        'i2',  # Base type: signed 16-bit integer
        'weather_type',
        {
            'clear': 0,
            'partly_cloudy': 1,
            'cloudy': 2,
            'rain': 3,
            'snow': 4,
            'storm': 5
        }
    )

    # Create dimensions and variables using enum types
    f.dimensions['time'] = 100
    f.dimensions['station'] = 50

    quality = f.create_variable('quality', ('time', 'station'),
                                dtype=quality_enum)
    weather = f.create_variable('weather', ('time', 'station'),
                                dtype=weather_enum)

    # Write enum values using integer codes
    quality[0, :] = np.random.choice([0, 1, 2, 3], size=50)
    weather[0, :] = np.random.choice([0, 1, 2, 3, 4, 5], size=50)

    # Access enum information
    print(f"Quality enum values: {quality_enum.enum_dict}")
    print(f"Weather enum values: {weather_enum.enum_dict}")
```
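
Since enum variables are written and read as integer codes, converting between codes and names is a common follow-up step. The sketch below does this with a plain dictionary that mirrors the `quality_flag` mapping above. In real use the mapping would come from the type's `enum_dict`; the sample codes are made up for illustration.

```python
# Same name -> code mapping as the quality_flag enum above.
quality_codes = {"good": 0, "questionable": 1, "bad": 2, "missing": 3}

# Invert it once to translate stored codes back into names.
code_to_name = {code: name for name, code in quality_codes.items()}

stored = [0, 2, 3, 1, 0]  # illustrative values as they might be read from a variable
labels = [code_to_name[code] for code in stored]
print(labels)  # ['good', 'bad', 'missing', 'questionable', 'good']

# Going the other way: encode names before writing.
encoded = [quality_codes[name] for name in ["bad", "good"]]
print(encoded)  # [2, 0]
```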

### Variable-Length Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('vlen_types.nc', 'w') as f:
    # Create variable-length string type
    vlen_str = f.create_vltype(str, 'vlen_string')

    # Create variable-length integer array type
    vlen_int = f.create_vltype('i4', 'vlen_int_array')

    # Create variables using VL types
    f.dimensions['record'] = 10

    # Variable-length strings (for varying-length text)
    comments = f.create_variable('comments', ('record',), dtype=vlen_str)

    # Variable-length integer arrays (for ragged arrays)
    measurements = f.create_variable('measurements', ('record',), dtype=vlen_int)

    # Write variable-length data
    comment_data = [
        "Short comment",
        "This is a much longer comment with more detail",
        "Medium length",
        "",  # Empty string
        "Another comment"
    ]

    measurement_data = [
        [1, 2, 3],        # 3 values
        [4, 5, 6, 7, 8],  # 5 values
        [9],              # 1 value
        [],               # No values
        [10, 11]          # 2 values
    ]

    # Note: Writing VL data depends on h5py version and backend
    # This is conceptual - actual syntax may vary
    # Only the first five of the ten records are filled here
    for i, (comment, measurements_list) in enumerate(zip(comment_data, measurement_data)):
        comments[i] = comment
        # Convert each ragged row to an array of the VL base type
        measurements[i] = np.array(measurements_list, dtype='i4')
```

### Compound Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('compound_types.nc', 'w') as f:
    # Define compound type for weather observations
    weather_dtype = np.dtype([
        ('temperature', 'f4'),     # 32-bit float
        ('humidity', 'f4'),        # 32-bit float
        ('pressure', 'f8'),        # 64-bit float
        ('wind_speed', 'f4'),      # 32-bit float
        ('wind_direction', 'i2'),  # 16-bit integer
        ('station_id', 'i4'),      # 32-bit integer
        ('timestamp', 'i8')        # 64-bit integer
    ])

    weather_compound = f.create_cmptype(weather_dtype, 'weather_obs')

    # Create variable using compound type
    f.dimensions['observation'] = 1000

    obs = f.create_variable('observations', ('observation',),
                            dtype=weather_compound)

    # Create structured array data
    data = np.zeros(1000, dtype=weather_dtype)
    data['temperature'] = np.random.normal(20, 10, 1000)
    data['humidity'] = np.random.uniform(30, 90, 1000)
    data['pressure'] = np.random.normal(1013.25, 20, 1000)
    data['wind_speed'] = np.random.exponential(5, 1000)
    data['wind_direction'] = np.random.randint(0, 360, 1000)
    data['station_id'] = np.random.randint(1000, 9999, 1000)
    data['timestamp'] = np.arange(1000) + 1640000000  # Unix timestamps

    # Write compound data
    obs[:] = data

    # Access compound type information
    print(f"Compound type fields: {weather_compound.dtype.names}")
    print(f"Field types: {[weather_compound.dtype.fields[name][0] for name in weather_compound.dtype.names]}")
```

### Complex Nested Types

```python
import h5netcdf
import numpy as np

with h5netcdf.File('nested_types.nc', 'w') as f:
    # Create enumeration for data source
    source_enum = f.create_enumtype('i1', 'data_source', {
        'satellite': 0,
        'ground_station': 1,
        'aircraft': 2,
        'ship': 3
    })

    # Create compound type with a field that carries the enum's integer codes.
    # The 'source' field is a plain 'i1'; the enum type itself is not embedded,
    # so source_enum only documents what the codes mean.
    measurement_dtype = np.dtype([
        ('value', 'f4'),
        ('uncertainty', 'f4'),
        ('source', 'i1'),  # Holds data_source enum values
        ('quality_code', 'i1')
    ])

    measurement_compound = f.create_cmptype(measurement_dtype, 'measurement')

    # Create variable using the compound type
    f.dimensions['sample'] = 500

    data_var = f.create_variable('data', ('sample',), dtype=measurement_compound)

    # Create data with enum values in compound type
    sample_data = np.zeros(500, dtype=measurement_dtype)
    sample_data['value'] = np.random.normal(0, 1, 500)
    sample_data['uncertainty'] = np.random.exponential(0.1, 500)
    sample_data['source'] = np.random.choice([0, 1, 2, 3], 500)  # Enum values
    sample_data['quality_code'] = np.random.choice([0, 1, 2], 500)

    data_var[:] = sample_data
```

### Reading User-Defined Types

```python
import h5netcdf

# Assumes 'read_types.nc' already exists and contains user-defined types
with h5netcdf.File('read_types.nc', 'r') as f:
    # List all user-defined types
    print("Enumeration types:")
    for name, enum_type in f.enumtypes.items():
        print(f"  {name}: {enum_type.enum_dict}")

    print("\nVariable-length types:")
    for name, vl_type in f.vltypes.items():
        print(f"  {name}: {vl_type.dtype}")

    print("\nCompound types:")
    for name, cmp_type in f.cmptypes.items():
        print(f"  {name}: {cmp_type.dtype}")

    # Read data with user-defined types
    if 'observations' in f.variables:
        obs = f.variables['observations']
        data = obs[:]

        # Access individual fields of compound data
        temperatures = data['temperature']
        pressures = data['pressure']

        print(f"Temperature range: {temperatures.min():.1f} to {temperatures.max():.1f}")
        print(f"Pressure range: {pressures.min():.1f} to {pressures.max():.1f}")
```

### Type Inheritance in Groups

```python
import h5netcdf

with h5netcdf.File('type_inheritance.nc', 'w') as f:
    # Create types in root group
    status_enum = f.create_enumtype('i1', 'status', {
        'active': 1,
        'inactive': 0,
        'maintenance': 2
    })

    # Create child group
    sensors = f.create_group('sensors')

    # Child groups inherit parent types
    sensors.dimensions['sensor_id'] = 100

    # Use parent's enum type in child group
    sensor_status = sensors.create_variable('status', ('sensor_id',),
                                            dtype=status_enum)

    # Create group-specific type
    sensor_type_enum = sensors.create_enumtype('i1', 'sensor_type', {
        'temperature': 0,
        'humidity': 1,
        'pressure': 2,
        'wind': 3
    })

    sensor_type_var = sensors.create_variable('type', ('sensor_id',),
                                              dtype=sensor_type_enum)
```

### Legacy API Compatibility

```python
import h5netcdf.legacyapi as netCDF4
import numpy as np

with netCDF4.Dataset('legacy_types.nc', 'w') as f:
    # Legacy API methods (aliases to core methods)
    quality_enum = f.createEnumType('i1', 'quality', {
        'good': 0,
        'bad': 1,
        'missing': 2
    })

    vlen_str = f.createVLType(str, 'vlen_string')

    compound_dtype = np.dtype([('x', 'f4'), ('y', 'f4')])
    point_type = f.createCompoundType(compound_dtype, 'point')

    # Create variables using these types
    f.createDimension('n', 10)

    # Pass a fill_value that is a member of the enum: the netCDF4-style
    # default fill value for 'i1' is not one of the enum values
    quality_var = f.createVariable('quality', quality_enum, ('n',),
                                   fill_value=quality_enum.enum_dict['missing'])
    text_var = f.createVariable('text', vlen_str, ('n',))
    points_var = f.createVariable('points', point_type, ('n',))
```

## Type Validation and Best Practices

### Enumeration Guidelines

- Use meaningful names for enum values
- Keep integer values small and sequential
- Document enum meanings in variable attributes
- Consider using flags for multiple boolean properties
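
The last guideline does not say how flags should be encoded. One common reading is to pack several independent boolean properties into the bits of a single integer, stored in a plain integer variable rather than an enum-typed one, because combined values are not themselves named enum members. A minimal sketch with the standard library's `IntFlag` follows; `QCFlag` and its member names are hypothetical.

```python
from enum import IntFlag


class QCFlag(IntFlag):
    """Hypothetical quality-control bits; each property gets its own bit."""
    SENSOR_FAULT = 1
    OUT_OF_RANGE = 2
    INTERPOLATED = 4
    MANUALLY_EDITED = 8


# Combine independent properties into one integer before writing it.
stored = int(QCFlag.OUT_OF_RANGE | QCFlag.INTERPOLATED)
print(stored)  # 6

# Decode a stored integer back into the individual properties.
decoded = QCFlag(stored)
active = [flag.name for flag in QCFlag if flag in decoded]
print(active)  # ['OUT_OF_RANGE', 'INTERPOLATED']
```

An enumeration type remains the better fit when each element holds exactly one of several mutually exclusive states.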

### Variable-Length Considerations

- VL types can impact performance with large datasets
- Consider fixed-size alternatives when possible
- Be aware of memory usage with large VL arrays
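
One possible fixed-size alternative is to pad each ragged row to a common width and keep the true lengths alongside. The sketch below shows the idea in plain Python. `pad_ragged` and the `-1` fill value are assumptions for illustration; the padded rows could then go into an ordinary two-dimensional variable, with the lengths in a second one.

```python
def pad_ragged(rows, fill_value=-1):
    """Pad ragged rows to equal width; return (padded_rows, lengths).

    Hypothetical helper: fill_value must not collide with real data values.
    """
    lengths = [len(row) for row in rows]
    width = max(lengths, default=0)
    padded = [list(row) + [fill_value] * (width - len(row)) for row in rows]
    return padded, lengths


rows = [[1, 2, 3], [4, 5, 6, 7, 8], [9], [], [10, 11]]
padded, lengths = pad_ragged(rows)
print(padded[0])  # [1, 2, 3, -1, -1]
print(lengths)    # [3, 5, 1, 0, 2]

# The original ragged rows can be recovered from the two fixed-size pieces.
restored = [row[:n] for row, n in zip(padded, lengths)]
```

The trade-off is wasted space when a few rows are much longer than the rest, so this works best when row lengths are similar.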

### Compound Type Design

- Use descriptive field names
- Group related fields logically
- Consider alignment and padding for performance
- Document field meanings and units
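
To see what alignment and padding mean for a concrete layout, NumPy can build the same structured dtype both packed (the default) and with C-style alignment via `align=True`. The sketch below compares field offsets and record sizes for the weather observation fields used earlier. Which layout ends up on disk is not covered here, and the aligned offsets depend on the platform.

```python
import numpy as np

fields = [
    ("temperature", "f4"),
    ("humidity", "f4"),
    ("pressure", "f8"),
    ("wind_speed", "f4"),
    ("wind_direction", "i2"),
    ("station_id", "i4"),
    ("timestamp", "i8"),
]

packed = np.dtype(fields)               # no padding between fields
aligned = np.dtype(fields, align=True)  # padding as a C compiler would add

# fields[name] is a (dtype, offset) pair; index 1 is the byte offset.
for name in packed.names:
    print(f"{name:15s} packed={packed.fields[name][1]:2d} "
          f"aligned={aligned.fields[name][1]:2d}")

print(f"record size: packed={packed.itemsize} bytes, aligned={aligned.itemsize} bytes")
```

Ordering fields from largest to smallest element size usually reduces the padding an aligned layout needs.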

400

401

### Compatibility Notes

402

- User-defined types are netCDF4-specific features

403

- Not all tools support all user-defined types

404

- Test compatibility with target applications

405

- Provide fallback variables for critical data
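
One way to provide a fallback for enum-coded data is a plain integer variable whose attributes describe the codes, using the `flag_values` and `flag_meanings` attributes from the CF conventions. The sketch below derives those attributes from a name-to-value mapping. `cf_flag_attributes` is a hypothetical helper, and the mapping mirrors the `quality_flag` example.

```python
def cf_flag_attributes(enum_dict):
    """Build CF-style flag attributes from a name -> value mapping.

    Hypothetical helper: names must not contain spaces, because
    flag_meanings is a space-separated list.
    """
    items = sorted(enum_dict.items(), key=lambda item: item[1])
    return {
        "flag_values": [value for _, value in items],
        "flag_meanings": " ".join(name for name, _ in items),
    }


attrs = cf_flag_attributes({"good": 0, "questionable": 1, "bad": 2, "missing": 3})
print(attrs["flag_values"])    # [0, 1, 2, 3]
print(attrs["flag_meanings"])  # good questionable bad missing
```

The resulting entries could then be assigned to the fallback variable's `attrs`, so tools that ignore user-defined types can still interpret the codes.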