# Input/Output Operations

Comprehensive file I/O operations for loading, saving, and formatting array data. CuPy provides NumPy-compatible I/O functions for a range of data formats, including binary files, compressed archives, and text files, along with custom formatting options and seamless GPU memory management.

## Capabilities

### Binary File Operations

Efficient binary file I/O that preserves exact array data and metadata, supporting single arrays or multiple arrays in compressed archives.

```python { .api }
def save(file, arr, allow_pickle=True, fix_imports=True):
    """
    Save an array to a binary file in NumPy .npy format.

    Parameters:
    - file: str or file-like, output file path or file object
    - arr: array_like, array data to save
    - allow_pickle: bool, allow saving object arrays with pickle
    - fix_imports: bool, map new Python 3 names to old names for Python 2 compatibility

    Notes:
    - Data is transferred to CPU before saving
    - Preserves dtype, shape, and array metadata
    - Compatible with numpy.load()
    """

def load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII'):
    """
    Load arrays from binary .npy or .npz files, or from pickled files.

    Parameters:
    - file: str or file-like, input file path or file object
    - mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, memory mapping mode
    - allow_pickle: bool, allow loading pickled object arrays
    - fix_imports: bool, map old Python 2 names to new names for compatibility
    - encoding: str, encoding used to read Python 2 strings

    Returns:
    - ndarray or NpzFile: loaded array data on GPU

    Notes:
    - Automatically transfers loaded data to GPU
    - Supports the .npy single-array and .npz archive formats
    - Compatible with numpy.save() output
    """

def savez(file, *args, **kwds):
    """
    Save multiple arrays to a single uncompressed .npz archive.

    Parameters:
    - file: str or file-like, output file path
    - *args: arrays to save with automatic naming (arr_0, arr_1, ...)
    - **kwds: arrays to save under the given keyword names

    Notes:
    - Creates a .npz archive containing multiple arrays
    - Arrays are transferred to CPU before saving
    - Useful for saving related datasets together
    """

def savez_compressed(file, *args, **kwds):
    """
    Save multiple arrays to a compressed .npz archive.

    Parameters:
    - file: str or file-like, output file path
    - *args: arrays to save with automatic naming
    - **kwds: arrays to save under the given keyword names

    Notes:
    - Same as savez() but compressed, producing smaller files
    - Slower to save but uses less disk space
    - Recommended for long-term storage
    """
```

### Text File Operations

Human-readable text I/O for data exchange, debugging, and integration with other tools and programming languages.

```python { .api }
def loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None):
    """
    Load data from a text file, with each row containing one array row.

    Parameters:
    - fname: str or file-like, input file path or file object
    - dtype: data type, optional (default: float)
    - comments: str or sequence, characters marking comment lines
    - delimiter: str, optional, field delimiter (default: whitespace)
    - converters: dict, optional, mapping of column index to conversion function
    - skiprows: int, lines to skip at the beginning of the file
    - usecols: int or sequence, columns to read
    - unpack: bool, return a separate array for each column
    - ndmin: int, minimum number of dimensions for the returned array
    - encoding: str, encoding used to decode the input file
    - max_rows: int, optional, maximum number of rows to read

    Returns:
    - ndarray: loaded data on GPU

    Notes:
    - Data is loaded on CPU, then transferred to GPU
    - Compatible with CSV and whitespace-delimited formats
    - Handles various numeric formats and missing values
    """

def savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None):
    """
    Save an array to a text file.

    Parameters:
    - fname: str or file-like, output file path or file object
    - X: 1D or 2D array_like, data to save
    - fmt: str or sequence, format string(s) for elements
    - delimiter: str, string separating columns
    - newline: str, string separating lines
    - header: str, written at the beginning of the file
    - footer: str, written at the end of the file
    - comments: str, prefix for the header and footer lines
    - encoding: str, encoding for the output file

    Notes:
    - The array is transferred to CPU before saving
    - Supports custom formatting per column
    - Human-readable output suitable for external tools
    """

def genfromtxt(fname, dtype=float, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=''.join(sorted("~!@#$%^&*()+={}[]|\\:;\"'<>,.?/")), defaultfmt="f%i", autostrip=False, replace_space='_', case_sensitive=True, unpack=None, ndmin=0, encoding='bytes', max_rows=None):
    """
    Load data from a text file, with enhanced handling of missing values.

    Parameters:
    - fname: str or file-like, input file path
    - dtype: data type for the array
    - comments: str, characters marking comment lines
    - delimiter: str, field delimiter
    - skip_header: int, lines to skip at the start
    - skip_footer: int, lines to skip at the end
    - converters: dict, column converters
    - missing_values: set, strings representing missing data
    - filling_values: values used to fill in missing data
    - usecols: sequence, columns to read
    - names: bool or list, field names for structured arrays
    - excludelist: sequence, names to exclude
    - deletechars: str, characters to remove from field names
    - defaultfmt: str, default field-name format
    - autostrip: bool, automatically strip whitespace
    - replace_space: str, character that replaces spaces in names
    - case_sensitive: bool, field-name case sensitivity
    - unpack: bool, return separate arrays
    - ndmin: int, minimum number of dimensions
    - encoding: str, file encoding
    - max_rows: int, maximum number of rows to read

    Returns:
    - ndarray: loaded data on GPU

    Notes:
    - More robust than loadtxt for complex text formats
    - Handles missing values and structured data
    - Supports named fields and data validation
    """

def fromfile(file, dtype=float, count=-1, sep='', offset=0):
    """
    Construct an array from data in a text or binary file.

    Parameters:
    - file: str or file-like, input file
    - dtype: data type to read
    - count: int, number of items to read (-1 for all)
    - sep: str, separator between items (empty string for binary)
    - offset: int, offset from the start of the file

    Returns:
    - ndarray: 1D array constructed from the file data

    Notes:
    - Binary mode when sep is an empty string
    - Text mode when sep is specified
    - Data is transferred to GPU after reading
    """
```

### Data Conversion and Transfer

Functions for seamless data transfer between CPU and GPU memory, with format conversion capabilities.

```python { .api }
def frombuffer(buffer, dtype=float, count=-1, offset=0):
    """
    Interpret a buffer as a 1D array.

    Parameters:
    - buffer: buffer_like, object exposing the buffer interface
    - dtype: data type used to interpret the buffer
    - count: int, number of items to read (-1 for all)
    - offset: int, start reading from this byte offset

    Returns:
    - ndarray: 1D array of the buffer data on GPU

    Notes:
    - The buffer contents are copied into GPU memory; unlike NumPy, the result is not a view of the host buffer
    - Useful for interfacing with other libraries
    """

def fromstring(string, dtype=float, count=-1, sep=''):
    """
    Create an array from string data.

    Parameters:
    - string: str, string containing the array data
    - dtype: data type used for parsing
    - count: int, number of items to read (-1 for all)
    - sep: str, separator between items

    Returns:
    - ndarray: 1D array parsed from the string, on GPU

    Notes:
    - Whitespace-separated when sep is empty
    - Custom separators are supported
    - Convenient for parsing string-formatted data
    """

def fromfunction(func, shape, dtype=float, **kwargs):
    """
    Construct an array by executing a function over coordinate arrays.

    Parameters:
    - func: callable, function evaluated over the coordinate grids
    - shape: sequence of ints, shape of the output array
    - dtype: data type of the coordinate arrays passed to func
    - **kwargs: additional arguments passed to func

    Returns:
    - ndarray: array with values func(coordinates), on GPU

    Notes:
    - The function is called once, with the full coordinate arrays as arguments
    - Useful for generating coordinate-based patterns
    - The function executes on GPU when possible
    """

def fromiter(iterable, dtype, count=-1):
    """
    Create an array from an iterable object.

    Parameters:
    - iterable: iterable, sequence of values
    - dtype: data type of the array elements
    - count: int, number of items to read (-1 for all)

    Returns:
    - ndarray: 1D array created from the iterable, on GPU

    Notes:
    - Iterates through all items when count is -1
    - Efficient for converting Python sequences
    - Data is transferred to GPU after creation
    """
```

### Array Formatting and Display

Comprehensive formatting functions for array visualization, debugging, and custom string representations.

```python { .api }
def array_repr(arr, max_line_width=None, precision=None, suppress_small=None):
    """
    Return the string representation of an array.

    Parameters:
    - arr: ndarray, input array
    - max_line_width: int, maximum characters per line
    - precision: int, floating-point precision
    - suppress_small: bool, print small floating-point values as zero

    Returns:
    - str: string representation suitable for eval()

    Notes:
    - Creates a repr() string that could recreate the array
    - Respects NumPy print options
    - Array data is transferred to CPU for formatting
    """

def array_str(a, max_line_width=None, precision=None, suppress_small=None):
    """
    Return a string representation of the array's data.

    Parameters:
    - a: ndarray, input array
    - max_line_width: int, maximum characters per line
    - precision: int, floating-point precision
    - suppress_small: bool, print small values as zero

    Returns:
    - str: string representation of the array contents

    Notes:
    - Creates the str() representation for display
    - Does not include array constructor syntax
    - Formatted for human readability
    """

def array2string(a, max_line_width=None, precision=None, suppress_small=None, separator=' ', prefix='', style=np._NoValue, formatter=None, threshold=None, edgeitems=None, sign=None, floatmode=None, suffix='', legacy=None):
    """
    Return a string representation with full formatting control.

    Parameters:
    - a: ndarray, input array
    - max_line_width: int, maximum line width
    - precision: int, floating-point precision
    - suppress_small: bool, print small values as zero
    - separator: str, element separator
    - prefix: str, padding prefix for continuation lines
    - style: deprecated, has no effect
    - formatter: dict, custom formatters for different types
    - threshold: int, total items before summarizing
    - edgeitems: int, items shown at each edge when summarizing
    - sign: str, sign printing control ('+', '-', ' ')
    - floatmode: str, floating-point format mode
    - suffix: str, padding suffix for continuation lines
    - legacy: str, compatibility mode

    Returns:
    - str: formatted string representation

    Notes:
    - The most flexible formatting function
    - Supports custom formatters for different data types
    - Summarizes large arrays instead of printing every element
    """

def format_float_positional(x, precision=None, unique=True, fractional=True, trim='k', sign=False, pad_left=None, pad_right=None):
    """
    Format a float in positional notation.

    Parameters:
    - x: float, value to format
    - precision: int, number of digits
    - unique: bool, use the minimum precision that uniquely identifies the value
    - fractional: bool, count precision over the fractional part only
    - trim: str, trailing-zero handling ('k', '.', '0', '-')
    - sign: bool, always show the sign
    - pad_left: int, pad with whitespace until at least this many characters are left of the decimal point
    - pad_right: int, pad with whitespace until at least this many characters are right of the decimal point

    Returns:
    - str: formatted float string
    """

def format_float_scientific(x, precision=None, unique=True, trim='k', sign=False, pad_left=None, exp_digits=None):
    """
    Format a float in scientific notation.

    Parameters:
    - x: float, value to format
    - precision: int, number of digits
    - unique: bool, use the minimum precision that uniquely identifies the value
    - trim: str, trailing-zero handling ('k', '.', '0', '-')
    - sign: bool, always show the sign
    - pad_left: int, pad with whitespace until at least this many characters are left of the decimal point
    - exp_digits: int, minimum number of exponent digits

    Returns:
    - str: formatted float string in scientific notation
    """
```

### Usage Examples

#### Basic File I/O

```python
import cupy as cp

# Create sample data
data = cp.random.random((1000, 100))
labels = cp.random.randint(0, 10, 1000)

# Save arrays to files
cp.save('data.npy', data)
cp.savez('dataset.npz', features=data, labels=labels)
cp.savez_compressed('dataset_compressed.npz', features=data, labels=labels)

# Load arrays from files
loaded_data = cp.load('data.npy')
archive = cp.load('dataset.npz')
features = archive['features']
labels = archive['labels']

print(f"Original shape: {data.shape}, Loaded shape: {loaded_data.shape}")
print(f"Data matches: {cp.allclose(data, loaded_data)}")
```

#### Text File Operations

```python
import cupy as cp

# Save data to a text file
data = cp.array([[1.1, 2.2, 3.3],
                 [4.4, 5.5, 6.6],
                 [7.7, 8.8, 9.9]])

cp.savetxt('data.txt', data, delimiter=',', header='col1,col2,col3', fmt='%.2f')

# Load data from the text file; the '# '-prefixed header line is
# treated as a comment, so no explicit skiprows is needed
loaded = cp.loadtxt('data.txt', delimiter=',')
print(f"Text data shape: {loaded.shape}")

# Handle CSV with mixed data types using genfromtxt
# Assuming a file with columns: name, age, score
mixed_data = cp.genfromtxt('mixed_data.csv',
                           delimiter=',',
                           names=True,
                           dtype=None,
                           encoding='utf-8')
```

#### Advanced Formatting

```python
import cupy as cp

# Create an array for the formatting examples
arr = cp.array([[1.23456789, 2.87654321],
                [0.00000012, 999999.999]])

# Different representation formats
print("Default repr:")
print(cp.array_repr(arr))

print("\nCustom precision:")
print(cp.array_str(arr, precision=2))

print("\nScientific notation:")
print(cp.array2string(arr, formatter={'float': '{:.2e}'.format}))

# Format individual floats
value = 123.456789
positional = cp.format_float_positional(value, precision=2)
scientific = cp.format_float_scientific(value, precision=2)
print(f"Positional: {positional}, Scientific: {scientific}")
```

#### Data Transfer Workflows

```python
import cupy as cp
import numpy as np

# CPU-to-GPU workflow
cpu_data = np.random.random((10000, 1000))

# Method 1: direct conversion
gpu_data = cp.asarray(cpu_data)

# Method 2: save/load (useful for large datasets)
np.save('temp_data.npy', cpu_data)
gpu_data = cp.load('temp_data.npy')

# GPU-to-CPU workflow
result = cp.random.random((1000, 1000))

# Method 1: direct conversion
cpu_result = cp.asnumpy(result)

# Method 2: save to file
cp.save('gpu_result.npy', result)
# Later, load on CPU
cpu_result = np.load('gpu_result.npy')

# Verify data integrity
print(f"Data preserved: {np.allclose(cpu_data, cp.asnumpy(gpu_data))}")
```

#### Batch Processing

```python
import cupy as cp
import glob

# Process multiple files
file_pattern = 'data_batch_*.npy'
results = []

for filename in sorted(glob.glob(file_pattern)):
    # Load batch
    batch = cp.load(filename)

    # Process on GPU
    processed = cp.fft.fft2(batch)
    result = cp.abs(processed).mean(axis=(1, 2))

    results.append(result)

# Combine results and save
final_result = cp.concatenate(results)
cp.save('processed_results.npy', final_result)

# Save a one-row processing log as comma-delimited text
processing_info = cp.array([[len(results), final_result.shape[0], float(final_result.mean())]])
cp.savetxt('processing_log.txt', processing_info,
           delimiter=',',
           header='num_batches,total_samples,mean_value',
           fmt='%.6f')
```

## Notes

- All I/O operations automatically handle CPU/GPU memory transfers
- Binary formats (.npy, .npz) preserve exact precision and metadata
- Text formats are human-readable but may lose precision for floating-point data
- Compressed archives (.npz with compression) balance storage efficiency and loading speed
- File I/O operations are synchronous and will block until completion
- Large datasets may benefit from chunked I/O to manage memory usage
- CuPy I/O functions are fully compatible with NumPy file formats
- For maximum performance, keep data on GPU and minimize CPU/GPU transfers during processing