# I/O Operations

cuDF provides high-performance GPU I/O for popular data formats, with automatic memory management and optimized readers and writers. All I/O operations read into and write from GPU memory directly, minimizing CPU-GPU data transfers.

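A minimal sketch of what this means in practice (the file path is a placeholder): readers return a `cudf.DataFrame` that already lives in GPU memory, and moving data to the host is a separate, explicit step.

```python
import cudf

# The reader parses the file on the GPU and returns a cudf.DataFrame that
# lives in device memory; no intermediate pandas object is created.
df = cudf.read_csv("data.csv")

# Copying to host (pandas) memory is a separate, explicit step.
pdf = df.to_pandas()
```
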
## Import Statements

```python
# Core I/O functions
from cudf import read_csv, read_parquet, read_json
from cudf.io import read_orc, read_avro, read_feather, read_hdf, read_text
from cudf.io.csv import to_csv
from cudf.io.orc import to_orc

# Parquet utilities
from cudf.io.parquet import (
    read_parquet_metadata, merge_parquet_filemetadata,
    ParquetDatasetWriter, write_to_dataset
)

# ORC utilities
from cudf.io.orc import read_orc_metadata

# Interoperability
from cudf.io.dlpack import from_dlpack
```

## CSV I/O

High-performance CSV reading with extensive parsing options.

```{ .api }
def read_csv(
    filepath_or_buffer,
    sep=',',
    delimiter=None,
    header='infer',
    names=None,
    index_col=None,
    usecols=None,
    dtype=None,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    skip_blank_lines=True,
    parse_dates=False,
    date_parser=None,
    dayfirst=False,
    compression='infer',
    thousands=None,
    decimal='.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding='utf-8',
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read CSV file directly into GPU memory with optimized parsing

    Provides GPU-accelerated CSV parsing with extensive configuration options.
    Automatically detects and handles various CSV formats and encodings.

    Parameters:
        filepath_or_buffer: str, PathLike, or file-like object
            File path, URL, or buffer containing CSV data
        sep: str, default ','
            Field delimiter character
        delimiter: str, optional
            Alternative name for the sep parameter
        header: int, list of int, or 'infer', default 'infer'
            Row number(s) to use as column names
        names: list, optional
            List of column names to use instead of header
        index_col: int, str, or list, optional
            Column(s) to use as row labels
        usecols: list or callable, optional
            Subset of columns to read
        dtype: dict or str, optional
            Data type specification for columns
        skiprows: int, list, or callable, optional
            Rows to skip at beginning of file
        skipfooter: int, default 0
            Number of rows to skip at end of file
        nrows: int, optional
            Maximum number of rows to read
        na_values: scalar, str, list, or dict, optional
            Additional strings to recognize as NA/NaN
        keep_default_na: bool, default True
            Whether to include default NaN values
        na_filter: bool, default True
            Whether to check for missing values
        skip_blank_lines: bool, default True
            Whether to skip blank lines
        parse_dates: bool, list, or dict, default False
            Columns to parse as dates
        compression: str or dict, default 'infer'
            Type of compression ('gzip', 'bz2', 'xz', 'zip', None)
        encoding: str, default 'utf-8'
            Character encoding to use
        storage_options: dict, optional
            Options for cloud storage access
        **kwargs: additional keyword arguments
            Other CSV parsing options

    Returns:
        DataFrame: GPU DataFrame containing parsed CSV data

    Examples:
        # Basic CSV reading
        df = cudf.read_csv('data.csv')

        # With custom options
        df = cudf.read_csv(
            'data.csv',
            sep=';',
            header=0,
            dtype={'col1': 'int64', 'col2': 'float32'},
            parse_dates=['date_column']
        )

        # From URL with compression
        df = cudf.read_csv(
            'https://example.com/data.csv.gz',
            compression='gzip'
        )
    """
```

### CSV Writing

```{ .api }
def to_csv(
    path_or_buf=None,
    sep=',',
    na_rep='',
    float_format=None,
    columns=None,
    header=True,
    index=True,
    index_label=None,
    mode='w',
    encoding=None,
    compression='infer',
    quoting=None,
    quotechar='"',
    line_terminator=None,
    chunksize=None,
    date_format=None,
    doublequote=True,
    escapechar=None,
    decimal='.',
    **kwargs
):
    """
    Write GPU DataFrame to CSV format

    High-performance CSV writing with customizable formatting options.
    Writes directly from GPU memory with minimal data transfers.

    Parameters:
        path_or_buf: str, path object, or file-like object
            File path or object to write to
        sep: str, default ','
            Field delimiter character
        na_rep: str, default ''
            String representation of NaN values
        float_format: str, optional
            Format string for floating point numbers
        columns: sequence, optional
            Columns to write
        header: bool or list of str, default True
            Write column names as header
        index: bool, default True
            Write row names (index)
        mode: str, default 'w'
            File mode ('w' for write, 'a' for append)
        compression: str or dict, default 'infer'
            Compression type ('gzip', 'bz2', 'xz', 'zstd', etc.)
        **kwargs: additional keyword arguments
            Other CSV writing options

    Examples:
        # Basic CSV writing
        df.to_csv('output.csv')

        # Custom formatting
        df.to_csv('output.csv', sep=';', index=False, float_format='%.2f')

        # Compressed output
        df.to_csv('output.csv.gz', compression='gzip')
    """
```
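
Putting the reader and writer together, a minimal round-trip sketch using only parameters documented above; the file name, separator, and column names are placeholders.

```python
import cudf

# Read with explicit dtypes and date parsing, then write the result back
# out compressed and without the index.
df = cudf.read_csv(
    "sales.csv",
    sep=";",
    dtype={"store_id": "int32", "amount": "float64"},
    parse_dates=["sale_date"],
)
df.to_csv("sales_clean.csv.gz", index=False, compression="gzip")
```
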

## Parquet I/O

Optimized Apache Parquet support with metadata handling and dataset operations.

```{ .api }
def read_parquet(
    path,
    engine='cudf',
    columns=None,
    filters=None,
    row_groups=None,
    use_pandas_metadata=True,
    storage_options=None,
    bytes_per_thread=None,
    **kwargs
) -> DataFrame:
    """
    Read Apache Parquet file(s) directly into GPU memory

    High-performance Parquet reader with predicate pushdown, column pruning,
    and automatic schema detection. Supports single files, directories, and
    cloud storage locations.

    Parameters:
        path: str, PathLike, or list
            File path, directory, or list of files to read
        engine: str, default 'cudf'
            Parquet engine to use ('cudf' for GPU acceleration)
        columns: list, optional
            Specific columns to read (column pruning)
        filters: list of tuples, optional
            Row filter predicates for predicate pushdown
        row_groups: list, optional
            Specific row groups to read
        use_pandas_metadata: bool, default True
            Whether to use pandas metadata for schema information
        storage_options: dict, optional
            Options for cloud storage (S3, GCS, Azure)
        bytes_per_thread: int, optional
            Bytes to read per thread for parallel I/O
        **kwargs: additional arguments
            Engine-specific options

    Returns:
        DataFrame: GPU DataFrame with Parquet data

    Examples:
        # Basic Parquet reading
        df = cudf.read_parquet('data.parquet')

        # Column pruning and filtering
        df = cudf.read_parquet(
            'data.parquet',
            columns=['col1', 'col2', 'col3'],
            filters=[('col1', '>', 100), ('col2', '==', 'value')]
        )

        # Multiple files
        df = cudf.read_parquet(['file1.parquet', 'file2.parquet'])

        # From cloud storage
        df = cudf.read_parquet(
            's3://bucket/path/data.parquet',
            storage_options={'key': 'access_key', 'secret': 'secret_key'}
        )
    """

def read_parquet_metadata(path, **kwargs) -> object:
    """
    Read metadata from Parquet file without loading data

    Extracts schema information, row group statistics, and file metadata
    for query planning and data exploration without full data loading.

    Parameters:
        path: str or PathLike
            Path to Parquet file
        **kwargs: additional arguments
            Storage and engine options

    Returns:
        object: Parquet metadata object with schema and statistics

    Examples:
        # Read metadata only
        metadata = cudf.io.parquet.read_parquet_metadata('data.parquet')
        print(f"Rows: {metadata.num_rows}")
        print(f"Columns: {len(metadata.schema)}")
    """

def merge_parquet_filemetadata(metadata_list) -> object:
    """
    Merge multiple Parquet file metadata objects

    Combines metadata from multiple Parquet files for unified schema
    and statistics. Useful for dataset-level operations.

    Parameters:
        metadata_list: list
            List of Parquet metadata objects to merge

    Returns:
        object: Merged Parquet metadata object

    Examples:
        # Merge metadata from multiple files
        meta1 = cudf.io.parquet.read_parquet_metadata('file1.parquet')
        meta2 = cudf.io.parquet.read_parquet_metadata('file2.parquet')
        merged = cudf.io.parquet.merge_parquet_filemetadata([meta1, meta2])
    """
```
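
As a rough sketch of how the options above combine, the following reads only part of a (hypothetical) large file by limiting the row groups and columns that are decoded; the path and column names are placeholders.

```python
import cudf

# Decode only the first two row groups and two columns, keeping GPU
# memory usage low for exploratory work on a large file.
df = cudf.read_parquet(
    "large.parquet",
    columns=["user_id", "amount"],
    row_groups=[0, 1],
)
```
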

### Parquet Dataset Operations

```{ .api }
class ParquetDatasetWriter:
    """
    Writer for partitioned Parquet datasets

    Manages writing DataFrames to partitioned Parquet datasets with
    automatic directory structure creation and metadata management.

    Parameters:
        path: str or PathLike
            Root directory for the dataset
        partition_cols: list, optional
            Columns to use for dataset partitioning
        **kwargs: additional arguments
            Writer configuration options

    Methods:
        write_table(table, **kwargs): Write table to dataset
        close(): Finalize dataset and write metadata

    Examples:
        # Create partitioned dataset writer
        writer = cudf.io.parquet.ParquetDatasetWriter(
            '/path/to/dataset',
            partition_cols=['year', 'month']
        )

        # Write data in chunks
        for chunk in data_chunks:
            writer.write_table(chunk)
        writer.close()
    """

def write_to_dataset(
    df,
    root_path,
    partition_cols=None,
    preserve_index=False,
    storage_options=None,
    **kwargs
) -> None:
    """
    Write DataFrame to partitioned Parquet dataset

    Creates partitioned Parquet dataset with automatic directory structure
    based on partition columns. Supports cloud storage destinations.

    Parameters:
        df: DataFrame
            cuDF DataFrame to write
        root_path: str or PathLike
            Root directory for dataset
        partition_cols: list, optional
            Columns to use for partitioning
        preserve_index: bool, default False
            Whether to write index as column
        storage_options: dict, optional
            Cloud storage configuration
        **kwargs: additional arguments
            Writer options (compression, etc.)

    Examples:
        # Write partitioned dataset
        cudf.io.parquet.write_to_dataset(
            df,
            '/path/to/dataset',
            partition_cols=['year', 'category'],
            compression='snappy'
        )
    """
```
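
A minimal end-to-end sketch, assuming a local `/tmp/sales_dataset` directory is acceptable as the dataset root: write a small partitioned dataset with `write_to_dataset`, then read it back with `read_parquet`, using a filter to restrict the result to one partition value.

```python
import cudf

df = cudf.DataFrame({
    "year": [2023, 2023, 2024],
    "category": ["a", "b", "a"],
    "value": [1.0, 2.0, 3.0],
})

# Writes directories such as year=2023/category=a/ under the root path.
cudf.io.parquet.write_to_dataset(
    df, "/tmp/sales_dataset", partition_cols=["year", "category"]
)

# Read the dataset root back, restricting the result to a single year.
recent = cudf.read_parquet(
    "/tmp/sales_dataset",
    filters=[("year", "==", 2024)],
)
```
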

## JSON I/O

Flexible JSON reading with support for various JSON formats.

```{ .api }
def read_json(
    path_or_buf,
    orient='records',
    typ='frame',
    dtype=None,
    lines=False,
    compression='infer',
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read JSON data directly into GPU memory

    Supports various JSON formats including line-delimited JSON (JSONL),
    nested JSON structures, and automatic schema inference.

    Parameters:
        path_or_buf: str, PathLike, or file-like object
            JSON data source (file, URL, or buffer)
        orient: str, default 'records'
            JSON structure format ('records', 'index', 'values', 'split')
        typ: str, default 'frame'
            Type of object to return ('frame' for DataFrame)
        dtype: dict or str, optional
            Data type specification for columns
        lines: bool, default False
            Whether to read line-delimited JSON
        compression: str, default 'infer'
            Compression type ('gzip', 'bz2', 'xz', None)
        storage_options: dict, optional
            Cloud storage configuration
        **kwargs: additional arguments
            JSON parsing options

    Returns:
        DataFrame: GPU DataFrame containing JSON data

    Examples:
        # Read JSON file
        df = cudf.read_json('data.json')

        # Line-delimited JSON
        df = cudf.read_json('data.jsonl', lines=True)

        # With compression
        df = cudf.read_json('data.json.gz', compression='gzip')

        # From URL
        df = cudf.read_json('https://api.example.com/data.json')
    """
```
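
A short round-trip sketch for line-delimited JSON; the file name is a placeholder and `to_json` is the DataFrame writer shown later in this reference.

```python
import cudf

df = cudf.DataFrame({"id": [1, 2, 3], "msg": ["a", "b", "c"]})

# One JSON record per line (JSONL), then read it back the same way.
df.to_json("events.jsonl", orient="records", lines=True)
roundtrip = cudf.read_json("events.jsonl", lines=True)
```
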

## ORC I/O

Apache ORC format support with metadata utilities.

```{ .api }
def read_orc(
    path,
    columns=None,
    filters=None,
    stripes=None,
    skiprows=None,
    num_rows=None,
    use_index=True,
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read Apache ORC file directly into GPU memory

    High-performance ORC reader with predicate pushdown and column pruning.
    Supports compressed ORC files and cloud storage.

    Parameters:
        path: str or PathLike
            Path to ORC file
        columns: list, optional
            Specific columns to read
        filters: list of tuples, optional
            Row filter predicates
        stripes: list, optional
            Specific ORC stripes to read
        skiprows: int, optional
            Number of rows to skip
        num_rows: int, optional
            Maximum rows to read
        use_index: bool, default True
            Whether to use ORC file index
        storage_options: dict, optional
            Cloud storage options
        **kwargs: additional arguments
            Reader configuration

    Returns:
        DataFrame: GPU DataFrame with ORC data

    Examples:
        # Basic ORC reading
        df = cudf.read_orc('data.orc')

        # With column pruning and filtering
        df = cudf.read_orc(
            'data.orc',
            columns=['col1', 'col2'],
            filters=[('col1', '>', 0)]
        )
    """

def read_orc_metadata(path, **kwargs) -> object:
    """
    Read metadata from ORC file without loading data

    Extracts schema, stripe information, and statistics for
    query planning and data exploration.

    Parameters:
        path: str or PathLike
            Path to ORC file
        **kwargs: additional arguments
            Reader options

    Returns:
        object: ORC metadata with schema and statistics

    Examples:
        # Read ORC metadata
        metadata = cudf.io.orc.read_orc_metadata('data.orc')
        print(f"Stripes: {len(metadata.stripes)}")
    """
```

### ORC Writing

```{ .api }
def to_orc(
    path,
    compression='snappy',
    enable_statistics=True,
    stripe_size_bytes=None,
    stripe_size_rows=None,
    row_index_stride=None,
    **kwargs
):
    """
    Write GPU DataFrame to Apache ORC format

    High-performance ORC writing with compression and statistical metadata.
    Writes directly from GPU memory with configurable stripe organization.

    Parameters:
        path: str or PathLike
            Output path for ORC file
        compression: str, default 'snappy'
            Compression algorithm ('snappy', 'zlib', 'lz4', 'zstd', None)
        enable_statistics: bool, default True
            Whether to compute column statistics
        stripe_size_bytes: int, optional
            Target stripe size in bytes
        stripe_size_rows: int, optional
            Target stripe size in rows
        row_index_stride: int, optional
            Row group index stride
        **kwargs: additional keyword arguments
            Other ORC writing options

    Examples:
        # Basic ORC writing
        df.to_orc('output.orc')

        # With compression
        df.to_orc('output.orc', compression='zlib')

        # Custom stripe configuration
        df.to_orc('output.orc', stripe_size_rows=50000)
    """
```
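
A minimal ORC round-trip sketch using the compression values listed above; file and column names are placeholders.

```python
import cudf

df = cudf.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "value": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

# Write with zstd compression, then read back only the needed column.
df.to_orc("events.orc", compression="zstd")
subset = cudf.read_orc("events.orc", columns=["value"])
```
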

## Avro I/O

Apache Avro format support for schema evolution and serialization.

```{ .api }
def read_avro(
    filepath_or_buffer,
    columns=None,
    skiprows=None,
    num_rows=None,
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read Apache Avro file directly into GPU memory

    Reads Avro files with automatic schema detection and type conversion.
    Supports compressed Avro files and nested data structures.

    Parameters:
        filepath_or_buffer: str, PathLike, or file-like object
            Avro data source
        columns: list, optional
            Specific columns to read
        skiprows: int, optional
            Number of rows to skip at beginning
        num_rows: int, optional
            Maximum number of rows to read
        storage_options: dict, optional
            Cloud storage configuration
        **kwargs: additional arguments
            Avro reader options

    Returns:
        DataFrame: GPU DataFrame with Avro data

    Examples:
        # Read Avro file
        df = cudf.read_avro('data.avro')

        # With column selection
        df = cudf.read_avro('data.avro', columns=['col1', 'col2'])
    """
```

## Feather I/O

Apache Arrow Feather format for fast serialization.

```{ .api }
def read_feather(
    path,
    columns=None,
    use_threads=True,
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read Apache Feather format file into GPU memory

    Fast binary format based on Apache Arrow for efficient DataFrame
    serialization with preserved data types and metadata.

    Parameters:
        path: str or PathLike
            Path to Feather file
        columns: list, optional
            Subset of columns to read
        use_threads: bool, default True
            Whether to use threading for parallel I/O
        storage_options: dict, optional
            Cloud storage options
        **kwargs: additional arguments
            Reader configuration

    Returns:
        DataFrame: GPU DataFrame with Feather data

    Examples:
        # Read Feather file
        df = cudf.read_feather('data.feather')

        # Column selection
        df = cudf.read_feather('data.feather', columns=['A', 'B'])
    """
```

## HDF5 I/O

HDF5 format support for scientific and numerical data.

```{ .api }
def read_hdf(
    path_or_buf,
    key=None,
    mode='r',
    columns=None,
    start=None,
    stop=None,
    **kwargs
) -> DataFrame:
    """
    Read HDF5 file into GPU memory

    Reads HDF5 datasets with support for hierarchical data organization
    and partial reading of large datasets.

    Parameters:
        path_or_buf: str, PathLike, or file-like object
            HDF5 file source
        key: str, optional
            HDF5 group/dataset key to read
        mode: str, default 'r'
            File access mode
        columns: list, optional
            Subset of columns to read
        start: int, optional
            Starting row position
        stop: int, optional
            Ending row position
        **kwargs: additional arguments
            HDF5 reader options

    Returns:
        DataFrame: GPU DataFrame with HDF5 data

    Examples:
        # Read HDF5 dataset
        df = cudf.read_hdf('data.h5', key='dataset1')

        # Partial reading
        df = cudf.read_hdf('data.h5', key='dataset1', start=1000, stop=2000)
    """
```

## Text I/O

Raw text file reading for unstructured data processing.

```{ .api }
def read_text(
    filepath_or_buffer,
    delimiter=None,
    dtype='str',
    lineterminator='\n',
    skiprows=0,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    storage_options=None,
    **kwargs
) -> DataFrame:
    """
    Read raw text file line by line into GPU memory

    Reads unstructured text data with each line as a DataFrame row.
    Useful for log files, natural language processing, and custom parsing.

    Parameters:
        filepath_or_buffer: str, PathLike, or file-like object
            Text file source
        delimiter: str, optional
            Line delimiter (default: newline)
        dtype: str, default 'str'
            Data type for text data
        lineterminator: str, default '\n'
            Line termination character
        skiprows: int, default 0
            Number of rows to skip at beginning
        skipfooter: int, default 0
            Number of rows to skip at end
        nrows: int, optional
            Maximum number of lines to read
        na_values: list, optional
            Values to treat as missing
        keep_default_na: bool, default True
            Whether to include default NA values
        na_filter: bool, default True
            Whether to check for missing values
        storage_options: dict, optional
            Cloud storage configuration
        **kwargs: additional arguments
            Text reader options

    Returns:
        DataFrame: GPU DataFrame with one column containing text lines

    Examples:
        # Read text file
        df = cudf.read_text('logfile.txt')

        # With line limits
        df = cudf.read_text('data.txt', nrows=1000)
    """
```
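
A small sketch of a typical use, assuming the lines come back as a single cuDF string column (a Series in current releases) so GPU string methods apply; the log path and filter term are placeholders.

```python
import cudf

# Split the file on newlines: one string element per log line.
lines = cudf.read_text("app.log", delimiter="\n")

# Keep only the lines mentioning ERROR, entirely on the GPU.
errors = lines[lines.str.contains("ERROR")]
```
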

## Interoperability

### DLPack Integration

```{ .api }
def from_dlpack(dlpack_tensor) -> Union[DataFrame, Series]:
    """
    Create cuDF object from DLPack tensor

    Enables zero-copy data sharing between cuDF and other GPU libraries
    that support the DLPack standard (PyTorch, CuPy, JAX, etc.).

    Parameters:
        dlpack_tensor: DLPack tensor object
            GPU tensor in DLPack format

    Returns:
        Union[DataFrame, Series]: cuDF object sharing memory with tensor

    Examples:
        # From PyTorch tensor
        import torch
        tensor = torch.tensor([1.0, 2.0, 3.0, 4.0], device='cuda')
        series = cudf.io.dlpack.from_dlpack(tensor.__dlpack__())

        # From CuPy array
        import cupy
        array = cupy.array([1.0, 2.0, 3.0])
        series = cudf.io.dlpack.from_dlpack(array.toDlpack())
    """
```
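
A round-trip sketch between CuPy and cuDF; the array contents are arbitrary, and `Series.values` is used here as a convenient way to view the column as a CuPy array on the same device.

```python
import cupy
import cudf

# CuPy -> cuDF via DLPack; the data stays on the GPU.
arr = cupy.arange(5, dtype=cupy.float64)
ser = cudf.io.dlpack.from_dlpack(arr.toDlpack())

# cuDF -> CuPy: Series.values exposes the column as a CuPy array.
back = ser.values
```
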

## DataFrame Write Methods

All cuDF DataFrames include write methods for various formats:

```python
# CSV writing
df.to_csv('output.csv', index=False)

# Parquet writing
df.to_parquet('output.parquet', compression='snappy')

# JSON writing
df.to_json('output.json', orient='records', lines=True)

# ORC writing
df.to_orc('output.orc', compression='zlib')

# Feather writing
df.to_feather('output.feather')

# HDF5 writing
df.to_hdf('output.h5', key='dataset', mode='w')
```

## Performance Optimizations

### GPU Memory Management
- **Direct GPU Loading**: All readers load data directly to GPU memory
- **Memory Mapping**: Support for memory-mapped files to reduce memory usage
- **Streaming**: Chunked reading for datasets larger than GPU memory (see the sketch after this list)
- **Zero-Copy**: Minimal memory copying between operations
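
A rough sketch of chunked CSV reading built only from the `skiprows`/`nrows`/`names`/`header` parameters documented above; the path and chunk size are placeholders, and error handling is omitted.

```python
import cudf

# Process a large CSV in fixed-size row chunks so each piece fits in GPU
# memory. 'huge.csv' and the chunk size are placeholders.
chunk_rows = 1_000_000
offset = 0
names = None

while True:
    chunk = cudf.read_csv(
        "huge.csv",
        skiprows=offset + 1 if offset else None,  # +1 skips the header line
        nrows=chunk_rows,
        names=names,                              # reuse names after chunk 1
        header="infer" if offset == 0 else None,
    )
    if len(chunk) == 0:
        break
    if names is None:
        names = list(chunk.columns)
    # ... process the chunk on the GPU ...
    offset += len(chunk)
```
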

### Parallel Processing
- **Multi-threaded I/O**: Parallel file reading with configurable thread counts
- **Column Parallelism**: Independent processing of columns during parsing
- **Compressed Reading**: Hardware-accelerated decompression on the GPU

### Query Optimization
- **Predicate Pushdown**: Filter rows during file reading (see the sketch after this list)
- **Column Pruning**: Read only required columns from files
- **Schema Inference**: Automatic data type detection and optimization
- **Metadata Caching**: Reuse file metadata for repeated operations
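
A short sketch of the effect of predicate pushdown and column pruning, using the documented `columns` and `filters` arguments of `read_parquet`; the file and column names are placeholders.

```python
import cudf

# Full read vs. pruned/filtered read of the same (placeholder) file.
full = cudf.read_parquet("events.parquet")
small = cudf.read_parquet(
    "events.parquet",
    columns=["user_id", "amount"],    # column pruning
    filters=[("amount", ">", 100)],   # predicate pushdown
)

# The pruned read decodes far less data into GPU memory.
print(full.memory_usage().sum(), small.memory_usage().sum())
```
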

## Cloud Storage Support

All I/O functions support cloud storage through `storage_options`:

```python
# Amazon S3
s3_options = {
    'key': 'access_key_id',
    'secret': 'secret_access_key',
    'token': 'session_token'  # optional
}
df = cudf.read_parquet('s3://bucket/path/data.parquet',
                       storage_options=s3_options)

# Google Cloud Storage
gcs_options = {
    'token': 'path/to/service_account.json'
}
df = cudf.read_csv('gs://bucket/data.csv', storage_options=gcs_options)

# Azure Blob Storage
azure_options = {
    'account_name': 'storage_account',
    'account_key': 'account_key'
}
df = cudf.read_json('abfs://container/data.json',
                    storage_options=azure_options)
```