# Google Cloud BigQuery Storage

A high-performance Python client library for the Google BigQuery Storage API that enables efficient streaming of large datasets from BigQuery tables. The library provides streaming reads, streaming writes with transactional semantics, support for multiple serialization formats (Avro, Arrow, Protocol Buffers), and integration with popular data analysis frameworks such as pandas and pyarrow.

## Package Information

- **Package Name**: google-cloud-bigquery-storage
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install google-cloud-bigquery-storage`
- **Optional Dependencies**:
  - `pip install google-cloud-bigquery-storage[fastavro]` - for Avro format support
  - `pip install google-cloud-bigquery-storage[pyarrow]` - for Arrow format support
  - `pip install google-cloud-bigquery-storage[pandas]` - for pandas DataFrame support

## Core Imports

```python
from google.cloud import bigquery_storage
```

Import specific clients and types:

```python
from google.cloud.bigquery_storage import BigQueryReadClient, BigQueryWriteClient, BigQueryWriteAsyncClient
from google.cloud.bigquery_storage import types
from google.cloud.bigquery_storage import ReadRowsStream, AppendRowsStream

# Access package version
import google.cloud.bigquery_storage
print(google.cloud.bigquery_storage.__version__)
```

Access beta/alpha and v1 APIs:

```python
# Explicit v1 API access
from google.cloud import bigquery_storage_v1

# Beta versions for metastore services
from google.cloud import bigquery_storage_v1beta
from google.cloud import bigquery_storage_v1beta2

# Alpha version for experimental features
from google.cloud import bigquery_storage_v1alpha
```

## Basic Usage

### Reading BigQuery Data

```python
from google.cloud.bigquery_storage import BigQueryReadClient, types

# Create client
client = BigQueryReadClient()

# Configure read session
table = "projects/your-project/datasets/your_dataset/tables/your_table"
requested_session = types.ReadSession(
    table=table,
    data_format=types.DataFormat.AVRO
)

# Create read session
session = client.create_read_session(
    parent="projects/your-project",
    read_session=requested_session,
    max_stream_count=1
)

# Read data
reader = client.read_rows(session.streams[0].name)
for row in reader.rows(session):
    print(row)
```

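
For analysis workflows, the session can also be narrowed to specific columns and rows, and the stream converted directly to a pandas DataFrame. A minimal sketch, assuming the `pandas` and `fastavro` extras are installed and reusing the placeholder project, dataset, and table names; the column names and filter are hypothetical:

```python
from google.cloud.bigquery_storage import BigQueryReadClient, types

client = BigQueryReadClient()

# Limit the session to two columns and a filtered subset of rows.
read_options = types.ReadSession.TableReadOptions(
    selected_fields=["name", "state"],
    row_restriction='state = "CA"',
)
requested_session = types.ReadSession(
    table="projects/your-project/datasets/your_dataset/tables/your_table",
    data_format=types.DataFormat.AVRO,
    read_options=read_options,
)
session = client.create_read_session(
    parent="projects/your-project",
    read_session=requested_session,
    max_stream_count=1,
)

# Materialize the first stream as a DataFrame.
reader = client.read_rows(session.streams[0].name)
df = reader.to_dataframe(session)
print(df.head())
```
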
### Writing BigQuery Data

```python
from google.cloud.bigquery_storage import BigQueryWriteClient, types

# Create client
client = BigQueryWriteClient()

# Create a pending write stream
parent = client.table_path("your-project", "your_dataset", "your_table")
write_stream = types.WriteStream(type_=types.WriteStream.Type.PENDING)
stream = client.create_write_stream(parent=parent, write_stream=write_stream)

# Append data (requires protocol buffer serialized data)
request = types.AppendRowsRequest(write_stream=stream.name)
# ... configure with serialized row data
responses = client.append_rows([request])
```

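
Rows appended to a `PENDING` stream become visible in the table only after the stream is finalized and committed. A minimal sketch of that final step, assuming `client`, `parent`, and `stream` from the example above:

```python
from google.cloud.bigquery_storage import types

# No further appends are accepted once the stream is finalized.
client.finalize_write_stream(name=stream.name)

# Atomically commit one or more finalized streams to the table.
commit_request = types.BatchCommitWriteStreamsRequest(
    parent=parent,
    write_streams=[stream.name],
)
commit_response = client.batch_commit_write_streams(request=commit_request)
print(commit_response.commit_time)
```
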
## Architecture

The BigQuery Storage API uses a streaming architecture designed for high-performance data transfer:

- **Read Sessions**: Logical containers that define what data to read and how to format it
- **Read Streams**: Individual data streams within a session that can be processed in parallel
- **Write Streams**: Buffered, append-only streams for inserting data with transactional semantics
- **Data Formats**: Support for Avro, Arrow, and Protocol Buffer serialization
- **Helper Classes**: High-level abstractions (`ReadRowsStream`, `AppendRowsStream`) for easier stream management

This design enables:

- **Parallel Processing**: Multiple streams can be read or written concurrently (see the sketch after this list)
- **Format Flexibility**: Choose the optimal serialization format for your use case
- **Integration**: Seamless conversion to pandas DataFrames and Apache Arrow
- **Transactional Writes**: ACID guarantees for write operations

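
To illustrate the parallelism, here is a minimal sketch that fans several read streams out to a thread pool and concatenates the partial results into one DataFrame. It assumes the `pyarrow` and `pandas` extras are installed and reuses the placeholder table from the examples above:

```python
from concurrent.futures import ThreadPoolExecutor

import pandas
from google.cloud.bigquery_storage import BigQueryReadClient, types

client = BigQueryReadClient()

session = client.create_read_session(
    parent="projects/your-project",
    read_session=types.ReadSession(
        table="projects/your-project/datasets/your_dataset/tables/your_table",
        data_format=types.DataFormat.ARROW,
    ),
    max_stream_count=4,  # ask the service for up to four parallel streams
)

def read_stream(stream: types.ReadStream) -> pandas.DataFrame:
    # Each stream is an independent, non-overlapping slice of the table.
    return client.read_rows(stream.name).to_dataframe(session)

# Read every stream concurrently and combine the partial results.
with ThreadPoolExecutor(max_workers=len(session.streams) or 1) as executor:
    frames = list(executor.map(read_stream, session.streams))

df = pandas.concat(frames, ignore_index=True)
print(len(df))
```
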
## Capabilities

### Reading Data

High-performance streaming reads from BigQuery tables with support for parallel processing, column selection, row filtering, and multiple data formats. Includes conversion utilities for pandas and Arrow. Available in both synchronous and asynchronous versions.

```python { .api }
class BigQueryReadClient:
    def create_read_session(
        self,
        parent: str,
        read_session: ReadSession,
        max_stream_count: int = None
    ) -> ReadSession: ...

    def read_rows(self, name: str, offset: int = 0) -> ReadRowsStream: ...

    def split_read_stream(
        self,
        name: str,
        fraction: float = None
    ) -> SplitReadStreamResponse: ...

class BigQueryReadAsyncClient:
    async def create_read_session(
        self,
        parent: str,
        read_session: ReadSession,
        max_stream_count: int = None
    ) -> ReadSession: ...

    def read_rows(self, name: str, offset: int = 0) -> ReadRowsStream: ...

    async def split_read_stream(
        self,
        name: str,
        fraction: float = None
    ) -> SplitReadStreamResponse: ...
```

[Reading Data](./reading-data.md)

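
`split_read_stream` is listed above but not demonstrated elsewhere in this overview; it supports dynamic work rebalancing by splitting the unread remainder of a stream so another worker can take part of it. A minimal sketch, assuming `client` and `session` from the Basic Usage example:

```python
from google.cloud.bigquery_storage import types

# Split the remaining rows of the first stream roughly in half.
split = client.split_read_stream(
    request=types.SplitReadStreamRequest(
        name=session.streams[0].name,
        fraction=0.5,
    )
)

# The primary stream keeps roughly the first half; the remainder stream
# can be handed to another worker and read like any other stream.
remainder_reader = client.read_rows(split.remainder_stream.name)
for row in remainder_reader.rows(session):
    print(row)
```
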
### Writing Data

Streaming write operations with support for multiple write stream types, transactional semantics, and batch commit operations. Rows are appended as Protocol Buffer serialized data.

```python { .api }
class BigQueryWriteClient:
    def create_write_stream(
        self,
        parent: str,
        write_stream: WriteStream
    ) -> WriteStream: ...

    def append_rows(
        self,
        requests: Iterator[AppendRowsRequest]
    ) -> Iterator[AppendRowsResponse]: ...

    def finalize_write_stream(self, name: str) -> FinalizeWriteStreamResponse: ...

    def batch_commit_write_streams(
        self,
        parent: str,
        write_streams: List[str]
    ) -> BatchCommitWriteStreamsResponse: ...
```

[Writing Data](./writing-data.md)

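
For completeness, the higher-level `AppendRowsStream` helper (from `google.cloud.bigquery_storage_v1.writer`) manages the bidirectional `append_rows` connection and per-request offsets. A minimal sketch of appending Protocol Buffer rows through it, assuming `client` and the pending `stream` from the Basic Usage example; `sample_data_pb2` is a hypothetical compiled proto module whose `SampleData` message matches the destination table schema:

```python
from google.cloud.bigquery_storage import types
from google.cloud.bigquery_storage_v1 import writer
from google.protobuf import descriptor_pb2

import sample_data_pb2  # hypothetical generated proto module

# Template request: declare the target stream and writer schema once.
request_template = types.AppendRowsRequest(write_stream=stream.name)
proto_descriptor = descriptor_pb2.DescriptorProto()
sample_data_pb2.SampleData.DESCRIPTOR.CopyToProto(proto_descriptor)
request_template.proto_rows = types.AppendRowsRequest.ProtoData(
    writer_schema=types.ProtoSchema(proto_descriptor=proto_descriptor)
)

append_stream = writer.AppendRowsStream(client, request_template)

# Serialize a batch of rows and append it at offset 0.
proto_rows = types.ProtoRows()
row = sample_data_pb2.SampleData(name="Alice", age=30)
proto_rows.serialized_rows.append(row.SerializeToString())

request = types.AppendRowsRequest()
request.offset = 0
request.proto_rows = types.AppendRowsRequest.ProtoData(rows=proto_rows)

future = append_stream.send(request)
print(future.result())  # blocks until the append is acknowledged

append_stream.close()
```
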
### Types and Schemas

Comprehensive type system for BigQuery Storage operations, including session configuration, stream management, data formats, error handling, and schema definitions.

```python { .api }
class DataFormat(enum.Enum):
    AVRO = 1
    ARROW = 2
    PROTO = 3

class ReadSession:
    table: str
    data_format: DataFormat
    read_options: TableReadOptions
    streams: List[ReadStream]

class WriteStream:
    name: str
    type_: WriteStream.Type
    create_time: Timestamp
    state: WriteStream.State
```

[Types and Schemas](./types-schemas.md)

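
As a usage note, the write stream types above control when appended rows become visible. A minimal sketch, assuming placeholder project, dataset, and table names; only the server-populated fields listed in the `WriteStream` summary are inspected:

```python
from google.cloud.bigquery_storage import BigQueryWriteClient, types

client = BigQueryWriteClient()
parent = client.table_path("your-project", "your_dataset", "your_table")

# The stream types trade off visibility and commit semantics.
pending = types.WriteStream(type_=types.WriteStream.Type.PENDING)      # visible after batch commit
committed = types.WriteStream(type_=types.WriteStream.Type.COMMITTED)  # visible immediately
buffered = types.WriteStream(type_=types.WriteStream.Type.BUFFERED)    # visible after flush

stream = client.create_write_stream(parent=parent, write_stream=pending)

# Fields populated by the service on creation.
print(stream.name)
print(stream.type_)
print(stream.create_time)
```
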
### Metastore Services

Beta and alpha services for managing BigQuery external table metastore partitions. Supports batch operations for creating, updating, deleting, and listing Hive-style partitions in external tables.

```python { .api }
class MetastorePartitionServiceClient:
    def batch_create_metastore_partitions(
        self,
        parent: str,
        requests: List[CreateMetastorePartitionRequest]
    ) -> BatchCreateMetastorePartitionsResponse: ...

    def batch_delete_metastore_partitions(
        self,
        parent: str,
        partition_names: List[str]
    ) -> None: ...

    def list_metastore_partitions(
        self,
        parent: str,
        filter: str = None
    ) -> List[MetastorePartition]: ...
```

[Metastore Services](./metastore-services.md)

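
A minimal sketch of listing metastore partitions through the beta surface, following the signature summarized above. The parent resource string is a placeholder, and whether the method accepts flattened keyword arguments (rather than a request object) may vary by library version, so treat this as an illustration rather than a verified call sequence:

```python
from google.cloud import bigquery_storage_v1beta

client = bigquery_storage_v1beta.MetastorePartitionServiceClient()

# Placeholder resource name for the external table whose partitions are listed.
parent = "projects/your-project/datasets/your_dataset/tables/your_external_table"

# Call shape follows the API summary above; some versions may require an
# explicit ListMetastorePartitionsRequest instead of keyword arguments.
response = client.list_metastore_partitions(parent=parent)
print(response)
```
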