# Google Cloud BigQuery Storage

A high-performance Python client library for the Google BigQuery Storage API that enables efficient streaming of large datasets from BigQuery tables. The library provides streaming read capabilities with support for multiple data formats (Avro, Arrow, Protocol Buffers), streaming write operations with transactional semantics, and integration with popular data analysis frameworks like pandas and pyarrow.

## Package Information

- **Package Name**: google-cloud-bigquery-storage
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install google-cloud-bigquery-storage`
- **Optional Dependencies**:
  - `pip install google-cloud-bigquery-storage[fastavro]` - for Avro format support
  - `pip install google-cloud-bigquery-storage[pyarrow]` - for Arrow format support
  - `pip install google-cloud-bigquery-storage[pandas]` - for pandas DataFrame support

## Core Imports

```python
from google.cloud import bigquery_storage
```

Import specific clients and types:

```python
from google.cloud.bigquery_storage import BigQueryReadClient, BigQueryWriteClient, BigQueryWriteAsyncClient
from google.cloud.bigquery_storage import types
from google.cloud.bigquery_storage import ReadRowsStream, AppendRowsStream

# Access package version
import google.cloud.bigquery_storage
print(google.cloud.bigquery_storage.__version__)
```

Access beta/alpha and v1 APIs:

```python
# Explicit v1 API access
from google.cloud import bigquery_storage_v1

# Beta versions for metastore services
from google.cloud import bigquery_storage_v1beta
from google.cloud import bigquery_storage_v1beta2

# Alpha version for experimental features
from google.cloud import bigquery_storage_v1alpha
```

## Basic Usage

### Reading BigQuery Data

```python
from google.cloud.bigquery_storage import BigQueryReadClient, types

# Create client
client = BigQueryReadClient()

# Configure read session
table = "projects/your-project/datasets/your_dataset/tables/your_table"
requested_session = types.ReadSession(
    table=table,
    data_format=types.DataFormat.AVRO
)

# Create read session
session = client.create_read_session(
    parent="projects/your-project",
    read_session=requested_session,
    max_stream_count=1
)

# Read data
reader = client.read_rows(session.streams[0].name)
for row in reader.rows(session):
    print(row)
```
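
When the `[pandas]` extra is installed (plus `[fastavro]` for Avro sessions), the rows iterable can be materialized as a DataFrame. A minimal sketch, assuming the `client` and `session` objects from the example above:

```python
# Convert the stream to a pandas DataFrame.
# Assumes `client` and `session` from the reading example above,
# and the [pandas] (and [fastavro]) extras installed.
reader = client.read_rows(session.streams[0].name)
rows = reader.rows(session)
df = rows.to_dataframe()
print(df.head())
```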

### Writing BigQuery Data

```python
from google.cloud.bigquery_storage import BigQueryWriteClient, types

# Create client
client = BigQueryWriteClient()

# Create write stream
parent = client.table_path("your-project", "your_dataset", "your_table")
write_stream = types.WriteStream(type_=types.WriteStream.Type.PENDING)
stream = client.create_write_stream(parent=parent, write_stream=write_stream)

# Append data (requires protocol buffer serialized data)
request = types.AppendRowsRequest(write_stream=stream.name)
# ... configure with serialized row data
responses = client.append_rows(iter([request]))
```
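
Because the example uses a `PENDING` stream, appended rows only become visible after the stream is finalized and committed. A minimal sketch of that follow-up, using the `finalize_write_stream` and `batch_commit_write_streams` calls documented below and assuming the `client`, `parent`, and `stream` objects from the example above:

```python
# Finalize the pending stream so no further appends are accepted,
# then commit it atomically to make the rows visible in the table.
# Assumes `client`, `parent`, and `stream` from the writing example above.
client.finalize_write_stream(name=stream.name)
commit_response = client.batch_commit_write_streams(
    parent=parent,
    write_streams=[stream.name],
)
print(commit_response.commit_time)
```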

## Architecture

The BigQuery Storage API uses a streaming architecture designed for high-performance data transfer:

- **Read Sessions**: Logical containers that define what data to read and how to format it
- **Read Streams**: Individual data streams within a session that can be processed in parallel
- **Write Streams**: Buffered append-only streams for inserting data with transactional semantics
- **Data Formats**: Support for Avro, Arrow, and Protocol Buffer serialization
- **Helper Classes**: High-level abstractions (`ReadRowsStream`, `AppendRowsStream`) for easier stream management

This design enables:
- **Parallel Processing**: Multiple streams can be read/written concurrently (see the sketch after this list)
- **Format Flexibility**: Choose optimal serialization format for your use case
- **Integration**: Seamless conversion to pandas DataFrames and Apache Arrow
- **Transactional Writes**: ACID guarantees for write operations
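
A minimal sketch of parallel reads, assuming `session` was created with `create_read_session` and `max_stream_count` greater than 1 (the thread pool and helper function are illustrative, not part of the library):

```python
from concurrent.futures import ThreadPoolExecutor

from google.cloud.bigquery_storage import BigQueryReadClient

client = BigQueryReadClient()

def count_rows(stream_name):
    # Each stream is read independently; the streams of one session
    # together cover the full result set.
    reader = client.read_rows(stream_name)
    return sum(1 for _ in reader.rows(session))

# `session` is assumed to come from create_read_session with max_stream_count > 1.
with ThreadPoolExecutor(max_workers=len(session.streams)) as pool:
    counts = pool.map(count_rows, [stream.name for stream in session.streams])

print(f"Total rows: {sum(counts)}")
```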

## Capabilities

### Reading Data

High-performance streaming reads from BigQuery tables with support for parallel processing, column selection, row filtering, and multiple data formats. Includes conversion utilities for pandas and Arrow. Available in both synchronous and asynchronous versions.

```python { .api }
class BigQueryReadClient:
    def create_read_session(
        self,
        parent: str,
        read_session: ReadSession,
        max_stream_count: int = None
    ) -> ReadSession: ...

    def read_rows(self, name: str, offset: int = 0) -> ReadRowsStream: ...

    def split_read_stream(
        self,
        name: str,
        fraction: float = None
    ) -> SplitReadStreamResponse: ...

class BigQueryReadAsyncClient:
    async def create_read_session(
        self,
        parent: str,
        read_session: ReadSession,
        max_stream_count: int = None
    ) -> ReadSession: ...

    def read_rows(self, name: str, offset: int = 0) -> ReadRowsStream: ...

    async def split_read_stream(
        self,
        name: str,
        fraction: float = None
    ) -> SplitReadStreamResponse: ...
```

[Reading Data](./reading-data.md)
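
Column selection and row filtering are configured on the read session itself. A minimal sketch, where the table path, column names, and filter expression are illustrative placeholders:

```python
from google.cloud.bigquery_storage import BigQueryReadClient, types

client = BigQueryReadClient()

# Read only two columns and filter rows server-side.
# Column names and the filter expression are placeholders.
read_options = types.ReadSession.TableReadOptions(
    selected_fields=["name", "state"],
    row_restriction='state = "CA"',
)
requested_session = types.ReadSession(
    table="projects/your-project/datasets/your_dataset/tables/your_table",
    data_format=types.DataFormat.AVRO,
    read_options=read_options,
)
session = client.create_read_session(
    parent="projects/your-project",
    read_session=requested_session,
    max_stream_count=1,
)
```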

### Writing Data

Streaming write operations with support for multiple write stream types, transactional semantics, and batch commit operations. Supports Protocol Buffer, Avro, and Arrow data formats.

```python { .api }
class BigQueryWriteClient:
    def create_write_stream(
        self,
        parent: str,
        write_stream: WriteStream
    ) -> WriteStream: ...

    def append_rows(
        self,
        requests: Iterator[AppendRowsRequest]
    ) -> Iterator[AppendRowsResponse]: ...

    def finalize_write_stream(self, name: str) -> FinalizeWriteStreamResponse: ...

    def batch_commit_write_streams(
        self,
        parent: str,
        write_streams: List[str]
    ) -> BatchCommitWriteStreamsResponse: ...
```

[Writing Data](./writing-data.md)

### Types and Schemas

Comprehensive type system for BigQuery Storage operations including session configuration, stream management, data formats, error handling, and schema definitions.

```python { .api }
class DataFormat(enum.Enum):
    AVRO = 1
    ARROW = 2
    PROTO = 3

class ReadSession:
    table: str
    data_format: DataFormat
    read_options: TableReadOptions
    streams: List[ReadStream]

class WriteStream:
    name: str
    type_: WriteStream.Type
    create_time: Timestamp
    state: WriteStream.State
```

[Types and Schemas](./types-schemas.md)
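
The `DataFormat` choice also determines which client-side conversion helpers apply. A minimal sketch of requesting Arrow data and assembling it into a `pyarrow.Table` (requires the `[pyarrow]` extra); the table path is a placeholder:

```python
from google.cloud.bigquery_storage import BigQueryReadClient, types

client = BigQueryReadClient()

requested_session = types.ReadSession(
    table="projects/your-project/datasets/your_dataset/tables/your_table",
    data_format=types.DataFormat.ARROW,
)
session = client.create_read_session(
    parent="projects/your-project",
    read_session=requested_session,
    max_stream_count=1,
)

# Stream Arrow record batches and assemble them into a single pyarrow.Table.
reader = client.read_rows(session.streams[0].name)
arrow_table = reader.to_arrow(session)
print(arrow_table.num_rows)
```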

### Metastore Services

Beta and alpha services for managing BigQuery external table metastore partitions. Supports batch operations for creating, updating, deleting, and listing Hive-style partitions in external tables.

```python { .api }
class MetastorePartitionServiceClient:
    def batch_create_metastore_partitions(
        self,
        parent: str,
        requests: List[CreateMetastorePartitionRequest]
    ) -> BatchCreateMetastorePartitionsResponse: ...

    def batch_delete_metastore_partitions(
        self,
        parent: str,
        partition_names: List[str]
    ) -> None: ...

    def list_metastore_partitions(
        self,
        parent: str,
        filter: str = None
    ) -> List[MetastorePartition]: ...
```

[Metastore Services](./metastore-services.md)
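
A minimal sketch of listing partitions, assuming the service is reached through the alpha surface and that `MetastorePartitionServiceClient` is exposed from `bigquery_storage_v1alpha` as described in the metastore-services doc; the table path is a placeholder:

```python
from google.cloud import bigquery_storage_v1alpha

# Assumption: the alpha package exports MetastorePartitionServiceClient.
client = bigquery_storage_v1alpha.MetastorePartitionServiceClient()

# Parent is the external table whose Hive-style partitions are listed.
parent = "projects/your-project/datasets/your_dataset/tables/your_table"
response = client.list_metastore_partitions(parent=parent)
print(response)
```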