# Google Cloud BigQuery

Google BigQuery API client library for Python providing comprehensive data warehouse and analytics capabilities. This library enables developers to interact with Google's cloud-based data warehouse, perform SQL queries on massive datasets, manage BigQuery resources, and integrate with the broader Google Cloud ecosystem.

## Package Information

- **Package Name**: google-cloud-bigquery
- **Package Type**: library
- **Language**: Python
- **Installation**: `pip install google-cloud-bigquery`

## Core Imports

```python
from google.cloud import bigquery
```

Main client and commonly used classes:

```python
from google.cloud.bigquery import Client, Dataset, Table, QueryJob
```

Import specific components as needed:

```python
from google.cloud.bigquery import (
    SchemaField, LoadJob, ExtractJob,
    QueryJobConfig, LoadJobConfig
)
```

## Basic Usage

```python
from google.cloud import bigquery

# Initialize the client
client = bigquery.Client()

# Simple query example
query = """
    SELECT name, COUNT(*) AS count
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY count DESC
    LIMIT 10
"""

# Execute query and wait for results
query_job = client.query(query)
results = query_job.result()

# Process results
for row in results:
    print(f"{row.name}: {row.count}")

# Working with datasets and tables
dataset_id = "my_dataset"
table_id = "my_table"

# Create dataset (no error if it already exists)
dataset = bigquery.Dataset(f"{client.project}.{dataset_id}")
dataset = client.create_dataset(dataset, exists_ok=True)

# Define table schema
schema = [
    bigquery.SchemaField("name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="NULLABLE"),
    bigquery.SchemaField("city", "STRING", mode="NULLABLE"),
]

# Create table (no error if it already exists)
table = bigquery.Table(f"{client.project}.{dataset_id}.{table_id}", schema=schema)
table = client.create_table(table, exists_ok=True)
```

## Architecture

The BigQuery client library follows a hierarchical resource model:

- **Client**: Central connection manager for all BigQuery operations
- **Dataset**: Container for tables, models, and routines within a project
- **Table**: Data storage with schema, containing rows and columns
- **Job**: Asynchronous operation (query, load, extract, copy) with progress tracking
- **Schema**: Structure definition using SchemaField objects for type safety
- **Query Parameters**: Type-safe parameter binding for SQL queries

The library integrates with pandas, PyArrow, and other data science tools, supports both synchronous and asynchronous operation, and provides comprehensive error handling and retry mechanisms.

## Capabilities

### Client Operations

Core client functionality for authentication, project management, and resource operations. Provides the main entry point for all BigQuery interactions.

```python { .api }
class Client:
    def __init__(self, project: str = None, credentials: Any = None, **kwargs): ...
    def query(self, query: str, **kwargs) -> QueryJob: ...
    def get_dataset(self, dataset_ref: str) -> Dataset: ...
    def create_dataset(self, dataset: Dataset, **kwargs) -> Dataset: ...
    def delete_dataset(self, dataset_ref: str, **kwargs) -> None: ...
    def list_datasets(self, **kwargs) -> Iterator[DatasetListItem]: ...
```

[Client Operations](./client-operations.md)

### Query Operations

SQL query execution with parameters, job configuration, and result processing. Supports both simple queries and complex analytical workloads with pagination and streaming.

```python { .api }
class QueryJob:
    def result(self, **kwargs) -> RowIterator: ...
    def to_dataframe(self, **kwargs) -> pandas.DataFrame: ...
    def to_arrow(self, **kwargs) -> pyarrow.Table: ...

class QueryJobConfig:
    def __init__(self, **kwargs): ...

class Client:
    def query(self, query: str, job_config: QueryJobConfig = None, **kwargs) -> QueryJob: ...
```

[Query Operations](./query-operations.md)

### Dataset Management

Dataset creation, configuration, access control, and metadata management. Datasets serve as containers for tables and other BigQuery resources.

```python { .api }
class Dataset:
    def __init__(self, dataset_ref: Union[DatasetReference, str]): ...

class DatasetReference:
    def __init__(self, project: str, dataset_id: str): ...

class AccessEntry:
    def __init__(self, role: str, entity_type: str, entity_id: str): ...
```

[Dataset Management](./dataset-management.md)

### Table Operations

Table creation, schema management, data loading, and metadata operations. Includes support for partitioning, clustering, and various table types.

```python { .api }
class Table:
    def __init__(self, table_ref: Union[TableReference, str], schema: List[SchemaField] = None): ...

class TableReference:
    def __init__(self, dataset_ref: DatasetReference, table_id: str): ...

class Row:
    def values(self) -> List[Any]: ...
    def keys(self) -> List[str]: ...
```

[Table Operations](./table-operations.md)

### Data Loading

Loading data from various sources including local files, Cloud Storage, streaming inserts, and data export. Supports multiple formats and transformation options.

```python { .api }
class LoadJob:
    def result(self, **kwargs) -> LoadJob: ...

class LoadJobConfig:
    def __init__(self, **kwargs): ...
    source_format: SourceFormat
    schema: List[SchemaField]
    write_disposition: WriteDisposition

class ExtractJob:
    def result(self, **kwargs) -> ExtractJob: ...

class ExtractJobConfig:
    def __init__(self, **kwargs): ...
    destination_format: DestinationFormat
```

[Data Loading](./data-loading.md)

### Schema Definition

Type-safe schema definition with field specifications, modes, and descriptions. Essential for table creation and data validation.

```python { .api }
class SchemaField:
    def __init__(self, name: str, field_type: str, mode: str = "NULLABLE", **kwargs): ...

class FieldElementType:
    def __init__(self, element_type: str): ...

class PolicyTagList:
    def __init__(self, names: List[str]): ...
```

[Schema Definition](./schema-definition.md)

### Query Parameters

Type-safe parameter binding for SQL queries supporting scalar, array, struct, and range parameter types with proper type validation.

```python { .api }
class ScalarQueryParameter:
    def __init__(self, name: str, type_: str, value: Any): ...

class ArrayQueryParameter:
    def __init__(self, name: str, array_type: str, values: List[Any]): ...

class StructQueryParameter:
    def __init__(self, name: str, *sub_params): ...
```

[Query Parameters](./query-parameters.md)

### Database API (DB-API 2.0)

Python Database API specification compliance for SQL database compatibility. Enables use with database tools and ORMs.

```python { .api }
def connect(client: Client = None, **kwargs) -> Connection: ...

class Connection:
    def cursor(self) -> Cursor: ...
    def commit(self) -> None: ...
    def close(self) -> None: ...

class Cursor:
    def execute(self, query: str, parameters: Any = None) -> None: ...
    def fetchall(self) -> List[Any]: ...
```

[Database API](./database-api.md)

### Models and Routines

BigQuery ML model management and user-defined functions (UDFs). Supports model creation, training, evaluation, and stored procedures.

```python { .api }
class Model:
    def __init__(self, model_ref: Union[str, ModelReference]): ...

class ModelReference:
    def __init__(self, project: str, dataset_id: str, model_id: str): ...

class Routine:
    def __init__(self, routine_ref: Union[str, RoutineReference], **kwargs): ...

class RoutineReference:
    def __init__(self, project: str, dataset_id: str, routine_id: str): ...

class RoutineArgument:
    def __init__(self, name: str = None, kind: str = None, mode: str = None, data_type: StandardSqlDataType = None): ...
```

[Models and Routines](./models-routines.md)

## Common Types and Constants

```python { .api }
# Enums for job and table configuration
class SourceFormat:
    CSV: str
    NEWLINE_DELIMITED_JSON: str
    AVRO: str
    PARQUET: str
    ORC: str

class WriteDisposition:
    WRITE_EMPTY: str
    WRITE_TRUNCATE: str
    WRITE_APPEND: str

class CreateDisposition:
    CREATE_IF_NEEDED: str
    CREATE_NEVER: str

class QueryPriority:
    BATCH: str
    INTERACTIVE: str

# Exception classes
class LegacyBigQueryStorageError(Exception): ...
class LegacyPandasError(Exception): ...
class LegacyPyarrowError(Exception): ...

# Retry configuration
DEFAULT_RETRY: Retry
```