Google BigQuery connector for pandas DataFrames
npx @tessl/cli install tessl/pypi-pandas-gbq@0.29.00
# pandas-gbq

Google BigQuery connector for pandas DataFrames providing seamless two-way data integration. The library offers high-level functions for executing SQL queries and returning results as DataFrames and for uploading DataFrames to BigQuery tables, with comprehensive support for authentication, schema inference, data type mapping, and performance optimization.

## Package Information

- **Package Name**: pandas-gbq
- **Language**: Python
- **Installation**: `pip install pandas-gbq`
- **Optional Features**:
  - BigQuery Storage API: `pip install pandas-gbq[bqstorage]`
  - Progress bars: `pip install pandas-gbq[tqdm]`
  - Geographic data: `pip install pandas-gbq[geopandas]`
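
The extras above can also be combined in a single install using pip's standard comma-separated extras syntax (quoting keeps shells from globbing the brackets):

```shell
# Install pandas-gbq with the Storage API and progress-bar extras in one step.
pip install 'pandas-gbq[bqstorage,tqdm]'
```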

## Core Imports

```python
import pandas_gbq
```

Standard import pattern:

```python
from pandas_gbq import read_gbq, to_gbq
```

For context management:

```python
from pandas_gbq import context, Context
```

## Basic Usage

```python
import pandas as pd
from pandas_gbq import read_gbq, to_gbq

# Reading data from BigQuery
df = read_gbq(
    "SELECT name, age FROM `my_project.my_dataset.users` WHERE age > 21",
    project_id="my-gcp-project"
)

# Writing DataFrame to BigQuery
sample_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})

to_gbq(
    dataframe=sample_data,
    destination_table="my_dataset.new_users",
    project_id="my-gcp-project",
    if_exists="append"
)

# Using context for session-wide configuration
import pandas_gbq
pandas_gbq.context.project = "my-gcp-project"
pandas_gbq.context.dialect = "standard"

# Now queries can omit project_id
df = read_gbq("SELECT * FROM `my_dataset.my_table`")
```

## Architecture

pandas-gbq serves as a high-level connector between pandas DataFrames and Google BigQuery, built on top of the Google Cloud BigQuery client library. It handles the complexities of BigQuery integration across several layers:

- **Authentication Layer**: Supports multiple authentication methods including Application Default Credentials, service account keys, and user OAuth flows
- **Data Type Mapping**: Automatically handles conversion between pandas data types and BigQuery types, with support for nullable integers, timestamps, and geographic data
- **Query Execution**: Executes SQL queries through BigQuery's REST API with support for both standard and legacy SQL dialects
- **Data Transfer**: Optimizes data transfer with chunking for large uploads, the BigQuery Storage API for fast reads, and streaming inserts for real-time data
- **Session Management**: Provides a global context system for credential and configuration management across multiple operations

The design prioritizes ease of use while exposing advanced BigQuery features through comprehensive parameter options, making it suitable for both simple data analysis tasks and complex ETL pipelines.
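
The Data Type Mapping layer pairs each pandas dtype with a BigQuery column type. The lookup below is an illustrative sketch of that idea, not pandas-gbq's actual table (the real mapping covers many more cases and inspects values for `object` columns):

```python
# Illustrative (hypothetical) dtype-to-BigQuery type lookup, sketching what
# the Data Type Mapping layer does. Not pandas-gbq's internal implementation.
PANDAS_TO_BIGQUERY = {
    "int64": "INTEGER",
    "Int64": "INTEGER",          # pandas nullable integer
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "STRING",          # commonly strings; real inference checks values
}

def bigquery_type(dtype_name: str) -> str:
    """Map a pandas dtype name to a BigQuery type, defaulting to STRING."""
    return PANDAS_TO_BIGQUERY.get(dtype_name, "STRING")
```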

## Capabilities

### Data Reading

Execute SQL queries or read tables directly from Google BigQuery, returning results as pandas DataFrames with full data type support and optional BigQuery Storage API acceleration.

```python { .api }
def read_gbq(
    query_or_table: str,
    project_id: str = None,
    index_col: str = None,
    columns: list[str] = None,
    reauth: bool = False,
    auth_local_webserver: bool = True,
    dialect: str = None,
    location: str = None,
    configuration: dict = None,
    credentials = None,
    use_bqstorage_api: bool = False,
    max_results: int = None,
    verbose: bool = None,
    private_key: str = None,
    progress_bar_type: str = "tqdm",
    dtypes: dict = None,
    auth_redirect_uri: str = None,
    client_id: str = None,
    client_secret: str = None,
    *,
    col_order: list[str] = None,
    bigquery_client = None
) -> pd.DataFrame:
    """
    Read data from Google BigQuery to a pandas DataFrame.

    Parameters:
    - query_or_table: SQL query string or table ID (dataset.table format)
    - project_id: Google Cloud Platform project ID (optional if set in environment/context)
    - index_col: Column name to use as DataFrame index
    - columns: List of column names in desired order for results
    - reauth: Force re-authentication (useful for multiple accounts)
    - auth_local_webserver: Use local webserver flow instead of console flow
    - dialect: SQL dialect ('standard' or 'legacy', default: 'standard')
    - location: Geographic location for query job execution
    - configuration: Advanced query configuration parameters
    - credentials: Google auth credentials object
    - use_bqstorage_api: Enable BigQuery Storage API for faster reads
    - max_results: Maximum number of rows to return
    - verbose: Enable verbose logging (deprecated)
    - private_key: Service account private key (deprecated)
    - progress_bar_type: Progress bar type ('tqdm' or None)
    - dtypes: Explicit data type mapping for columns
    - auth_redirect_uri: Custom OAuth redirect URI
    - client_id: OAuth client ID
    - client_secret: OAuth client secret
    - col_order: Column order for results (keyword-only)
    - bigquery_client: Pre-configured BigQuery client instance (keyword-only)

    Returns:
    pandas.DataFrame: Query results as DataFrame with appropriate data types
    """
```

Usage examples:

```python
# Basic query
df = read_gbq("SELECT * FROM `dataset.table` LIMIT 1000", project_id="my-project")

# Direct table read
df = read_gbq("my_dataset.my_table", project_id="my-project")

# With BigQuery Storage API for large results
df = read_gbq(
    "SELECT * FROM `big_dataset.huge_table`",
    project_id="my-project",
    use_bqstorage_api=True
)

# With custom column ordering and types
df = read_gbq(
    "SELECT name, age, created_at FROM `users.profile`",
    project_id="my-project",
    columns=["name", "age", "created_at"],
    dtypes={"age": "Int64"}  # Use nullable integer type
)
```

### Data Writing

Upload pandas DataFrames to Google BigQuery tables with flexible schema handling, multiple upload methods, and comprehensive data type support.

```python { .api }
def to_gbq(
    dataframe: pd.DataFrame,
    destination_table: str,
    project_id: str = None,
    chunksize: int = None,
    reauth: bool = False,
    if_exists: str = "fail",
    auth_local_webserver: bool = True,
    table_schema: list[dict] = None,
    location: str = None,
    progress_bar: bool = True,
    credentials = None,
    api_method: str = "default",
    verbose: bool = None,
    private_key: str = None,
    auth_redirect_uri: str = None,
    client_id: str = None,
    client_secret: str = None,
    user_agent: str = None,
    rfc9110_delimiter: bool = False,
    bigquery_client = None
) -> None:
    """
    Write a DataFrame to a Google BigQuery table.

    Parameters:
    - dataframe: pandas DataFrame to upload
    - destination_table: Target table in format 'dataset.table' or 'project.dataset.table'
    - project_id: Google Cloud Platform project ID (optional if set in environment/context)
    - chunksize: Number of rows per upload chunk (None = upload all at once)
    - reauth: Force re-authentication
    - if_exists: Behavior when table exists ('fail', 'replace', 'append')
    - auth_local_webserver: Use local webserver flow instead of console flow
    - table_schema: Custom BigQuery schema specification as list of field dicts
    - location: Geographic location for table and load job
    - progress_bar: Show upload progress bar
    - credentials: Google auth credentials object
    - api_method: Upload method ('default', 'load_parquet', or 'load_csv')
    - verbose: Enable verbose logging (deprecated)
    - private_key: Service account private key (deprecated)
    - auth_redirect_uri: Custom OAuth redirect URI
    - client_id: OAuth client ID
    - client_secret: OAuth client secret
    - user_agent: Custom user agent string
    - rfc9110_delimiter: Use RFC 9110 compliant field delimiters
    - bigquery_client: Pre-configured BigQuery client instance

    Returns:
    None
    """
```

Usage examples:

```python
import pandas as pd
from pandas_gbq import to_gbq

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95.5, 87.2, 92.1],
    'active': [True, False, True]
})

# Basic upload
to_gbq(df, "my_dataset.scores", project_id="my-project")

# Append to existing table
to_gbq(df, "my_dataset.scores", project_id="my-project", if_exists="append")

# Replace existing table
to_gbq(df, "my_dataset.scores", project_id="my-project", if_exists="replace")

# Custom schema specification
schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "score", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "active", "type": "BOOLEAN", "mode": "NULLABLE"}
]

to_gbq(
    df,
    "my_dataset.scores",
    project_id="my-project",
    table_schema=schema,
    if_exists="replace"
)

# Chunked upload for large DataFrames
to_gbq(
    large_df,
    "my_dataset.big_table",
    project_id="my-project",
    chunksize=10000,
    if_exists="append"
)
```
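
What `chunksize` does can be sketched with plain batching arithmetic. This is illustrative only; the library performs the equivalent split internally before issuing its upload requests:

```python
def row_batches(n_rows: int, chunksize: int) -> list:
    """Return (start, stop) row ranges covering n_rows in chunks of chunksize.
    The final batch may be smaller than chunksize."""
    return [(start, min(start + chunksize, n_rows))
            for start in range(0, n_rows, chunksize)]
```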

### Session Context Management

Manage session-wide configuration for credentials, default project, and SQL dialect preferences to reduce parameter repetition across function calls.

```python { .api }
class Context:
    """Storage for objects to be used throughout a session."""

    @property
    def credentials(self) -> google.auth.credentials.Credentials:
        """
        Credentials to use for Google APIs.

        Returns:
        google.auth.credentials.Credentials object or None
        """

    @credentials.setter
    def credentials(self, value: google.auth.credentials.Credentials) -> None:
        """Set credentials for Google APIs."""

    @property
    def project(self) -> str:
        """
        Default project to use for calls to Google APIs.

        Returns:
        str: Project ID or None
        """

    @project.setter
    def project(self, value: str) -> None:
        """Set default project ID."""

    @property
    def dialect(self) -> str:
        """
        Default SQL dialect to use in read_gbq.

        Returns:
        str: 'standard', 'legacy', or None
        """

    @dialect.setter
    def dialect(self, value: str) -> None:
        """Set default SQL dialect ('standard' or 'legacy')."""

# Global context instance
context: Context
```

Usage examples:

```python
import pandas_gbq
from pandas_gbq import Context, read_gbq, to_gbq
from google.oauth2 import service_account

# Set credentials from service account file
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json'
)
pandas_gbq.context.credentials = credentials

# Set default project
pandas_gbq.context.project = "my-gcp-project"

# Set SQL dialect preference
pandas_gbq.context.dialect = "standard"

# Now function calls can omit these parameters
df = read_gbq("SELECT * FROM my_dataset.my_table")
to_gbq(df, "my_dataset.output_table", if_exists="replace")

# Create custom context instances
custom_context = Context()
custom_context.project = "different-project"
```

## Exception Handling

pandas-gbq defines custom exceptions for different error conditions:

```python { .api }
# Core exceptions
class DatasetCreationError(ValueError):
    """Raised when dataset creation fails."""

class TableCreationError(ValueError):
    """Raised when table creation fails."""

    @property
    def message(self) -> str:
        """Error message."""

class InvalidSchema(ValueError):
    """Raised when DataFrame doesn't match BigQuery table schema."""

    @property
    def message(self) -> str:
        """Error message."""

class NotFoundException(ValueError):
    """Raised when project, table, or dataset not found."""

class GenericGBQException(ValueError):
    """Raised for unrecognized Google API errors."""

class AccessDenied(ValueError):
    """Raised for authentication/authorization failures."""

class ConversionError(GenericGBQException):
    """Raised for DataFrame conversion problems."""

# Query and data handling exceptions
class QueryTimeout(ValueError):
    """Raised when query exceeds timeout."""

class InvalidColumnOrder(ValueError):
    """Raised when column order doesn't match schema."""

class InvalidIndexColumn(ValueError):
    """Raised when index column doesn't match schema."""

class InvalidPageToken(ValueError):
    """Raised for BigQuery page token failures."""

class InvalidPrivateKeyFormat(ValueError):
    """Raised for invalid private key format."""

# Warning types
class LargeResultsWarning(UserWarning):
    """Warning for results beyond recommended DataFrame size."""

class PerformanceWarning(RuntimeWarning):
    """Warning for unsupported performance features."""
```

Common error handling patterns:

```python
from pandas_gbq import read_gbq, to_gbq
from pandas_gbq.exceptions import (
    NotFoundException,
    AccessDenied,
    QueryTimeout,
    InvalidSchema,
    TableCreationError
)

try:
    df = read_gbq("SELECT * FROM nonexistent.table", project_id="my-project")
except NotFoundException as e:
    print(f"Table not found: {e}")
except AccessDenied as e:
    print(f"Access denied: {e}")
except QueryTimeout as e:
    print(f"Query timed out: {e}")

try:
    to_gbq(df, "dataset.table", project_id="my-project")
except InvalidSchema as e:
    print(f"Schema mismatch: {e.message}")
except TableCreationError as e:
    print(f"Failed to create table: {e.message}")
```
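
Since `GenericGBQException` covers unrecognized (often transient) API errors, callers sometimes wrap operations in a retry loop. A minimal, illustrative sketch; the exception class here is a local stand-in so the example is self-contained, and in real code you would catch `pandas_gbq.exceptions.GenericGBQException` instead:

```python
import time

class GenericGBQException(ValueError):
    """Stand-in for pandas_gbq.exceptions.GenericGBQException."""

def with_retries(operation, attempts=3, base_delay=1.0):
    """Run operation(), retrying on GenericGBQException with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except GenericGBQException:
            if attempt == attempts - 1:
                raise  # out of attempts; propagate the last error
            time.sleep(base_delay * 2 ** attempt)
```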

## Type Definitions

Key data types and interfaces used throughout the pandas-gbq API:

```python { .api }
# Authentication types (from google-auth)
from google.auth.credentials import Credentials
from google.cloud.bigquery import Client as BigQueryClient

# pandas-gbq specific type hints
from typing import Optional, Union, List, Dict, Any

# Common type aliases used in function signatures
ProjectId = Optional[str]
TableId = str  # Format: "dataset.table" or "project.dataset.table"
QueryString = str
ColumnList = Optional[List[str]]
SchemaList = Optional[List[Dict[str, Any]]]
ConfigDict = Optional[Dict[str, Any]]
DtypeDict = Optional[Dict[str, Any]]

# BigQuery schema field format
SchemaField = Dict[str, Any]  # Example: {"name": "col1", "type": "STRING", "mode": "NULLABLE"}

# API method options
ApiMethod = str  # "default", "load_parquet", or "load_csv"
IfExistsOption = str  # "fail", "replace", or "append"
DialectOption = str  # "standard" or "legacy"
ProgressBarType = Optional[str]  # "tqdm" or None
```
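
A `SchemaList` in the shape above can be produced programmatically. The helper below is a small illustrative sketch (hypothetical, not part of pandas-gbq) that builds a `table_schema` list from a column-to-type mapping:

```python
from typing import Any, Dict, List

def build_table_schema(columns: Dict[str, str],
                       required: frozenset = frozenset()) -> List[Dict[str, Any]]:
    """Build a BigQuery table_schema list from {column_name: BIGQUERY_TYPE}.
    Columns named in `required` get mode REQUIRED; the rest are NULLABLE."""
    return [
        {"name": name,
         "type": bq_type,
         "mode": "REQUIRED" if name in required else "NULLABLE"}
        for name, bq_type in columns.items()
    ]
```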

## Version Information

```python { .api }
__version__: str  # Current package version (0.29.2)
```

## Authentication Methods

pandas-gbq supports multiple authentication methods:

1. **Application Default Credentials (ADC)**: Automatic detection from environment
2. **Service Account Key Files**: JSON key files for service accounts
3. **User Account OAuth**: Interactive browser-based authentication
4. **Credentials Objects**: Pre-constructed google-auth credential objects passed via the `credentials` parameter
5. **Custom BigQuery Client**: Pre-configured client instances passed via `bigquery_client`

```python
from pandas_gbq import read_gbq

# ADC (recommended for production)
df = read_gbq("SELECT 1", project_id="my-project")  # Uses ADC automatically

# Service account file
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('key.json')
df = read_gbq("SELECT 1", project_id="my-project", credentials=credentials)

# User OAuth (interactive)
df = read_gbq("SELECT 1", project_id="my-project", reauth=True)

# Pre-configured client
from google.cloud import bigquery
client = bigquery.Client(project="my-project")
df = read_gbq("SELECT 1", bigquery_client=client)
```