
tessl/pypi-pandas-gbq

Google BigQuery connector for pandas DataFrames

Describes: pypipkg:pypi/pandas-gbq@0.29.x

To install, run

npx @tessl/cli install tessl/pypi-pandas-gbq@0.29.0

# pandas-gbq

Google BigQuery connector for pandas DataFrames providing seamless two-way data integration. The library offers high-level functions for executing SQL queries and returning the results as DataFrames, and for uploading DataFrames to BigQuery tables, with comprehensive support for authentication, schema inference, data type mapping, and performance optimizations.

## Package Information

- **Package Name**: pandas-gbq
- **Language**: Python
- **Installation**: `pip install pandas-gbq`
- **Optional Features**:
  - BigQuery Storage API: `pip install pandas-gbq[bqstorage]`
  - Progress bars: `pip install pandas-gbq[tqdm]`
  - Geographic data: `pip install pandas-gbq[geopandas]`
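A quick, hedged way to see which of these optional features are usable in the current environment is to probe for their backing packages. The module names below are assumptions about what each extra installs, not taken from pandas-gbq's own packaging metadata:

```python
from importlib.util import find_spec

# Module each extra is assumed to install (illustrative mapping).
OPTIONAL_DEPS = {
    "bqstorage": "google.cloud.bigquery_storage",
    "tqdm": "tqdm",
    "geopandas": "geopandas",
}

def _importable(module: str) -> bool:
    try:
        return find_spec(module) is not None
    except ModuleNotFoundError:  # parent package absent entirely
        return False

def available_extras() -> dict:
    """Map each extra name to whether its backing package is importable."""
    return {extra: _importable(mod) for extra, mod in OPTIONAL_DEPS.items()}

print(available_extras())
```

The output depends on the environment, so treat it as a diagnostic rather than a guarantee that the corresponding pandas-gbq feature works end to end.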

## Core Imports

```python
import pandas_gbq
```

Standard import pattern:

```python
from pandas_gbq import read_gbq, to_gbq
```

For context management:

```python
from pandas_gbq import context, Context
```

## Basic Usage

```python
import pandas as pd
from pandas_gbq import read_gbq, to_gbq

# Reading data from BigQuery (the first parameter is query_or_table,
# so the query is passed positionally)
df = read_gbq(
    "SELECT name, age FROM `my_project.my_dataset.users` WHERE age > 21",
    project_id="my-gcp-project"
)

# Writing a DataFrame to BigQuery
sample_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
})

to_gbq(
    dataframe=sample_data,
    destination_table="my_dataset.new_users",
    project_id="my-gcp-project",
    if_exists="append"
)

# Using the context for session-wide configuration
import pandas_gbq
pandas_gbq.context.project = "my-gcp-project"
pandas_gbq.context.dialect = "standard"

# Now queries can omit project_id
df = read_gbq("SELECT * FROM `my_dataset.my_table`")
```

## Architecture

pandas-gbq serves as a high-level connector between pandas DataFrames and Google BigQuery, built on top of the Google Cloud BigQuery client library. It provides a seamless interface that handles the complexities of BigQuery integration:

- **Authentication Layer**: Supports multiple authentication methods, including Application Default Credentials, service account keys, and user OAuth flows
- **Data Type Mapping**: Automatically handles conversion between pandas data types and BigQuery types, with support for nullable integers, timestamps, and geographic data
- **Query Execution**: Executes SQL queries through BigQuery's REST API with support for both standard and legacy SQL dialects
- **Data Transfer**: Optimizes data transfer with chunking for large uploads, the BigQuery Storage API for fast reads, and streaming inserts for real-time data
- **Session Management**: Provides a global context system for credential and configuration management across multiple operations

The design prioritizes ease of use while exposing advanced BigQuery features through comprehensive parameter options, making it suitable for both simple data analysis tasks and complex ETL pipelines.
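The data type mapping layer can be sketched with a plain lookup from pandas dtype strings to BigQuery types. This is an illustrative approximation for common dtypes, not the exact table pandas-gbq uses internally:

```python
# Illustrative pandas dtype-string -> BigQuery type mapping (assumption,
# simplified from the behavior described above).
PANDAS_TO_BIGQUERY = {
    "int64": "INTEGER",
    "Int64": "INTEGER",        # nullable integer
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "boolean": "BOOLEAN",      # nullable boolean
    "object": "STRING",
    "datetime64[ns]": "TIMESTAMP",
}

def infer_schema(dtypes: dict) -> list:
    """Map column-name -> dtype-string pairs to BigQuery schema fields,
    defaulting unknown dtypes to STRING."""
    return [
        {"name": col, "type": PANDAS_TO_BIGQUERY.get(dtype, "STRING")}
        for col, dtype in dtypes.items()
    ]

print(infer_schema({"name": "object", "age": "Int64", "score": "float64"}))
# -> [{'name': 'name', 'type': 'STRING'}, {'name': 'age', 'type': 'INTEGER'},
#     {'name': 'score', 'type': 'FLOAT'}]
```

In practice the library handles this inference automatically; the explicit `table_schema` and `dtypes` parameters shown later override it when needed.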

## Capabilities

### Data Reading

Execute SQL queries or read tables directly from Google BigQuery, returning results as pandas DataFrames with full data type support and optional BigQuery Storage API acceleration.

```python { .api }
def read_gbq(
    query_or_table: str,
    project_id: str = None,
    index_col: str = None,
    columns: list[str] = None,
    reauth: bool = False,
    auth_local_webserver: bool = True,
    dialect: str = None,
    location: str = None,
    configuration: dict = None,
    credentials=None,
    use_bqstorage_api: bool = False,
    max_results: int = None,
    verbose: bool = None,
    private_key: str = None,
    progress_bar_type: str = "tqdm",
    dtypes: dict = None,
    auth_redirect_uri: str = None,
    client_id: str = None,
    client_secret: str = None,
    *,
    col_order: list[str] = None,
    bigquery_client=None
) -> pd.DataFrame:
    """
    Read data from Google BigQuery into a pandas DataFrame.

    Parameters:
    - query_or_table: SQL query string or table ID (dataset.table format)
    - project_id: Google Cloud Platform project ID (optional if set in environment/context)
    - index_col: Column name to use as the DataFrame index
    - columns: List of column names in the desired order for results
    - reauth: Force re-authentication (useful for multiple accounts)
    - auth_local_webserver: Use the local webserver flow instead of the console flow
    - dialect: SQL dialect ('standard' or 'legacy', default: 'standard')
    - location: Geographic location for query job execution
    - configuration: Advanced query configuration parameters
    - credentials: Google auth credentials object
    - use_bqstorage_api: Enable the BigQuery Storage API for faster reads
    - max_results: Maximum number of rows to return
    - verbose: Enable verbose logging (deprecated)
    - private_key: Service account private key (deprecated)
    - progress_bar_type: Progress bar type ('tqdm' or None)
    - dtypes: Explicit data type mapping for columns
    - auth_redirect_uri: Custom OAuth redirect URI
    - client_id: OAuth client ID
    - client_secret: OAuth client secret
    - col_order: Column order for results (keyword-only)
    - bigquery_client: Pre-configured BigQuery client instance (keyword-only)

    Returns:
        pandas.DataFrame: Query results as a DataFrame with appropriate data types
    """
```

Usage examples:

```python
# Basic query
df = read_gbq("SELECT * FROM `dataset.table` LIMIT 1000", project_id="my-project")

# Direct table read
df = read_gbq("my_dataset.my_table", project_id="my-project")

# With the BigQuery Storage API for large results
df = read_gbq(
    "SELECT * FROM `big_dataset.huge_table`",
    project_id="my-project",
    use_bqstorage_api=True
)

# With custom column ordering and types
df = read_gbq(
    "SELECT name, age, created_at FROM `users.profile`",
    project_id="my-project",
    columns=["name", "age", "created_at"],
    dtypes={"age": "Int64"}  # Use the nullable integer type
)
```

### Data Writing

Upload pandas DataFrames to Google BigQuery tables with flexible schema handling, multiple upload methods, and comprehensive data type support.

```python { .api }
def to_gbq(
    dataframe: pd.DataFrame,
    destination_table: str,
    project_id: str = None,
    chunksize: int = None,
    reauth: bool = False,
    if_exists: str = "fail",
    auth_local_webserver: bool = True,
    table_schema: list[dict] = None,
    location: str = None,
    progress_bar: bool = True,
    credentials=None,
    api_method: str = "default",
    verbose: bool = None,
    private_key: str = None,
    auth_redirect_uri: str = None,
    client_id: str = None,
    client_secret: str = None,
    user_agent: str = None,
    rfc9110_delimiter: bool = False,
    bigquery_client=None
) -> None:
    """
    Write a DataFrame to a Google BigQuery table.

    Parameters:
    - dataframe: pandas DataFrame to upload
    - destination_table: Target table in 'dataset.table' or 'project.dataset.table' format
    - project_id: Google Cloud Platform project ID (optional if set in environment/context)
    - chunksize: Number of rows per upload chunk (None = upload all at once)
    - reauth: Force re-authentication
    - if_exists: Behavior when the table exists ('fail', 'replace', 'append')
    - auth_local_webserver: Use the local webserver flow instead of the console flow
    - table_schema: Custom BigQuery schema specification as a list of field dicts
    - location: Geographic location for the table and load job
    - progress_bar: Show an upload progress bar
    - credentials: Google auth credentials object
    - api_method: Upload method ('default', 'streaming', 'batch')
    - verbose: Enable verbose logging (deprecated)
    - private_key: Service account private key (deprecated)
    - auth_redirect_uri: Custom OAuth redirect URI
    - client_id: OAuth client ID
    - client_secret: OAuth client secret
    - user_agent: Custom user agent string
    - rfc9110_delimiter: Use RFC 9110 compliant field delimiters
    - bigquery_client: Pre-configured BigQuery client instance

    Returns:
        None
    """
```

Usage examples:

```python
import pandas as pd
from pandas_gbq import to_gbq

df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'score': [95.5, 87.2, 92.1],
    'active': [True, False, True]
})

# Basic upload
to_gbq(df, "my_dataset.scores", project_id="my-project")

# Append to an existing table
to_gbq(df, "my_dataset.scores", project_id="my-project", if_exists="append")

# Replace an existing table
to_gbq(df, "my_dataset.scores", project_id="my-project", if_exists="replace")

# Custom schema specification
schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "score", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "active", "type": "BOOLEAN", "mode": "NULLABLE"}
]

to_gbq(
    df,
    "my_dataset.scores",
    project_id="my-project",
    table_schema=schema,
    if_exists="replace"
)

# Chunked upload for large DataFrames (large_df built elsewhere)
to_gbq(
    large_df,
    "my_dataset.big_table",
    project_id="my-project",
    chunksize=10000,
    if_exists="append"
)
```

### Session Context Management

Manage session-wide configuration for credentials, default project, and SQL dialect preferences to reduce parameter repetition across function calls.

```python { .api }
class Context:
    """Storage for objects to be used throughout a session."""

    @property
    def credentials(self) -> google.auth.credentials.Credentials:
        """
        Credentials to use for Google APIs.

        Returns:
            google.auth.credentials.Credentials object or None
        """

    @credentials.setter
    def credentials(self, value: google.auth.credentials.Credentials) -> None:
        """Set credentials for Google APIs."""

    @property
    def project(self) -> str:
        """
        Default project to use for calls to Google APIs.

        Returns:
            str: Project ID or None
        """

    @project.setter
    def project(self, value: str) -> None:
        """Set the default project ID."""

    @property
    def dialect(self) -> str:
        """
        Default SQL dialect to use in read_gbq.

        Returns:
            str: 'standard', 'legacy', or None
        """

    @dialect.setter
    def dialect(self, value: str) -> None:
        """Set the default SQL dialect ('standard' or 'legacy')."""

# Global context instance
context: Context
```

Usage examples:

```python
import pandas_gbq
from pandas_gbq import read_gbq, to_gbq, Context
from google.oauth2 import service_account

# Set credentials from a service account file
credentials = service_account.Credentials.from_service_account_file(
    '/path/to/service-account-key.json'
)
pandas_gbq.context.credentials = credentials

# Set the default project
pandas_gbq.context.project = "my-gcp-project"

# Set the SQL dialect preference
pandas_gbq.context.dialect = "standard"

# Now function calls can omit these parameters
df = read_gbq("SELECT * FROM my_dataset.my_table")
to_gbq(df, "my_dataset.output_table", if_exists="replace")

# Create custom context instances
custom_context = Context()
custom_context.project = "different-project"
```

## Exception Handling

pandas-gbq defines custom exceptions for different error conditions:

```python { .api }
# Core exceptions
class DatasetCreationError(ValueError):
    """Raised when dataset creation fails."""

class TableCreationError(ValueError):
    """Raised when table creation fails."""

    @property
    def message(self) -> str:
        """Error message."""

class InvalidSchema(ValueError):
    """Raised when a DataFrame doesn't match the BigQuery table schema."""

    @property
    def message(self) -> str:
        """Error message."""

class NotFoundException(ValueError):
    """Raised when a project, table, or dataset is not found."""

class GenericGBQException(ValueError):
    """Raised for unrecognized Google API errors."""

class AccessDenied(ValueError):
    """Raised for authentication/authorization failures."""

class ConversionError(GenericGBQException):
    """Raised for DataFrame conversion problems."""

# Query and data handling exceptions
class QueryTimeout(ValueError):
    """Raised when a query exceeds its timeout."""

class InvalidColumnOrder(ValueError):
    """Raised when the column order doesn't match the schema."""

class InvalidIndexColumn(ValueError):
    """Raised when the index column doesn't match the schema."""

class InvalidPageToken(ValueError):
    """Raised for BigQuery page token failures."""

class InvalidPrivateKeyFormat(ValueError):
    """Raised for an invalid private key format."""

# Warning types
class LargeResultsWarning(UserWarning):
    """Warning for results beyond the recommended DataFrame size."""

class PerformanceWarning(RuntimeWarning):
    """Warning for unsupported performance features."""
```

Common error handling patterns:

```python
from pandas_gbq import read_gbq, to_gbq
from pandas_gbq.exceptions import (
    NotFoundException,
    AccessDenied,
    QueryTimeout,
    InvalidSchema,
    TableCreationError
)

try:
    df = read_gbq("SELECT * FROM nonexistent.table", project_id="my-project")
except NotFoundException as e:
    print(f"Table not found: {e}")
except AccessDenied as e:
    print(f"Access denied: {e}")
except QueryTimeout as e:
    print(f"Query timed out: {e}")

try:
    to_gbq(df, "dataset.table", project_id="my-project")
except InvalidSchema as e:
    print(f"Schema mismatch: {e.message}")
except TableCreationError as e:
    print(f"Failed to create table: {e.message}")
```

## Type Definitions

Key data types and interfaces used throughout the pandas-gbq API:

```python { .api }
# Authentication types (from google-auth)
from google.auth.credentials import Credentials
from google.cloud.bigquery import Client as BigQueryClient

# pandas-gbq specific type hints
from typing import Optional, Union, List, Dict, Any

# Common type aliases used in function signatures
ProjectId = Optional[str]
TableId = str  # Format: "dataset.table" or "project.dataset.table"
QueryString = str
ColumnList = Optional[List[str]]
SchemaList = Optional[List[Dict[str, Any]]]
ConfigDict = Optional[Dict[str, Any]]
DtypeDict = Optional[Dict[str, Any]]

# BigQuery schema field format
SchemaField = Dict[str, Any]  # Example: {"name": "col1", "type": "STRING", "mode": "NULLABLE"}

# API method options
ApiMethod = str  # "default", "streaming", or "batch"
IfExistsOption = str  # "fail", "replace", or "append"
DialectOption = str  # "standard" or "legacy"
ProgressBarType = Optional[str]  # "tqdm" or None
```
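The `SchemaField` shape above can be sanity-checked before calling `to_gbq`. The helper below is illustrative and not part of the pandas-gbq API; it only verifies the documented field keys and BigQuery's mode values:

```python
# Hypothetical validator for a table_schema list matching the SchemaField
# format documented above: {"name", "type", optional "mode"}.
VALID_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}

def validate_schema(schema: list) -> list:
    """Return a list of problems found in a BigQuery schema spec."""
    problems = []
    for i, field in enumerate(schema):
        if "name" not in field or "type" not in field:
            problems.append(f"field {i}: missing 'name' or 'type'")
        mode = field.get("mode", "NULLABLE")  # BigQuery defaults to NULLABLE
        if mode not in VALID_MODES:
            problems.append(f"field {i}: invalid mode {mode!r}")
    return problems

schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "name", "type": "STRING"},
    {"type": "FLOAT", "mode": "OPTIONAL"},   # bad: no name, bad mode
]
print(validate_schema(schema))
# -> ["field 2: missing 'name' or 'type'", "field 2: invalid mode 'OPTIONAL'"]
```

Catching malformed schemas this way fails fast locally instead of surfacing as an `InvalidSchema` or `GenericGBQException` after a round trip to the API.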

## Version Information

```python { .api }
__version__: str  # Current package version (0.29.2)
```

## Authentication Methods

pandas-gbq supports multiple authentication methods:

1. **Application Default Credentials (ADC)**: Automatic detection from the environment
2. **Service Account Key Files**: JSON key files for service accounts
3. **User Account OAuth**: Interactive browser-based authentication
4. **Service Account Keys**: Direct credential objects
5. **Custom BigQuery Client**: Pre-configured client instances

```python
# ADC (recommended for production)
df = read_gbq("SELECT 1", project_id="my-project")  # Uses ADC automatically

# Service account file
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('key.json')
df = read_gbq("SELECT 1", project_id="my-project", credentials=credentials)

# User OAuth (interactive)
df = read_gbq("SELECT 1", project_id="my-project", reauth=True)

# Pre-configured client
from google.cloud import bigquery
client = bigquery.Client(project="my-project")
df = read_gbq("SELECT 1", bigquery_client=client)
```