0
# Database Metadata and Introspection
1
2
Database schema introspection capabilities for retrieving metadata about tables, columns, procedures, and other database objects. The redshift_connector provides comprehensive metadata operations that enable applications to dynamically discover and work with database schemas.
3
4
## Capabilities
5
6
### Table Metadata
7
8
Retrieve information about tables, views, and other relations in the database with optional filtering by catalog, schema, and table name patterns.
9
10
```python { .api }
11
class Cursor:
12
def get_tables(
13
self,
14
catalog: str = None,
15
schema: str = None,
16
table: str = None,
17
types: list[str] = None
18
) -> tuple:
19
"""
20
Retrieve table metadata from the database.
21
22
Parameters:
23
- catalog: Catalog name pattern (None for all catalogs)
24
- schema: Schema name pattern (None for all schemas)
25
- table: Table name pattern (None for all tables)
26
- types: List of table types to include (e.g., ['TABLE', 'VIEW'])
27
28
Returns:
29
Tuple of tuples, each containing:
30
(table_cat, table_schem, table_name, table_type, remarks,
31
type_cat, type_schem, type_name, self_referencing_col_name,
32
ref_generation)
33
34
Common table types: 'TABLE', 'VIEW', 'SYSTEM TABLE', 'GLOBAL TEMPORARY',
35
'LOCAL TEMPORARY', 'ALIAS', 'SYNONYM'
36
"""
37
```
38
39
### Column Metadata
40
41
Detailed information about table columns including data types, nullability, defaults, and constraints.
42
43
```python { .api }
44
class Cursor:
45
def get_columns(
46
self,
47
catalog: str = None,
48
schema: str = None,
49
table: str = None,
50
column: str = None
51
) -> tuple:
52
"""
53
Retrieve column metadata from database tables.
54
55
Parameters:
56
- catalog: Catalog name pattern (None for all catalogs)
57
- schema: Schema name pattern (None for all schemas)
58
- table: Table name pattern (None for all tables)
59
- column: Column name pattern (None for all columns)
60
61
Returns:
62
Tuple of tuples, each containing:
63
(table_cat, table_schem, table_name, column_name, data_type,
64
type_name, column_size, buffer_length, decimal_digits, num_prec_radix,
65
nullable, remarks, column_def, sql_data_type, sql_datetime_sub,
66
char_octet_length, ordinal_position, is_nullable, scope_catalog,
67
scope_schema, scope_table, source_data_type, is_autoincrement, is_generatedcolumn)
68
"""
69
```
70
71
### Primary Key Information
72
73
Retrieve primary key constraints and their constituent columns for tables.
74
75
```python { .api }
76
class Cursor:
77
def get_primary_keys(
78
self,
79
catalog: str = None,
80
schema: str = None,
81
table: str = None
82
) -> tuple:
83
"""
84
Retrieve primary key information for tables.
85
86
Parameters:
87
- catalog: Catalog name (None for current catalog)
88
- schema: Schema name (None for current schema)
89
- table: Table name (None for all tables)
90
91
Returns:
92
Tuple of tuples, each containing:
93
(table_cat, table_schem, table_name, column_name, key_seq, pk_name)
94
95
Results are ordered by table_cat, table_schem, table_name, key_seq
96
"""
97
```
98
99
### Stored Procedure Metadata
100
101
Information about stored procedures and functions available in the database.
102
103
```python { .api }
104
class Cursor:
105
def get_procedures(
106
self,
107
catalog: str = None,
108
schema: str = None,
109
procedure: str = None
110
) -> tuple:
111
"""
112
Retrieve stored procedure metadata.
113
114
Parameters:
115
- catalog: Catalog name pattern (None for all catalogs)
116
- schema: Schema name pattern (None for all schemas)
117
- procedure: Procedure name pattern (None for all procedures)
118
119
Returns:
120
Tuple of tuples, each containing:
121
(procedure_cat, procedure_schem, procedure_name, num_input_params,
122
num_output_params, num_result_sets, remarks, procedure_type,
123
specific_name)
124
"""
125
```
126
127
### Schema Information
128
129
Retrieve information about available schemas (namespaces) in the database.
130
131
```python { .api }
132
class Cursor:
133
def get_schemas(
134
self,
135
catalog: str = None,
136
schema: str = None
137
) -> tuple:
138
"""
139
Retrieve schema metadata.
140
141
Parameters:
142
- catalog: Catalog name (None for current catalog)
143
- schema: Schema name pattern (None for all schemas)
144
145
Returns:
146
Tuple of tuples, each containing:
147
(table_schem, table_catalog)
148
"""
149
```
150
151
### Catalog Information
152
153
Retrieve information about available catalogs (databases) accessible to the current connection.
154
155
```python { .api }
156
class Cursor:
157
def get_catalogs(self) -> tuple:
158
"""
159
Retrieve catalog (database) metadata.
160
161
Returns:
162
Tuple of tuples, each containing:
163
(table_cat,)
164
165
Lists all catalogs (databases) available to the current user.
166
"""
167
```
168
169
### Metadata Usage Examples
170
171
Practical examples demonstrating metadata introspection for common use cases.
172
173
```python
174
import redshift_connector
175
176
conn = redshift_connector.connect(
177
host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
178
database='dev',
179
user='awsuser',
180
password='password'
181
)
182
183
cursor = conn.cursor()
184
185
# List all tables in the current database
186
tables = cursor.get_tables()
187
print("Available tables:")
188
for table in tables:
189
catalog, schema, table_name, table_type = table[:4]
190
print(f" {schema}.{table_name} ({table_type})")
191
192
# List tables in a specific schema
193
sales_tables = cursor.get_tables(schema='sales')
194
print("\nTables in 'sales' schema:")
195
for table in sales_tables:
196
print(f" {table[2]} ({table[3]})") # table_name, table_type
197
198
# Get detailed column information for a specific table
199
columns = cursor.get_columns(schema='sales', table='orders')
200
print("\nColumns in 'sales.orders' table:")
201
for col in columns:
202
schema_name, table_name, col_name, data_type, type_name, col_size, nullable = col[1], col[2], col[3], col[4], col[5], col[6], col[10]
203
null_str = "NULL" if nullable else "NOT NULL"
204
print(f" {col_name}: {type_name}({col_size}) {null_str}")
205
206
# Find primary keys for a table
207
pk_info = cursor.get_primary_keys(schema='sales', table='orders')
208
if pk_info:
209
print(f"\nPrimary key for 'sales.orders':")
210
pk_columns = [col[3] for col in sorted(pk_info, key=lambda x: x[4])] # Sort by key_seq
211
print(f" Columns: {', '.join(pk_columns)}")
212
print(f" Constraint name: {pk_info[0][5]}")
213
214
# List all schemas
215
schemas = cursor.get_schemas()
216
print(f"\nAvailable schemas:")
217
for schema in schemas:
218
print(f" {schema[0]}")
219
220
# List all catalogs (databases)
221
catalogs = cursor.get_catalogs()
222
print(f"\nAvailable catalogs:")
223
for catalog in catalogs:
224
print(f" {catalog[0]}")
225
226
# Find tables matching a pattern
227
user_tables = cursor.get_tables(table='user%') # Tables starting with 'user'
228
print(f"\nTables matching 'user%' pattern:")
229
for table in user_tables:
230
print(f" {table[1]}.{table[2]}")
231
232
# Get stored procedures
233
procedures = cursor.get_procedures()
234
if procedures:
235
print(f"\nAvailable procedures:")
236
for proc in procedures:
237
schema, proc_name, proc_type = proc[1], proc[2], proc[7]
238
print(f" {schema}.{proc_name} (type: {proc_type})")
239
else:
240
print("\nNo stored procedures found")
241
242
cursor.close()
243
conn.close()
244
```
245
246
### Dynamic Schema Discovery
247
248
Advanced patterns for dynamically discovering and working with database schemas.
249
250
```python
251
def discover_table_structure(cursor, schema_name, table_name):
252
"""
253
Discover complete table structure including columns, primary keys, and metadata.
254
"""
255
# Get table information
256
tables = cursor.get_tables(schema=schema_name, table=table_name)
257
if not tables:
258
raise ValueError(f"Table {schema_name}.{table_name} not found")
259
260
table_info = tables[0]
261
table_type = table_info[3]
262
263
# Get column information
264
columns = cursor.get_columns(schema=schema_name, table=table_name)
265
column_info = []
266
267
for col in columns:
268
column_info.append({
269
'name': col[3],
270
'data_type': col[4],
271
'type_name': col[5],
272
'column_size': col[6],
273
'decimal_digits': col[8],
274
'nullable': col[10] == 1,
275
'default': col[12],
276
'ordinal_position': col[16],
277
'is_autoincrement': col[22] == 'YES',
278
})
279
280
# Get primary key information
281
pk_cols = cursor.get_primary_keys(schema=schema_name, table=table_name)
282
primary_key = []
283
if pk_cols:
284
primary_key = [col[3] for col in sorted(pk_cols, key=lambda x: x[4])]
285
286
return {
287
'schema': schema_name,
288
'table': table_name,
289
'table_type': table_type,
290
'columns': column_info,
291
'primary_key': primary_key
292
}
293
294
def generate_create_table_ddl(cursor, schema_name, table_name):
295
"""
296
Generate CREATE TABLE DDL statement based on table metadata.
297
"""
298
structure = discover_table_structure(cursor, schema_name, table_name)
299
300
ddl = f"CREATE TABLE {schema_name}.{table_name} (\n"
301
302
column_defs = []
303
for col in structure['columns']:
304
col_def = f" {col['name']} {col['type_name']}"
305
306
if col['column_size']:
307
col_def += f"({col['column_size']}"
308
if col['decimal_digits']:
309
col_def += f", {col['decimal_digits']}"
310
col_def += ")"
311
312
if not col['nullable']:
313
col_def += " NOT NULL"
314
315
if col['default']:
316
col_def += f" DEFAULT {col['default']}"
317
318
column_defs.append(col_def)
319
320
ddl += ",\n".join(column_defs)
321
322
if structure['primary_key']:
323
pk_def = f" PRIMARY KEY ({', '.join(structure['primary_key'])})"
324
ddl += f",\n{pk_def}"
325
326
ddl += "\n);"
327
328
return ddl
329
330
# Usage example
331
cursor = conn.cursor()
332
try:
333
# Discover table structure
334
structure = discover_table_structure(cursor, 'public', 'users')
335
print(f"Table: {structure['schema']}.{structure['table']}")
336
print(f"Type: {structure['table_type']}")
337
print(f"Columns: {len(structure['columns'])}")
338
print(f"Primary Key: {structure['primary_key']}")
339
340
# Generate DDL
341
ddl = generate_create_table_ddl(cursor, 'public', 'users')
342
print(f"\nGenerated DDL:\n{ddl}")
343
344
except ValueError as e:
345
print(f"Error: {e}")
346
```
347
348
### Cross-Database Metadata Operations
349
350
Working with metadata across multiple databases and schemas in Redshift.
351
352
```python { .api }
353
# Database metadata scope configuration
354
conn = redshift_connector.connect(
355
# ... connection parameters
356
database_metadata_current_db_only=True # Limit metadata to current database (default)
357
# database_metadata_current_db_only=False # Include cross-database metadata (datashare support)
358
)
359
360
def list_all_databases_and_schemas(cursor):
361
"""
362
List all accessible databases and their schemas.
363
"""
364
catalogs = cursor.get_catalogs()
365
366
for catalog in catalogs:
367
catalog_name = catalog[0]
368
print(f"Database: {catalog_name}")
369
370
# Get schemas in this catalog
371
schemas = cursor.get_schemas(catalog=catalog_name)
372
for schema in schemas:
373
schema_name = schema[0]
374
print(f" Schema: {schema_name}")
375
376
# Get table count in schema
377
tables = cursor.get_tables(catalog=catalog_name, schema=schema_name)
378
print(f" Tables: {len(tables)}")
379
380
def find_tables_by_column(cursor, column_name):
381
"""
382
Find all tables that contain a specific column name.
383
"""
384
columns = cursor.get_columns(column=column_name)
385
386
tables_with_column = set()
387
for col in columns:
388
table_key = (col[0], col[1], col[2]) # catalog, schema, table
389
tables_with_column.add(table_key)
390
391
return list(tables_with_column)
392
393
# Example usage
394
cursor = conn.cursor()
395
396
# Find all tables with a 'user_id' column
397
tables_with_user_id = find_tables_by_column(cursor, 'user_id')
398
print("Tables containing 'user_id' column:")
399
for catalog, schema, table in tables_with_user_id:
400
print(f" {catalog}.{schema}.{table}")
401
```
402
403
### Metadata Caching and Performance
404
405
Strategies for efficient metadata operations and caching for better performance.
406
407
```python
408
class MetadataCache:
409
"""Simple metadata cache for improved performance."""
410
411
def __init__(self, cursor):
412
self.cursor = cursor
413
self._table_cache = {}
414
self._column_cache = {}
415
416
def get_table_info(self, schema, table):
417
"""Get cached table information."""
418
key = (schema, table)
419
if key not in self._table_cache:
420
tables = self.cursor.get_tables(schema=schema, table=table)
421
self._table_cache[key] = tables[0] if tables else None
422
return self._table_cache[key]
423
424
def get_column_info(self, schema, table):
425
"""Get cached column information."""
426
key = (schema, table)
427
if key not in self._column_cache:
428
columns = self.cursor.get_columns(schema=schema, table=table)
429
self._column_cache[key] = columns
430
return self._column_cache[key]
431
432
def clear_cache(self):
433
"""Clear the metadata cache."""
434
self._table_cache.clear()
435
self._column_cache.clear()
436
437
# Usage example
438
cursor = conn.cursor()
439
metadata_cache = MetadataCache(cursor)
440
441
# First access - queries database
442
table_info = metadata_cache.get_table_info('public', 'users')
443
columns = metadata_cache.get_column_info('public', 'users')
444
445
# Subsequent accesses - uses cache
446
table_info2 = metadata_cache.get_table_info('public', 'users') # From cache
447
columns2 = metadata_cache.get_column_info('public', 'users') # From cache
448
```