0
# Metadata Retrieval
1
2
Retrieve schema information and metadata from SQL queries without loading the full dataset. This functionality is useful for understanding query structure, validating queries, and inspecting data types before performing full data loads.
3
4
## Capabilities
5
6
### Query Metadata Extraction
7
8
Get column names, data types, and other metadata information from a SQL query result set.
9
10
```python { .api }
11
def get_meta(
12
conn: str | ConnectionUrl,
13
query: str,
14
protocol: Protocol | None = None,
15
) -> pd.DataFrame:
16
"""
17
Get metadata (header) of the given query result set.
18
19
Parameters:
20
- conn: Database connection string or ConnectionUrl
21
- query: SQL query to analyze
22
- protocol: Backend-specific transfer protocol (optional)
23
24
Returns:
25
pandas DataFrame containing schema metadata
26
"""
27
```
28
29
**Usage Examples:**
30
31
```python
32
import connectorx as cx
33
34
postgres_url = "postgresql://username:password@server:port/database"
35
36
# Get metadata for a simple query
37
query = "SELECT customer_id, order_date, total_amount FROM orders"
38
meta_df = cx.get_meta(postgres_url, query)
39
print(meta_df)
40
41
# Get metadata for complex queries with joins
42
complex_query = """
43
SELECT
44
c.customer_name,
45
o.order_date,
46
oi.product_id,
47
oi.quantity * oi.unit_price as total_value
48
FROM customers c
49
JOIN orders o ON c.customer_id = o.customer_id
50
JOIN order_items oi ON o.order_id = oi.order_id
51
WHERE o.order_date >= '2023-01-01'
52
"""
53
meta_df = cx.get_meta(postgres_url, complex_query)
54
55
# Specify protocol for optimization
56
meta_df = cx.get_meta(postgres_url, query, protocol="cursor")
57
```
58
59
## Metadata Information
60
61
The returned DataFrame contains schema information about the query result:
62
63
### Column Information
64
65
- **Column names**: Names of all columns in the result set
66
- **Data types**: Database-specific and Python-equivalent data types
67
- **Nullability**: Whether columns can contain NULL values
68
- **Precision/Scale**: For decimal and numeric types
69
70
### Query Validation
71
72
Metadata retrieval serves as query validation:
73
74
- **Syntax errors**: Invalid SQL will raise errors during metadata extraction
75
- **Missing tables/columns**: References to non-existent objects are caught
76
- **Type conflicts**: Join and comparison type mismatches are identified
77
- **Permission issues**: Access control violations are detected
78
79
## Implementation Details
80
81
### Query Execution
82
83
ConnectorX typically executes a `LIMIT 1` version of your query to extract metadata:
84
85
```sql
86
-- Your query
87
SELECT customer_id, order_date, total_amount FROM orders
88
89
-- Executed for metadata (conceptually)
90
SELECT customer_id, order_date, total_amount FROM orders LIMIT 1
91
```
92
93
### Protocol Support
94
95
Different protocols may affect metadata extraction:
96
97
- **binary**: Fastest metadata extraction for most databases
98
- **cursor**: May be required for some server configurations
99
- **text**: Alternative for databases with binary protocol issues
100
101
## Use Cases
102
103
### Pre-flight Query Validation
104
105
Validate queries before executing expensive operations:
106
107
```python
108
try:
109
meta = cx.get_meta(conn, expensive_query)
110
print(f"Query will return {len(meta.columns)} columns")
111
112
# Proceed with full query
113
df = cx.read_sql(conn, expensive_query, partition_num=10)
114
except Exception as e:
115
print(f"Query validation failed: {e}")
116
```
117
118
### Schema Discovery
119
120
Explore database schemas and query results:
121
122
```python
123
# Discover table structure
124
table_meta = cx.get_meta(conn, "SELECT * FROM unknown_table")
125
print("Available columns:", table_meta.columns.tolist())
126
print("Data types:", table_meta.dtypes.to_dict())
127
128
# Analyze complex query results
129
join_meta = cx.get_meta(conn, complex_join_query)
130
for col in join_meta.columns:
131
print(f"{col}: {join_meta[col].dtype}")
132
```
133
134
### ETL Pipeline Planning
135
136
Plan data transformations based on source metadata:
137
138
```python
139
source_meta = cx.get_meta(source_conn, extraction_query)
140
141
# Plan transformations based on source types
142
for col_name, col_type in source_meta.dtypes.items():
143
if col_type == 'object': # String columns
144
print(f"Plan text processing for {col_name}")
145
elif 'datetime' in str(col_type): # Date columns
146
print(f"Plan date transformation for {col_name}")
147
```
148
149
### Dynamic Query Building
150
151
Build queries dynamically based on available columns:
152
153
```python
154
# Get available columns
155
meta = cx.get_meta(conn, "SELECT * FROM dynamic_table")
156
available_cols = meta.columns.tolist()
157
158
# Build query based on available columns
159
required_cols = ['id', 'name', 'created_date']
160
existing_cols = [col for col in required_cols if col in available_cols]
161
162
if existing_cols:
163
dynamic_query = f"SELECT {', '.join(existing_cols)} FROM dynamic_table"
164
df = cx.read_sql(conn, dynamic_query)
165
```
166
167
## Error Handling
168
169
Common errors when retrieving metadata:
170
171
- **Connection errors**: Database unavailable or authentication failure
172
- **Permission errors**: Insufficient privileges to execute query
173
- **Syntax errors**: Invalid SQL syntax
174
- **Missing objects**: Referenced tables/columns don't exist
175
- **Protocol errors**: Unsupported protocol for specific database
176
177
```python
178
try:
179
meta = cx.get_meta(conn, query)
180
except Exception as e:
181
if "permission denied" in str(e).lower():
182
print("Check database permissions")
183
elif "does not exist" in str(e).lower():
184
print("Referenced table or column not found")
185
else:
186
print(f"Metadata extraction failed: {e}")
187
```