Tessl Tile for pypi/connectorx@0.4.0

or run

npx @tessl/cli init

Version

Tile

Overview

Evals

Files

docs

connection-management.md data-loading.md federated-queries.md index.md metadata-retrieval.md query-partitioning.md

metadata-retrieval.mddocs/

0
# Metadata Retrieval
1

2
Retrieve schema information and metadata from SQL queries without loading the full dataset. This functionality is useful for understanding query structure, validating queries, and inspecting data types before performing full data loads.
3

4
## Capabilities
5

6
### Query Metadata Extraction
7

8
Get column names, data types, and other metadata information from a SQL query result set.
9

10
```python { .api }
11
def get_meta(
12
    conn: str | ConnectionUrl,
13
    query: str,
14
    protocol: Protocol | None = None,
15
) -> pd.DataFrame:
16
    """
17
    Get metadata (header) of the given query result set.
18

19
    Parameters:
20
    - conn: Database connection string or ConnectionUrl
21
    - query: SQL query to analyze
22
    - protocol: Backend-specific transfer protocol (optional)
23

24
    Returns:
25
    pandas DataFrame containing schema metadata
26
    """
27
```
28

29
**Usage Examples:**
30

31
```python
32
import connectorx as cx
33

34
postgres_url = "postgresql://username:password@server:port/database"
35

36
# Get metadata for a simple query
37
query = "SELECT customer_id, order_date, total_amount FROM orders"
38
meta_df = cx.get_meta(postgres_url, query)
39
print(meta_df)
40

41
# Get metadata for complex queries with joins
42
complex_query = """
43
SELECT 
44
    c.customer_name,
45
    o.order_date,
46
    oi.product_id,
47
    oi.quantity * oi.unit_price as total_value
48
FROM customers c
49
JOIN orders o ON c.customer_id = o.customer_id  
50
JOIN order_items oi ON o.order_id = oi.order_id
51
WHERE o.order_date >= '2023-01-01'
52
"""
53
meta_df = cx.get_meta(postgres_url, complex_query)
54

55
# Specify protocol for optimization
56
meta_df = cx.get_meta(postgres_url, query, protocol="cursor")
57
```
58

59
## Metadata Information
60

61
The returned DataFrame contains schema information about the query result:
62

63
### Column Information
64

65
- **Column names**: Names of all columns in the result set
66
- **Data types**: Database-specific and Python-equivalent data types
67
- **Nullability**: Whether columns can contain NULL values
68
- **Precision/Scale**: For decimal and numeric types
69

70
### Query Validation
71

72
Metadata retrieval serves as query validation:
73

74
- **Syntax errors**: Invalid SQL will raise errors during metadata extraction
75
- **Missing tables/columns**: References to non-existent objects are caught
76
- **Type conflicts**: Join and comparison type mismatches are identified
77
- **Permission issues**: Access control violations are detected
78

79
## Implementation Details
80

81
### Query Execution
82

83
ConnectorX typically executes a `LIMIT 1` version of your query to extract metadata:
84

85
```sql
86
-- Your query
87
SELECT customer_id, order_date, total_amount FROM orders
88

89
-- Executed for metadata (conceptually)
90
SELECT customer_id, order_date, total_amount FROM orders LIMIT 1
91
```
92

93
### Protocol Support
94

95
Different protocols may affect metadata extraction:
96

97
- **binary**: Fastest metadata extraction for most databases
98
- **cursor**: May be required for some server configurations
99
- **text**: Alternative for databases with binary protocol issues
100

101
## Use Cases
102

103
### Pre-flight Query Validation
104

105
Validate queries before executing expensive operations:
106

107
```python
108
try:
109
    meta = cx.get_meta(conn, expensive_query)
110
    print(f"Query will return {len(meta.columns)} columns")
111
    
112
    # Proceed with full query
113
    df = cx.read_sql(conn, expensive_query, partition_num=10)
114
except Exception as e:
115
    print(f"Query validation failed: {e}")
116
```
117

118
### Schema Discovery
119

120
Explore database schemas and query results:
121

122
```python
123
# Discover table structure
124
table_meta = cx.get_meta(conn, "SELECT * FROM unknown_table")
125
print("Available columns:", table_meta.columns.tolist())
126
print("Data types:", table_meta.dtypes.to_dict())
127

128
# Analyze complex query results
129
join_meta = cx.get_meta(conn, complex_join_query)
130
for col in join_meta.columns:
131
    print(f"{col}: {join_meta[col].dtype}")
132
```
133

134
### ETL Pipeline Planning
135

136
Plan data transformations based on source metadata:
137

138
```python
139
source_meta = cx.get_meta(source_conn, extraction_query)
140

141
# Plan transformations based on source types
142
for col_name, col_type in source_meta.dtypes.items():
143
    if col_type == 'object':  # String columns
144
        print(f"Plan text processing for {col_name}")
145
    elif 'datetime' in str(col_type):  # Date columns
146
        print(f"Plan date transformation for {col_name}")
147
```
148

149
### Dynamic Query Building
150

151
Build queries dynamically based on available columns:
152

153
```python
154
# Get available columns
155
meta = cx.get_meta(conn, "SELECT * FROM dynamic_table")
156
available_cols = meta.columns.tolist()
157

158
# Build query based on available columns
159
required_cols = ['id', 'name', 'created_date']
160
existing_cols = [col for col in required_cols if col in available_cols]
161

162
if existing_cols:
163
    dynamic_query = f"SELECT {', '.join(existing_cols)} FROM dynamic_table"
164
    df = cx.read_sql(conn, dynamic_query)
165
```
166

167
## Error Handling
168

169
Common errors when retrieving metadata:
170

171
- **Connection errors**: Database unavailable or authentication failure
172
- **Permission errors**: Insufficient privileges to execute query
173
- **Syntax errors**: Invalid SQL syntax
174
- **Missing objects**: Referenced tables/columns don't exist
175
- **Protocol errors**: Unsupported protocol for specific database
176

177
```python
178
try:
179
    meta = cx.get_meta(conn, query)
180
except Exception as e:
181
    if "permission denied" in str(e).lower():
182
        print("Check database permissions")
183
    elif "does not exist" in str(e).lower():
184
        print("Referenced table or column not found")
185
    else:
186
        print(f"Metadata extraction failed: {e}")
187
```

Version

Tile

Files

metadata-retrieval.md.css-3qkkll{font-size:var(--chakra-font-sizes-sm);font-weight:var(--chakra-font-weights-normal);color:var(--chakra-colors-gray-300);}docs/

metadata-retrieval.mddocs/