or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

Files

docs

connection-management.mddata-loading.mdfederated-queries.mdindex.mdmetadata-retrieval.mdquery-partitioning.md

metadata-retrieval.mddocs/

0

# Metadata Retrieval

1

2

Retrieve schema information and metadata from SQL queries without loading the full dataset. This functionality is useful for understanding query structure, validating queries, and inspecting data types before performing full data loads.

3

4

## Capabilities

5

6

### Query Metadata Extraction

7

8

Get column names, data types, and other metadata information from a SQL query result set.

9

10

```python { .api }

11

def get_meta(

12

conn: str | ConnectionUrl,

13

query: str,

14

protocol: Protocol | None = None,

15

) -> pd.DataFrame:

16

"""

17

Get metadata (header) of the given query result set.

18

19

Parameters:

20

- conn: Database connection string or ConnectionUrl

21

- query: SQL query to analyze

22

- protocol: Backend-specific transfer protocol (optional)

23

24

Returns:

25

pandas DataFrame containing schema metadata

26

"""

27

```

28

29

**Usage Examples:**

30

31

```python

32

import connectorx as cx

33

34

postgres_url = "postgresql://username:password@server:port/database"

35

36

# Get metadata for a simple query

37

query = "SELECT customer_id, order_date, total_amount FROM orders"

38

meta_df = cx.get_meta(postgres_url, query)

39

print(meta_df)

40

41

# Get metadata for complex queries with joins

42

complex_query = """

43

SELECT

44

c.customer_name,

45

o.order_date,

46

oi.product_id,

47

oi.quantity * oi.unit_price as total_value

48

FROM customers c

49

JOIN orders o ON c.customer_id = o.customer_id

50

JOIN order_items oi ON o.order_id = oi.order_id

51

WHERE o.order_date >= '2023-01-01'

52

"""

53

meta_df = cx.get_meta(postgres_url, complex_query)

54

55

# Specify protocol for optimization

56

meta_df = cx.get_meta(postgres_url, query, protocol="cursor")

57

```

58

59

## Metadata Information

60

61

The returned DataFrame contains schema information about the query result:

62

63

### Column Information

64

65

- **Column names**: Names of all columns in the result set

66

- **Data types**: Database-specific and Python-equivalent data types

67

- **Nullability**: Whether columns can contain NULL values

68

- **Precision/Scale**: For decimal and numeric types

69

70

### Query Validation

71

72

Metadata retrieval serves as query validation:

73

74

- **Syntax errors**: Invalid SQL will raise errors during metadata extraction

75

- **Missing tables/columns**: References to non-existent objects are caught

76

- **Type conflicts**: Join and comparison type mismatches are identified

77

- **Permission issues**: Access control violations are detected

78

79

## Implementation Details

80

81

### Query Execution

82

83

ConnectorX typically executes a `LIMIT 1` version of your query to extract metadata:

84

85

```sql

86

-- Your query

87

SELECT customer_id, order_date, total_amount FROM orders

88

89

-- Executed for metadata (conceptually)

90

SELECT customer_id, order_date, total_amount FROM orders LIMIT 1

91

```

92

93

### Protocol Support

94

95

Different protocols may affect metadata extraction:

96

97

- **binary**: Fastest metadata extraction for most databases

98

- **cursor**: May be required for some server configurations

99

- **text**: Alternative for databases with binary protocol issues

100

101

## Use Cases

102

103

### Pre-flight Query Validation

104

105

Validate queries before executing expensive operations:

106

107

```python

108

try:

109

meta = cx.get_meta(conn, expensive_query)

110

print(f"Query will return {len(meta.columns)} columns")

111

112

# Proceed with full query

113

df = cx.read_sql(conn, expensive_query, partition_num=10)

114

except Exception as e:

115

print(f"Query validation failed: {e}")

116

```

117

118

### Schema Discovery

119

120

Explore database schemas and query results:

121

122

```python

123

# Discover table structure

124

table_meta = cx.get_meta(conn, "SELECT * FROM unknown_table")

125

print("Available columns:", table_meta.columns.tolist())

126

print("Data types:", table_meta.dtypes.to_dict())

127

128

# Analyze complex query results

129

join_meta = cx.get_meta(conn, complex_join_query)

130

for col in join_meta.columns:

131

print(f"{col}: {join_meta[col].dtype}")

132

```

133

134

### ETL Pipeline Planning

135

136

Plan data transformations based on source metadata:

137

138

```python

139

source_meta = cx.get_meta(source_conn, extraction_query)

140

141

# Plan transformations based on source types

142

for col_name, col_type in source_meta.dtypes.items():

143

if col_type == 'object': # String columns

144

print(f"Plan text processing for {col_name}")

145

elif 'datetime' in str(col_type): # Date columns

146

print(f"Plan date transformation for {col_name}")

147

```

148

149

### Dynamic Query Building

150

151

Build queries dynamically based on available columns:

152

153

```python

154

# Get available columns

155

meta = cx.get_meta(conn, "SELECT * FROM dynamic_table")

156

available_cols = meta.columns.tolist()

157

158

# Build query based on available columns

159

required_cols = ['id', 'name', 'created_date']

160

existing_cols = [col for col in required_cols if col in available_cols]

161

162

if existing_cols:

163

dynamic_query = f"SELECT {', '.join(existing_cols)} FROM dynamic_table"

164

df = cx.read_sql(conn, dynamic_query)

165

```

166

167

## Error Handling

168

169

Common errors when retrieving metadata:

170

171

- **Connection errors**: Database unavailable or authentication failure

172

- **Permission errors**: Insufficient privileges to execute query

173

- **Syntax errors**: Invalid SQL syntax

174

- **Missing objects**: Referenced tables/columns don't exist

175

- **Protocol errors**: Unsupported protocol for specific database

176

177

```python

178

try:

179

meta = cx.get_meta(conn, query)

180

except Exception as e:

181

if "permission denied" in str(e).lower():

182

print("Check database permissions")

183

elif "does not exist" in str(e).lower():

184

print("Referenced table or column not found")

185

else:

186

print(f"Metadata extraction failed: {e}")

187

```