Load data from databases to dataframes, the fastest way.
Retrieve schema information and metadata from SQL queries without loading the full dataset. This functionality is useful for understanding query structure, validating queries, and inspecting data types before performing full data loads.
Get column names, data types, and other metadata information from a SQL query result set.
def get_meta(
    conn: str | ConnectionUrl,
    query: str,
    protocol: Protocol | None = None,
) -> pd.DataFrame:
    """
    Get metadata (header) of the given query result set.

    Parameters:
    - conn: Database connection string or ConnectionUrl
    - query: SQL query to analyze
    - protocol: Backend-specific transfer protocol (optional)

    Returns:
    pandas DataFrame containing schema metadata
    """

Usage Examples:
import connectorx as cx
postgres_url = "postgresql://username:password@server:port/database"
# Get metadata for a simple query
query = "SELECT customer_id, order_date, total_amount FROM orders"
meta_df = cx.get_meta(postgres_url, query)
print(meta_df)
# Get metadata for complex queries with joins
complex_query = """
SELECT
    c.customer_name,
    o.order_date,
    oi.product_id,
    oi.quantity * oi.unit_price AS total_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2023-01-01'
"""
meta_df = cx.get_meta(postgres_url, complex_query)
# Specify protocol for optimization
meta_df = cx.get_meta(postgres_url, query, protocol="cursor")

The returned DataFrame contains schema information about the query result:
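As a rough illustration, the metadata can be inspected with ordinary pandas accessors. The stand-in DataFrame below is constructed by hand (it is not returned by connectorx) and assumes `get_meta` yields an empty DataFrame whose columns and dtypes mirror the query result:

```python
import pandas as pd

# Hypothetical stand-in for what get_meta might return for
# "SELECT customer_id, order_date, total_amount FROM orders":
# an empty DataFrame carrying only column names and dtypes.
meta_df = pd.DataFrame({
    "customer_id": pd.Series(dtype="int64"),
    "order_date": pd.Series(dtype="datetime64[ns]"),
    "total_amount": pd.Series(dtype="float64"),
})

print(meta_df.columns.tolist())   # column names
print(meta_df.dtypes.to_dict())   # column -> dtype mapping
print(len(meta_df))               # 0 rows: no data was loaded
```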
Metadata retrieval also serves as a lightweight form of query validation:
ConnectorX typically executes a LIMIT 1 version of your query to extract metadata:
-- Your query
SELECT customer_id, order_date, total_amount FROM orders
-- Executed for metadata (conceptually)
SELECT customer_id, order_date, total_amount FROM orders LIMIT 1

Different protocols may affect how metadata is extracted.
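To make the conceptual LIMIT 1 rewrite concrete, here is a toy helper (hypothetical, not part of the connectorx API) that wraps a query the way a metadata probe might:

```python
def to_probe_query(query: str) -> str:
    """Wrap a SELECT in a LIMIT 1 probe, mimicking (conceptually) how
    a metadata pass can avoid scanning the full result set."""
    return f"{query.rstrip().rstrip(';')} LIMIT 1"

print(to_probe_query("SELECT customer_id, order_date, total_amount FROM orders"))
# SELECT customer_id, order_date, total_amount FROM orders LIMIT 1
```

Note that the exact rewrite is backend-specific (some databases use a different row-limiting syntax, e.g. `TOP`), so treat this only as an illustration of the idea.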
Validate queries before executing expensive operations:
try:
    meta = cx.get_meta(conn, expensive_query)
    print(f"Query will return {len(meta.columns)} columns")
    # Proceed with full query
    df = cx.read_sql(conn, expensive_query, partition_num=10)
except Exception as e:
    print(f"Query validation failed: {e}")

Explore database schemas and query results:
# Discover table structure
table_meta = cx.get_meta(conn, "SELECT * FROM unknown_table")
print("Available columns:", table_meta.columns.tolist())
print("Data types:", table_meta.dtypes.to_dict())
# Analyze complex query results
join_meta = cx.get_meta(conn, complex_join_query)
for col in join_meta.columns:
    print(f"{col}: {join_meta[col].dtype}")

Plan data transformations based on source metadata:
source_meta = cx.get_meta(source_conn, extraction_query)
# Plan transformations based on source types
for col_name, col_type in source_meta.dtypes.items():
    if col_type == 'object':  # String columns
        print(f"Plan text processing for {col_name}")
    elif 'datetime' in str(col_type):  # Date columns
        print(f"Plan date transformation for {col_name}")

Build queries dynamically based on available columns:
# Get available columns
meta = cx.get_meta(conn, "SELECT * FROM dynamic_table")
available_cols = meta.columns.tolist()
# Build query based on available columns
required_cols = ['id', 'name', 'created_date']
existing_cols = [col for col in required_cols if col in available_cols]
if existing_cols:
    dynamic_query = f"SELECT {', '.join(existing_cols)} FROM dynamic_table"
    df = cx.read_sql(conn, dynamic_query)

Common errors when retrieving metadata:
try:
    meta = cx.get_meta(conn, query)
except Exception as e:
    if "permission denied" in str(e).lower():
        print("Check database permissions")
    elif "does not exist" in str(e).lower():
        print("Referenced table or column not found")
    else:
        print(f"Metadata extraction failed: {e}")
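The branching above can be factored into a small reusable helper (hypothetical, not part of connectorx) so the same classification works anywhere `get_meta` is called:

```python
def classify_meta_error(exc: Exception) -> str:
    """Map a metadata-extraction exception to a short diagnostic hint."""
    msg = str(exc).lower()
    if "permission denied" in msg:
        return "Check database permissions"
    if "does not exist" in msg:
        return "Referenced table or column not found"
    return f"Metadata extraction failed: {exc}"

print(classify_meta_error(RuntimeError("permission denied for table orders")))
# Check database permissions
```

Matching on error-message substrings is fragile across database backends and driver versions, so prefer catching the specific exception types your driver documents when they are available.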