Simple DDL Parser to parse SQL & dialects like HQL, TSQL (MSSQL), Oracle, AWS Redshift, Snowflake, MySQL, PostgreSQL, etc ddl files to json/python dict with full information about columns: types, defaults, primary keys, etc.; sequences, alters, custom types & other entities from ddl.
—
Specialized parsing logic for different SQL database dialects, handling syntax variations and database-specific features like data types, functions, and DDL extensions. Each dialect class extends the base SQL parser with platform-specific rules and token recognition.
Foundation dialect class that provides common SQL DDL parsing functionality shared across all database platforms.
class BaseSQL:
"""
Base SQL dialect providing common DDL parsing functionality.
Handles standard SQL DDL statements, data types, and constraints.
"""MySQL-specific parsing rules including AUTO_INCREMENT, MySQL data types, storage engines, and MySQL-specific syntax extensions.
class MySQL:
"""
MySQL dialect parser handling MySQL-specific syntax including:
- AUTO_INCREMENT columns
- MySQL data types (TINYINT, MEDIUMINT, etc.)
- Storage engine specifications (ENGINE=InnoDB)
- MySQL-specific constraints and options
"""PostgreSQL-specific parsing for SERIAL types, arrays, custom data types, and PostgreSQL DDL extensions.
class PSQL:
"""
PostgreSQL dialect parser handling PostgreSQL-specific syntax including:
- SERIAL and BIGSERIAL auto-increment types
- Array data types (INTEGER[], VARCHAR[] etc.)
- PostgreSQL-specific functions and operators
- Custom domain types and enums
"""MSSQL/T-SQL parsing for SQL Server data types, IDENTITY columns, and SQL Server-specific DDL syntax.
class MSSQL:
"""
Microsoft SQL Server dialect parser handling T-SQL syntax including:
- IDENTITY columns for auto-increment
- SQL Server data types (NVARCHAR, DATETIME2, etc.)
- SQL Server-specific constraints and options
- T-SQL specific DDL extensions
"""Oracle-specific parsing for Oracle data types, sequences, and Oracle DDL syntax variations.
class Oracle:
"""
Oracle dialect parser handling Oracle-specific syntax including:
- Oracle data types (NUMBER, VARCHAR2, CLOB, etc.)
- Oracle sequences and triggers
- Oracle-specific constraints and storage options
- Oracle DDL syntax variations
"""HQL parsing for Hadoop/Hive table definitions, partitioning, and big data specific DDL constructs.
class HQL:
"""
Hive Query Language dialect parser handling HQL syntax including:
- Hive table partitioning (PARTITIONED BY)
- Hive storage formats (STORED AS, ROW FORMAT)
- Hive-specific data types and functions
- External table definitions
"""BigQuery-specific parsing for BigQuery data types, clustering, partitioning, and Google Cloud SQL syntax.
class BigQuery:
"""
Google BigQuery dialect parser handling BigQuery syntax including:
- BigQuery data types (ARRAY, STRUCT, GEOGRAPHY, etc.)
- Table clustering and partitioning
- BigQuery-specific functions and operators
- Dataset and project references
"""Redshift-specific parsing for Redshift data types, distribution keys, sort keys, and AWS-specific DDL features.
class Redshift:
"""
AWS Redshift dialect parser handling Redshift syntax including:
- Redshift data types and encodings
- Distribution keys (DISTKEY) and sort keys (SORTKEY)
- Redshift-specific table properties
- Redshift compression and storage options
"""Snowflake-specific parsing for Snowflake data types, clustering keys, and Snowflake DDL syntax.
class Snowflake:
"""
Snowflake dialect parser handling Snowflake syntax including:
- Snowflake data types (VARIANT, OBJECT, ARRAY)
- Clustering keys and micro-partitions
- Snowflake-specific table properties
- Time travel and data sharing syntax
"""Spark SQL parsing for Spark-specific data types, partitioning, and distributed table definitions.
class SparkSQL:
"""
Apache Spark SQL dialect parser handling Spark syntax including:
- Spark SQL data types and functions
- Table partitioning and bucketing
- Spark-specific storage formats
- Delta Lake and Iceberg table syntax
"""DB2-specific parsing for DB2 data types, tablespaces, and IBM-specific DDL syntax features.
class IBMDb2:
"""
IBM DB2 dialect parser handling DB2 syntax including:
- DB2 data types and functions
- Tablespace definitions
- DB2-specific constraints and options
- IBM-specific DDL extensions
"""Athena-specific parsing for Athena/Presto SQL syntax, external tables, and AWS Glue catalog integration.
class Athena:
"""
AWS Athena dialect parser handling Athena syntax including:
- Athena/Presto data types and functions
- External table definitions with S3 locations
- Partition projection and storage formats
- AWS Glue catalog integration syntax
"""The parser automatically applies appropriate dialect rules based on DDL syntax:
from simple_ddl_parser import DDLParser
# MySQL-specific syntax is automatically recognized
mysql_ddl = """
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
"""
parser = DDLParser(mysql_ddl)
result = parser.run()
# Parser automatically uses MySQL dialect rules# PostgreSQL arrays are parsed correctly
postgres_ddl = """
CREATE TABLE products (
id SERIAL PRIMARY KEY,
tags VARCHAR(50)[],
prices DECIMAL(10,2)[]
);
"""
parser = DDLParser(postgres_ddl)
result = parser.run()
# Array types are properly recognized and parsed# BigQuery STRUCT and ARRAY types
bigquery_ddl = """
CREATE TABLE analytics.events (
event_id STRING,
user_data STRUCT<
name STRING,
age INT64,
preferences ARRAY<STRING>
>
);
"""
parser = DDLParser(bigquery_ddl)
result = parser.run()
# Complex nested types are parsed with full structureEach dialect parser recognizes and handles:
Install with Tessl CLI
npx tessl i tessl/pypi-simple-ddl-parser