tessl/pypi-simple-ddl-parser

Simple DDL Parser parses SQL DDL files, including dialects such as HQL, TSQL (MSSQL), Oracle, AWS Redshift, Snowflake, MySQL, and PostgreSQL, into JSON/Python dicts with full information about columns (types, defaults, primary keys), as well as sequences, ALTER statements, custom types, and other DDL entities.


Input Dialects

Specialized parsing logic for different SQL database dialects, handling syntax variations and database-specific features like data types, functions, and DDL extensions. Each dialect class extends the base SQL parser with platform-specific rules and token recognition.

Capabilities

Base SQL Dialect

Foundation dialect class that provides common SQL DDL parsing functionality shared across all database platforms.

class BaseSQL:
    """
    Base SQL dialect providing common DDL parsing functionality.
    Handles standard SQL DDL statements, data types, and constraints.
    """

MySQL Dialect

MySQL-specific parsing rules including AUTO_INCREMENT, MySQL data types, storage engines, and MySQL-specific syntax extensions.

class MySQL:
    """
    MySQL dialect parser handling MySQL-specific syntax including:
    - AUTO_INCREMENT columns
    - MySQL data types (TINYINT, MEDIUMINT, etc.)
    - Storage engine specifications (ENGINE=InnoDB)
    - MySQL-specific constraints and options
    """

PostgreSQL Dialect

PostgreSQL-specific parsing for SERIAL types, arrays, custom data types, and PostgreSQL DDL extensions.

class PSQL:
    """
    PostgreSQL dialect parser handling PostgreSQL-specific syntax including:
    - SERIAL and BIGSERIAL auto-increment types
    - Array data types (INTEGER[], VARCHAR[], etc.)
    - PostgreSQL-specific functions and operators
    - Custom domain types and enums
    """

Microsoft SQL Server Dialect

MSSQL/T-SQL parsing for SQL Server data types, IDENTITY columns, and SQL Server-specific DDL syntax.

class MSSQL:
    """
    Microsoft SQL Server dialect parser handling T-SQL syntax including:
    - IDENTITY columns for auto-increment
    - SQL Server data types (NVARCHAR, DATETIME2, etc.)
    - SQL Server-specific constraints and options
    - T-SQL specific DDL extensions
    """

Oracle Dialect

Oracle-specific parsing for Oracle data types, sequences, and Oracle DDL syntax variations.

class Oracle:
    """
    Oracle dialect parser handling Oracle-specific syntax including:
    - Oracle data types (NUMBER, VARCHAR2, CLOB, etc.)
    - Oracle sequences and triggers
    - Oracle-specific constraints and storage options
    - Oracle DDL syntax variations
    """

Hive Query Language Dialect

HQL parsing for Hadoop/Hive table definitions, partitioning, and big data specific DDL constructs.

class HQL:
    """
    Hive Query Language dialect parser handling HQL syntax including:
    - Hive table partitioning (PARTITIONED BY)
    - Hive storage formats (STORED AS, ROW FORMAT)
    - Hive-specific data types and functions
    - External table definitions
    """

Google BigQuery Dialect

BigQuery-specific parsing for BigQuery data types, clustering, partitioning, and GoogleSQL DDL syntax.

class BigQuery:
    """
    Google BigQuery dialect parser handling BigQuery syntax including:
    - BigQuery data types (ARRAY, STRUCT, GEOGRAPHY, etc.)
    - Table clustering and partitioning
    - BigQuery-specific functions and operators
    - Dataset and project references
    """

AWS Redshift Dialect

Redshift-specific parsing for Redshift data types, distribution keys, sort keys, and AWS-specific DDL features.

class Redshift:
    """
    AWS Redshift dialect parser handling Redshift syntax including:
    - Redshift data types and encodings
    - Distribution keys (DISTKEY) and sort keys (SORTKEY)
    - Redshift-specific table properties
    - Redshift compression and storage options
    """

Snowflake Dialect

Snowflake-specific parsing for Snowflake data types, clustering keys, and Snowflake DDL syntax.

class Snowflake:
    """
    Snowflake dialect parser handling Snowflake syntax including:
    - Snowflake data types (VARIANT, OBJECT, ARRAY)
    - Clustering keys and micro-partitions
    - Snowflake-specific table properties
    - Time travel and data sharing syntax
    """

Apache Spark SQL Dialect

Spark SQL parsing for Spark-specific data types, partitioning, and distributed table definitions.

class SparkSQL:
    """
    Apache Spark SQL dialect parser handling Spark syntax including:
    - Spark SQL data types and functions
    - Table partitioning and bucketing
    - Spark-specific storage formats
    - Delta Lake and Iceberg table syntax
    """

IBM DB2 Dialect

DB2-specific parsing for DB2 data types, tablespaces, and IBM-specific DDL syntax features.

class IBMDb2:
    """
    IBM DB2 dialect parser handling DB2 syntax including:
    - DB2 data types and functions
    - Tablespace definitions
    - DB2-specific constraints and options
    - IBM-specific DDL extensions
    """

AWS Athena Dialect

Athena-specific parsing for Athena/Presto SQL syntax, external tables, and AWS Glue catalog integration.

class Athena:
    """
    AWS Athena dialect parser handling Athena syntax including:
    - Athena/Presto data types and functions
    - External table definitions with S3 locations
    - Partition projection and storage formats
    - AWS Glue catalog integration syntax
    """

Usage Examples

Automatic Dialect Detection

The parser automatically applies appropriate dialect rules based on DDL syntax:

from simple_ddl_parser import DDLParser

# MySQL-specific syntax is automatically recognized
mysql_ddl = """
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
"""

parser = DDLParser(mysql_ddl)
result = parser.run()
# Parser automatically uses MySQL dialect rules

PostgreSQL Array Types

# PostgreSQL arrays are parsed correctly
postgres_ddl = """
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    tags VARCHAR(50)[],
    prices DECIMAL(10,2)[]
);
"""

parser = DDLParser(postgres_ddl)
result = parser.run()
# Array types are properly recognized and parsed

BigQuery Nested Types

# BigQuery STRUCT and ARRAY types
bigquery_ddl = """
CREATE TABLE analytics.events (
    event_id STRING,
    user_data STRUCT<
        name STRING,
        age INT64,
        preferences ARRAY<STRING>
    >
);
"""

parser = DDLParser(bigquery_ddl)
result = parser.run()
# Complex nested types are parsed with full structure

Dialect-Specific Features

Each dialect parser recognizes and handles:

  • Data Types: Platform-specific data types and their variations
  • Auto-increment: Different auto-increment syntax (AUTO_INCREMENT, SERIAL, IDENTITY)
  • Constraints: Platform-specific constraint syntax and options
  • Storage Options: Engine specifications, tablespaces, compression
  • Functions: Database-specific functions in defaults and computed columns
  • Extensions: Platform-specific DDL extensions and syntax variations

Install with Tessl CLI

npx tessl i tessl/pypi-simple-ddl-parser
