Writes, optimizes, and debugs T-SQL queries. Explains SQL Server internals, troubleshoots performance issues, and guides database administration tasks including backup/restore, high availability, security, and index design. Use when the user asks about T-SQL syntax, SQL Server administration, query performance, stored procedures, indexes, locking, transactions, backup/restore, high availability, security, or any MSSQL-related topic — even without saying 'SQL Server' explicitly. Also trigger on terms like SSMS, tempdb, bcp, sqlcmd, MSSQL, sp_executesql, NOLOCK, columnstore, Hekaton, RCSI, param sniffing, or execution plan.
Use PolyBase when you need to:
- query data where it lives (data virtualization) in S3-compatible storage, Azure Blob, HDFS, or another RDBMS, without copying it first
- read and write Parquet, ORC, or delimited files with plain T-SQL
- push filters down to the external source for analytical queries over large files
**Prefer linked servers when:** you need real-time transactional federation to another SQL Server instance — PolyBase is optimized for analytical/bulk patterns.

**Prefer BULK INSERT when:** you need simple, well-understood CSV/format-file loading from a network share or local path, and don't need data virtualization. A sketch contrasting the two styles follows.
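A minimal sketch of the contrast; dbo.SalesLocal and the UNC path are illustrative, and dbo.SalesExternal is defined later in this section:

```sql
-- Data virtualization: query the file in place (external table defined later)
SELECT Region, SUM(Amount) AS Total
FROM dbo.SalesExternal
GROUP BY Region;

-- Bulk load: copy rows into a local table instead (see 47-cli-bulk-operations.md)
BULK INSERT dbo.SalesLocal
FROM '\\fileserver\share\sales.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0a');
```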
```
┌─────────────────────────────────────────┐
│            SQL Server Engine            │
│  ┌──────────────┐  ┌─────────────────┐  │
│  │  Query Plan  │  │  PolyBase DMS   │  │  ← Data Movement Service
│  │  (pushdown?) │  │  (data xfer)    │  │
│  └──────┬───────┘  └────────┬────────┘  │
└─────────┼───────────────────┼───────────┘
          │ SQL               │ data
          ▼                   ▼
  ┌───────────────┐   ┌───────────────┐
  │   PolyBase    │   │   External    │
  │   Engine      │   │   Storage     │
  │   (pushdown   │   │  (S3, Azure   │
  │   queries)    │   │  Blob, HDFS,  │
  └───────────────┘   │  Oracle, etc) │
                      └───────────────┘
```

Key components:
| Component | Role |
|---|---|
| SQL Engine | Parses query, builds plan, decides pushdown |
| PolyBase Engine | Translates SQL to external system queries |
| Data Movement Service (DMS) | Transfers data between external source and SQL Server |
| External Data Source | The remote storage or RDBMS |
| External File Format | How to parse files (delimiter, encoding, row terminator) |
| External Table | Schema definition mapped to external data |
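To check the engine's runtime view of this topology, you can query the PolyBase node DMV; on a single-node install it returns just the head node:

```sql
-- PolyBase node topology; single-node installs show one HEAD entry
SELECT compute_node_id, type, name, address
FROM sys.dm_exec_compute_nodes;
```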
[!NOTE] SQL Server 2022: PolyBase in 2022 adds S3-compatible object storage support, improved Parquet handling, and OPENROWSET enhancements without requiring a full PolyBase installation in some scenarios.
| Feature | 2016 | 2019 | 2022 |
|---|---|---|---|
| Hadoop / HDFS | ✓ | ✓ | ✓ |
| Azure Blob Storage | ✓ | ✓ | ✓ |
| SQL Server (ODBC) | — | ✓ | ✓ |
| Oracle | — | ✓ | ✓ |
| Teradata | — | ✓ | ✓ |
| MongoDB | — | ✓ | ✓ |
| S3-compatible storage | — | — | ✓ |
| Scale-out groups | ✓ (on-prem) | ✓ (on-prem) | ✓ (deprecated) |
| Edition requirement | Enterprise | All editions (2019+) | All editions |
[!WARNING] Deprecated: PolyBase scale-out groups (head + compute nodes for Hadoop federation) are deprecated as of SQL Server 2022. Use single-node PolyBase with direct S3/Azure Blob connectivity instead. Scale-out groups are removed in SQL Server 2025.
PolyBase is a separate feature component installed via SQL Server Setup:

```sql
-- Check if PolyBase is installed
SELECT SERVERPROPERTY('IsPolyBaseInstalled'); -- 1 = yes
-- Enable PolyBase
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;
-- Verify
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'polybase enabled';
```

PolyBase runs two services: SQL Server PolyBase Engine (SQLPBENGINE) and SQL Server PolyBase Data Movement (SQLPBDMS). Both need network access to external sources; on Windows they run under the service accounts chosen during setup.
For HDFS connectivity, install a compatible JRE and set the JAVA_HOME environment variable before starting PolyBase services. SQL Server 2019+ bundles the Zulu JRE for convenience.
```sql
CREATE EXTERNAL DATA SOURCE <name>
WITH (
    LOCATION = '<protocol>://<host>[:<port>][/<path>]',
    [ TYPE = { HADOOP | BLOB_STORAGE }, ]  -- HADOOP: HDFS/wasbs PolyBase (2016-2019);
                                           -- BLOB_STORAGE: BULK INSERT/OPENROWSET only
    [ CREDENTIAL = <database_scoped_credential>, ]
    [ CONNECTION_OPTIONS = '<key>=<value>[; ...]', ]  -- ODBC-based connectors (2019+)
    [ PUSHDOWN = { ON | OFF } ]                       -- ODBC-based connectors (2019+)
);
-- 2019+ ODBC connectors (sqlserver://, oracle://, teradata://, mongodb://, odbc://)
-- and the 2022 S3 connector (s3://) omit TYPE: the LOCATION prefix selects the
-- connector. TYPE = RDBMS / SHARD_MAP_MANAGER plus DATABASE_NAME apply only to
-- Azure SQL Database elastic query.
```
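One prerequisite worth calling out before the credential examples below: a database-scoped credential requires a database master key in that database. A minimal sketch, with an illustrative password:

```sql
-- A database master key must exist before CREATE DATABASE SCOPED CREDENTIAL
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys
               WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Str0ng!Passw0rd#ChangeMe';
```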
```sql
-- Create credential for S3
CREATE DATABASE SCOPED CREDENTIAL S3Credential
WITH IDENTITY = 'S3 Access Key',
SECRET = 'accessKeyId:secretAccessKey';
-- Note: format is literally 'accessKeyId:secretAccessKey' concatenated with colon
CREATE EXTERNAL DATA SOURCE MyS3
WITH (
    -- 2022 S3 connector: no TYPE; the s3:// prefix selects the connector
    LOCATION = 's3://<bucket-endpoint>', -- e.g., s3://my-minio-host:9000
CREDENTIAL = S3Credential
);
```

```sql
-- For Azure Blob (WASBS / HTTPS)
CREATE DATABASE SCOPED CREDENTIAL AzureBlobCred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<SAS token without leading ?>';
CREATE EXTERNAL DATA SOURCE MyAzureBlob
WITH (
    TYPE = HADOOP,  -- PolyBase external tables over wasbs:// use HADOOP (2016-2019);
                    -- SQL Server 2022 uses the abs:// prefix with no TYPE
    LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net',
CREDENTIAL = AzureBlobCred
);
-- HTTPS URL style with TYPE = BLOB_STORAGE serves BULK INSERT / OPENROWSET(BULK),
-- not PolyBase external tables
CREATE EXTERNAL DATA SOURCE MyAzureBlobHTTPS
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<account>.blob.core.windows.net/<container>',
CREDENTIAL = AzureBlobCred
);
```

```sql
-- SQL Server to SQL Server (replaces linked server for analytics; 2019+)
CREATE DATABASE SCOPED CREDENTIAL RemoteSQLCred
WITH IDENTITY = 'remoteuser',
SECRET = 'password';
CREATE EXTERNAL DATA SOURCE RemoteSQLServer
WITH (
    -- 2019+: no TYPE; the sqlserver:// prefix selects the connector
    LOCATION = 'sqlserver://remote-server-name',
    CONNECTION_OPTIONS = 'Database=RemoteDB',  -- or use database.schema.table
                                               -- in the external table LOCATION
    CREDENTIAL = RemoteSQLCred
);
-- Oracle (oracle:// prefix, no TYPE)
CREATE EXTERNAL DATA SOURCE OracleSource
WITH (
    LOCATION = 'oracle://oracle-host:1521',
    CREDENTIAL = OracleCred
    -- the service name / SID (e.g. ORCL) goes in the external table LOCATION
);
```

```sql
-- Delimited text (CSV)
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2, -- skip header row
USE_TYPE_DEFAULT = TRUE, -- use column default for nulls
ENCODING = 'UTF8' -- UTF8 or UTF16
),
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
-- Supported compression: GzipCodec, DefaultCodec (deflate)
);
-- Parquet
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
-- or GzipCodec; Snappy is default for Parquet in most tooling
);
-- ORC
CREATE EXTERNAL FILE FORMAT OrcFormat
WITH (
FORMAT_TYPE = ORC,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
-- Delta format (SQL Server 2022 preview, check current GA status)
-- Note: Delta Lake support via Parquet reading of Delta tables is available
-- through OPENROWSET in Azure Synapse; on-prem support varies
```

Format type capabilities:
| Format | Predicate pushdown | Column pruning | Compression | Schema enforcement |
|---|---|---|---|---|
| DELIMITEDTEXT | Limited | No (reads all cols) | gzip/deflate | No |
| PARQUET | Good | Yes | Snappy/gzip/none | Schema in file |
| ORC | Good | Yes | Snappy/zlib/none | Schema in file |
```sql
-- External table over CSV files
CREATE EXTERNAL TABLE dbo.SalesExternal (
SaleDate DATE NOT NULL,
ProductId INT NOT NULL,
Quantity INT NOT NULL,
Amount DECIMAL(10, 2) NOT NULL,
Region NVARCHAR(50) NULL
)
WITH (
DATA_SOURCE = MyS3,
LOCATION = '/data/sales/', -- directory; reads all files within
FILE_FORMAT = CsvFormat,
REJECT_TYPE = PERCENTAGE, -- PERCENTAGE or VALUE
REJECT_VALUE = 5, -- reject if >5% rows are bad
REJECT_SAMPLE_VALUE = 1000 -- sample this many rows to compute %
);
-- External table over Parquet files
CREATE EXTERNAL TABLE dbo.EventsExternal (
EventId BIGINT,
EventType VARCHAR(100),
EventTime DATETIME2(3),
Payload NVARCHAR(MAX)
)
WITH (
DATA_SOURCE = MyS3,
LOCATION = '/data/events/year=2024/',
FILE_FORMAT = ParquetFormat
);
-- External table to RDBMS (Oracle/SQL Server)
CREATE EXTERNAL TABLE dbo.RemoteOrders (
OrderId INT,
CustomerId INT,
OrderDate DATE,
Total DECIMAL(12, 2)
)
WITH (
DATA_SOURCE = OracleSource,
    LOCATION = 'ORCL.SCHEMA.ORDERS' -- service.schema.table for Oracle
);
```

| Source | LOCATION examples |
|---|---|
| S3/Azure Blob directory | /prefix/path/ — trailing slash means directory |
| S3/Azure Blob file | /prefix/path/file.parquet — specific file |
| Parquet partition directory | /data/events/year=2024/ — Hive-style partitions |
| RDBMS | database.schema.table (SQL Server), service.schema.table (Oracle), or schema.table |
| HDFS | /hdfs/path/ |
```sql
-- Check rejection details after a query
SELECT * FROM sys.dm_exec_external_work;
-- Rejected rows are written to a rejection location under the external path:
-- <location>/_rejections/<query-guid>/<timestamp>.txt
```

OPENROWSET with PolyBase lets you query external files without creating an external table. Best for one-off exploratory queries.
```sql
-- Read a single CSV file from a network share (traditional BULK provider)
SELECT *
FROM OPENROWSET(
BULK 'C:\data\sales.csv',
FORMATFILE = 'C:\data\sales.fmt',
FIRSTROW = 2
) AS t;
```

[!NOTE] SQL Server 2022: OPENROWSET can read directly from S3-compatible storage and Azure Blob without creating formal external table objects. Authentication comes from the database-scoped credential attached to the external data source named in the DATA_SOURCE argument.
```sql
-- Ad-hoc query against an S3 parquet file (2022+)
-- Assumes the MyS3 external data source (with attached credential) from above
SELECT TOP 100 *
FROM OPENROWSET(
    BULK '/bucket/path/file.parquet',
    DATA_SOURCE = 'MyS3',
    FORMAT = 'PARQUET'
) AS t;

-- Read all parquet files under a prefix
SELECT *
FROM OPENROWSET(
    BULK '/bucket/path/*.parquet',
    DATA_SOURCE = 'MyS3',
    FORMAT = 'PARQUET'
) AS t;

-- CSV with explicit schema (WITH clause)
SELECT *
FROM OPENROWSET(
    BULK '/bucket/data.csv',
    DATA_SOURCE = 'MyS3',
    FORMAT = 'CSV',
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a'
) WITH (
    Col1 INT,
    Col2 VARCHAR(100),
    Col3 DECIMAL(10,2)
) AS t;
```

[!NOTE] SQL Server 2022: Native S3 connector added in SQL Server 2022. Works with AWS S3, MinIO, Pure Storage FlashBlade, NetApp StorageGRID, Cloudflare R2, and other S3-API-compatible systems.
```sql
-- The SECRET must be 'accessKeyId:secretAccessKey' — note the colon separator
CREATE DATABASE SCOPED CREDENTIAL MyS3Cred
WITH IDENTITY = 'S3 Access Key',
SECRET = 'AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY';
-- Data source for an S3-compatible endpoint (MinIO shown)
CREATE EXTERNAL DATA SOURCE MyMinIO
WITH (
    -- 2022 S3 connector: no TYPE; the s3:// prefix selects the connector.
    -- TLS/endpoint specifics depend on the S3-compatible platform's configuration
    LOCATION = 's3://<minio-host>:9000',
    CREDENTIAL = MyS3Cred
);
```

```sql
-- PolyBase writes to S3 via INSERT INTO an external table.
-- Export requires: EXEC sp_configure 'allow polybase export', 1; RECONFIGURE;
INSERT INTO dbo.ArchiveExternal -- external table pointing to S3
SELECT *
FROM dbo.Orders
WHERE OrderDate < '2023-01-01';
-- Creates files in the S3 location; file naming is engine-controlled
```

SQL Server 2022 backup to S3 uses a different credential structure than PolyBase: a server-level credential named after the URL rather than a database-scoped credential (see 44-backup-restore.md for the backup-specific S3 syntax).
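For contrast, a minimal sketch of the backup-side shape; bucket, host, and database names are illustrative, and 44-backup-restore.md has the authoritative syntax:

```sql
-- Backup to S3 uses a server-level credential NAMED AFTER THE URL
CREATE CREDENTIAL [s3://my-minio-host:9000/backup-bucket]
WITH IDENTITY = 'S3 Access Key',
     SECRET = 'accessKeyId:secretAccessKey';

BACKUP DATABASE SalesDb
TO URL = 's3://my-minio-host:9000/backup-bucket/SalesDb.bak';
```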
```sql
-- Using Managed Identity (Azure SQL MI / Azure VM with managed identity)
CREATE DATABASE SCOPED CREDENTIAL MIBlobCred
WITH IDENTITY = 'Managed Identity';
CREATE EXTERNAL DATA SOURCE AzureBlobMI
WITH (
TYPE = BLOB_STORAGE,
LOCATION = 'https://<account>.blob.core.windows.net/<container>',
CREDENTIAL = MIBlobCred
);
-- Using SAS token
CREATE DATABASE SCOPED CREDENTIAL SASCred
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2021-06-08&ss=b&srt=co&sp=rwdlacuptfx&...';
-- Note: no leading '?' character in the SAS token
```

```sql
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
TYPE = HADOOP,
LOCATION = 'hdfs://<namenode>:8020'
    -- Unsecured HDFS requires no credential
);
-- For Kerberos-secured Hadoop: configure krb5.conf and keytab separately
-- via PolyBase configuration files (not via T-SQL credentials)
```

[!WARNING] Deprecated: Hadoop scale-out groups are deprecated in SQL Server 2022. Single-node PolyBase with direct Hadoop connectivity still works in 2022, but plan to migrate to S3-based data lakes for new architectures.
[!NOTE] SQL Server 2019: ODBC-based connectors for Oracle, Teradata, MongoDB, and generic ODBC sources arrived in SQL Server 2019. The LOCATION prefix (oracle://, teradata://, mongodb://, odbc://) selects the connector; TYPE is not specified.
```sql
-- Oracle via the ODBC-based connector (PolyBase 2019+ ships the drivers it uses)
CREATE DATABASE SCOPED CREDENTIAL OracleCred
WITH IDENTITY = 'oracleuser',
SECRET = 'oraclepassword';
CREATE EXTERNAL DATA SOURCE OracleDB
WITH (
    LOCATION = 'oracle://oracle-host:1521',
    CREDENTIAL = OracleCred
    -- service name / PDB (e.g. ORCLPDB1) goes in the external table LOCATION
);
CREATE EXTERNAL TABLE dbo.OracleCustomers (
CustomerId INT,
CustomerName NVARCHAR(200),
Country VARCHAR(50)
)
WITH (
DATA_SOURCE = OracleDB,
    LOCATION = 'ORCLPDB1.ORASCHEMA.CUSTOMERS' -- service.schema.table
);
SELECT * FROM dbo.OracleCustomers WHERE Country = 'US';
```

```sql
-- MongoDB (mongodb:// prefix, no TYPE)
CREATE EXTERNAL DATA SOURCE MongoDB
WITH (
    LOCATION = 'mongodb://mongo-host:27017', -- database goes in the table LOCATION
CREDENTIAL = MongoCred
);
-- MongoDB external tables are read-only; schema must be defined
-- JSON fields are flattened; nested documents need explicit mapping
CREATE EXTERNAL TABLE dbo.MongoEvents (
_id NVARCHAR(50),
eventType VARCHAR(100),
createdAt DATETIME2
)
WITH (
DATA_SOURCE = MongoDB,
    LOCATION = 'mydb.events' -- database.collection
);
```

```sql
-- Generic ODBC connector (odbc:// prefix, no TYPE); requires the vendor's
-- ODBC driver installed on the SQL Server host
CREATE DATABASE SCOPED CREDENTIAL ODBCCred
WITH IDENTITY = 'dbuser', SECRET = 'dbpassword';
CREATE EXTERNAL DATA SOURCE MySQLSource
WITH (
    LOCATION = 'odbc://mysql-host:3306',
    CONNECTION_OPTIONS = 'Driver={MySQL ODBC 8.0 Unicode Driver};Database=mydb',
CREDENTIAL = ODBCCred,
PUSHDOWN = ON
);
```

Predicate pushdown means SQL Server sends filter conditions to the external source rather than pulling all data and filtering locally. This is the primary performance mechanism for external tables.
When the optimizer builds a plan for an external table query, it examines each predicate and decides whether it can be "pushed down": essentially, whether the predicate is sargable and the connector supports the operator. The plan shows a Remote Query or External Select operator containing the pushed predicate.
```sql
-- Enable actual execution plan, then:
SELECT SaleDate, SUM(Amount) AS Total
FROM dbo.SalesExternal
WHERE Region = 'North'
AND SaleDate >= '2024-01-01'
GROUP BY SaleDate;
-- In the plan, look for External Select or Remote Query operator
-- The operator text shows what SQL/predicate was sent to the remote source
```

| Scenario | Why pushdown fails | Fix |
|---|---|---|
| Non-sargable predicate | WHERE YEAR(SaleDate) = 2024 | Rewrite to range: SaleDate >= '2024-01-01' |
| CAST on column | WHERE CAST(Amount AS INT) > 100 | Avoid function on column |
| Collation mismatch | String compare requires collation conversion | Match collations or use explicit COLLATE |
| Connector limitation | MongoDB doesn't push all predicates | Filter locally after pull |
| PUSHDOWN = OFF | Explicitly disabled on data source | Enable if performance permits |
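A concrete before/after for the first row above, using the SalesExternal table from earlier:

```sql
-- Not pushable: the function wraps the column
SELECT * FROM dbo.SalesExternal
WHERE YEAR(SaleDate) = 2024;

-- Pushable: an open-ended range on the bare column
SELECT * FROM dbo.SalesExternal
WHERE SaleDate >= '2024-01-01'
  AND SaleDate <  '2025-01-01';
```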
```sql
-- Hadoop external tables: force or disable computation pushdown per query
SELECT * FROM dbo.SalesExternal
WHERE Region = 'North'
OPTION (FORCE EXTERNALPUSHDOWN);

SELECT * FROM dbo.SalesExternal
WHERE Region = 'North'
OPTION (DISABLE EXTERNALPUSHDOWN);

-- ODBC-based connectors (2019+): set PUSHDOWN = ON | OFF on the data source
-- at CREATE EXTERNAL DATA SOURCE time; otherwise rely on the optimizer's decision
```

The optimizer treats external tables as having no statistics by default, leading to poor row estimates and potentially bad join strategies.
```sql
-- Create statistics on an external table column
CREATE STATISTICS stat_region
ON dbo.SalesExternal (Region);
-- Note: SQL Server samples the external data to build the histogram
-- This reads actual data from S3/blob — can be slow for large files
CREATE STATISTICS stat_saledate
ON dbo.SalesExternal (SaleDate)
WITH FULLSCAN; -- reads all data; use SAMPLE N ROWS for large sources
-- Multi-column statistics
CREATE STATISTICS stat_region_date
ON dbo.SalesExternal (Region, SaleDate);

-- Inspect a statistics object
DBCC SHOW_STATISTICS ('dbo.SalesExternal', 'stat_region');

-- Refresh after the external data changes
UPDATE STATISTICS dbo.SalesExternal;
-- Auto-update does NOT apply to external tables;
-- you must manually update statistics after data changes in the external source
```

**Best practice:** Create statistics on all columns used in WHERE clauses and JOIN predicates for external tables. The cost of sampling once is worth the improved plan quality for repeated queries.
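One way to audit coverage is a catalog query listing external-table columns not yet covered by any statistics object; a sketch:

```sql
-- External-table columns with no statistics object covering them
SELECT t.name AS table_name, c.name AS column_name
FROM sys.external_tables t
JOIN sys.columns c ON c.object_id = t.object_id
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.stats_columns sc
    WHERE sc.object_id = c.object_id
      AND sc.column_id = c.column_id
);
```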
| Approach | When to use | Trade-off |
|---|---|---|
| External table (virtualization) | Data lives naturally in cloud storage; you want to avoid duplication; data is too large to import | Slower per-query (network I/O each time); no local indexes |
| CTAS / SELECT INTO | You need fast repeated queries; data doesn't change often | Data copied to SQL Server; requires storage; staleness risk |
| Staged import | Regular batch loads (daily/hourly ETL) | Classic ETL pattern; combine with BULK INSERT or bcp |
```sql
-- Pull data from S3 into a local table
SELECT *
INTO dbo.SalesLocal
FROM dbo.SalesExternal
WHERE SaleDate >= '2024-01-01';
-- Or via INSERT with TABLOCK for minimal logging (simple or bulk-logged recovery)
INSERT INTO dbo.SalesLocal WITH (TABLOCK)
SELECT *
FROM dbo.SalesExternal
WHERE SaleDate >= '2024-01-01';
```

[!WARNING] Deprecated: Scale-out groups are deprecated in SQL Server 2022 and removed in SQL Server 2025. The following is for maintaining existing 2019 deployments.
Scale-out groups add compute nodes to parallelize data movement:
```sql
-- Run on each compute node to join it to the head node:
EXEC sp_polybase_join_group
    @head_node_address = N'HEAD-NODE',
    @dms_control_channel_port = 16450,
    @head_node_sql_server_instance_name = N'MSSQLSERVER';
```

For new deployments, use single-node PolyBase with direct object storage connectivity instead.
```sql
-- Currently running external queries
SELECT
r.session_id,
r.status,
r.start_time,
r.command,
r.total_elapsed_time / 1000.0 AS elapsed_sec,
t.text AS query_text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE r.command LIKE '%External%'
OR r.wait_type LIKE '%EXTERNAL%';
-- External work (data movement stats)
SELECT *
FROM sys.dm_exec_external_work;
```

```sql
-- Data movement workers per execution step
SELECT
execution_id,
step_index,
    type,
    status,
rows_processed,
bytes_processed,
start_time,
end_time,
error_id
FROM sys.dm_exec_dms_workers;
```

```sql
-- PolyBase errors appear in the SQL Server error log
EXEC sp_readerrorlog 0, 1, 'PolyBase';
-- DMS service log (Windows):
-- %ProgramFiles%\Microsoft SQL Server\<instance>\MSSQL\Log\PolyBase\
```

| Error | Likely cause | Fix |
|---|---|---|
| Msg 46530: External tables not supported | PolyBase not enabled | sp_configure 'polybase enabled', 1 |
| Msg 16543: Connection refused | External source unreachable | Check firewall, DNS, credential |
| Msg 7320: Cannot execute the query | Predicate cannot be pushed | Check query for non-sargable expressions |
| HdfsBridge::recordReaderFillBuffer - Unexpected error | Schema mismatch | Verify column types match file schema |
| Access Denied (S3 403) | Credential wrong or IAM policy | Recheck access key ID and secret |
| Reject threshold exceeded | Bad data in files | Increase REJECT_VALUE or fix source files |
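Beyond the error log, PolyBase records engine errors in a DMV; a quick sketch for pulling recent entries (column list assumes the documented DMV shape):

```sql
-- Most recent PolyBase engine errors
SELECT TOP (20) error_id, source, type, create_time, details
FROM sys.dm_exec_compute_node_errors
ORDER BY create_time DESC;
```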
| Feature | Azure SQL Database | Azure SQL MI | SQL Server 2022 | Azure Synapse |
|---|---|---|---|---|
| External tables (BLOB_STORAGE) | Yes (via elastic query) | Yes | Yes | Yes (built-in) |
| External tables (RDBMS/Oracle) | Elastic query only | Yes | Yes | No |
| S3 connector | No | No | Yes | No |
| OPENROWSET (Parquet/CSV) | Limited | Limited | Yes (2022) | Yes (serverless SQL) |
| Scale-out | No | No | Deprecated | Built-in MPP |
| Predicate pushdown to Blob | Partial | Partial | Yes | Yes |
**Azure SQL Database elastic query:** a legacy feature for cross-database queries in Azure SQL that uses external tables and data sources. Functional but not recommended for new designs — use Azure Synapse Analytics or Azure Data Factory instead.

**Azure Synapse Analytics:** has native, highly optimized support for Parquet/CSV on Azure Data Lake Storage Gen2 via serverless SQL pools. If your primary workload is analytics over cloud storage, Synapse is better suited than SQL Server PolyBase.
```sql
-- List all external data sources
SELECT
name,
type_desc,
location,
credential_id,
pushdown
FROM sys.external_data_sources;
-- List all external file formats
SELECT
name,
format_type,
field_terminator,
string_delimiter,
first_row,
data_compression
FROM sys.external_file_formats;
-- List all external tables with their data source
SELECT
t.name AS table_name,
s.name AS schema_name,
eds.name AS data_source_name,
t.location,
t.reject_type,
t.reject_value
FROM sys.external_tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.external_data_sources eds ON t.data_source_id = eds.data_source_id;
-- Statistics on external tables
SELECT
t.name AS table_name,
s.name AS stat_name,
s.auto_created,
s.user_created,
sp.last_updated,
sp.rows,
sp.rows_sampled
FROM sys.stats s
JOIN sys.external_tables t ON s.object_id = t.object_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) sp;
-- Database-scoped credentials
SELECT
name,
credential_identity,
create_date,
modify_date
FROM sys.database_scoped_credentials;
```

```sql
-- Assumes external table dbo.RawSalesExternal exists over S3 path
-- Run as a daily SQL Agent job
DECLARE @cutoff DATE = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));
TRUNCATE TABLE dbo.SalesStaging;
INSERT INTO dbo.SalesStaging WITH (TABLOCK)
SELECT
SaleDate,
ProductId,
Quantity,
Amount,
Region
FROM dbo.RawSalesExternal
WHERE SaleDate = @cutoff;
-- Then merge into production table
MERGE dbo.SalesFact AS tgt
USING dbo.SalesStaging AS src
ON tgt.SaleDate = src.SaleDate AND tgt.ProductId = src.ProductId
WHEN MATCHED THEN
UPDATE SET tgt.Quantity = src.Quantity, tgt.Amount = src.Amount
WHEN NOT MATCHED BY TARGET THEN
INSERT (SaleDate, ProductId, Quantity, Amount, Region)
    VALUES (src.SaleDate, src.ProductId, src.Quantity, src.Amount, src.Region);
```

```sql
-- One-off query — no external table setup needed (SQL Server 2022+)
-- Assumes the MyS3 external data source with an attached credential
SELECT
Region,
SUM(Amount) AS TotalSales,
COUNT(*) AS NumOrders
FROM OPENROWSET(
    BULK '/sales-data/year=2024/*.parquet',
    DATA_SOURCE = 'MyS3',
    FORMAT = 'PARQUET'
) WITH (
Region VARCHAR(50),
Amount DECIMAL(10,2)
) AS t
GROUP BY Region
ORDER BY TotalSales DESC;
```

```sql
-- Create an external table pointing to a write location
CREATE EXTERNAL TABLE dbo.ArchiveExport2023 (
OrderId INT,
OrderDate DATE,
Total DECIMAL(12,2),
Region VARCHAR(50)
)
WITH (
DATA_SOURCE = MyS3,
LOCATION = '/archive/orders/2023/',
FILE_FORMAT = ParquetFormat
);
-- Export data
INSERT INTO dbo.ArchiveExport2023
SELECT OrderId, OrderDate, Total, Region
FROM dbo.Orders
WHERE OrderDate >= '2023-01-01' AND OrderDate < '2024-01-01'; -- sargable range
-- Files are created in S3 with engine-chosen names
```

```sql
-- Query Oracle data from SQL Server with a local join
SELECT
lo.OrderId,
lo.OrderDate,
oc.CustomerName,
oc.Country
FROM dbo.LocalOrders lo
JOIN dbo.OracleCustomers oc -- external table over Oracle
ON lo.CustomerId = oc.CustomerId
WHERE lo.OrderDate >= '2024-01-01'
AND oc.Country = 'US';
-- The optimizer can push the Country = 'US' filter down to Oracle;
-- the OrderDate filter applies locally to dbo.LocalOrders
```

**PolyBase must be enabled explicitly.** sp_configure 'polybase enabled', 1; RECONFIGURE is required even after installation. Forgetting this produces Msg 46530.
**Database-scoped credentials are per-database.** You must create the credential in the same database as the external data source and external table. Moving the database to another server requires the database master key password there (or recreating the credentials).

**S3 credential format is unusual.** The SECRET must be 'accessKeyId:secretAccessKey' — not just the secret key. The colon separator is required.

**SAS tokens must not have a leading ?.** When creating Azure Blob credentials, the SECRET must start with sv=..., not ?sv=....

**Auto-statistics update does not apply to external tables.** You must manually run UPDATE STATISTICS after the external data changes. Stale statistics lead to bad cardinality estimates and slow plans.

**REJECT_TYPE = PERCENTAGE requires REJECT_SAMPLE_VALUE.** Without a sample size, SQL Server cannot compute the percentage; omitting REJECT_SAMPLE_VALUE causes an error.

**Predicate pushdown is not guaranteed.** Functions on columns, collation mismatches, and connector limitations prevent pushdown. Always check the execution plan's Remote Query operator to see what was pushed.

**External tables are read-only for most connectors.** Only object storage data sources (S3, Azure Blob, HDFS) support INSERT via external tables, and only with 'allow polybase export' enabled. RDBMS external tables are read-only in most configurations.

**LOCATION must match the actual path exactly.** S3 paths are case-sensitive. A mismatched path returns no rows (not an error) for some connectors.

**PolyBase services must be running.** The SQL Server PolyBase Engine and PolyBase Data Movement services must be started. External table queries fail with timeout or connection errors if these services are stopped.

**Parquet schema must match the external table definition.** If the Parquet file has column name or type mismatches, you get runtime errors. Use OPENROWSET without a WITH clause first to discover the file schema, as in the sketch below.
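A minimal sketch of that discovery step, assuming the MyS3 data source from earlier and an illustrative file path:

```sql
-- Let the engine report the file's inferred schema before you commit to DDL
EXEC sp_describe_first_result_set N'
SELECT TOP 1 *
FROM OPENROWSET(
    BULK ''/data/events/sample.parquet'',
    DATA_SOURCE = ''MyS3'',
    FORMAT = ''PARQUET''
) AS t';
```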
**Scale-out groups require all nodes to have PolyBase installed.** For 2019 deployments only; don't use them in 2022.
- 44-backup-restore.md — S3 backup syntax (different credential format from PolyBase)
- 47-cli-bulk-operations.md — BULK INSERT, OPENROWSET BULK, bcp for file-based imports
- 45-linked-servers.md — linked servers for OLTP cross-server federation
- 16-security-encryption.md — database-scoped credentials and TDE
- 32-performance-diagnostics.md — execution plan analysis for external queries