Utilities for streaming large files to and from S3, HDFS, GCS, SFTP, Azure Blob Storage, and the local filesystem, with transparent compression support
```
npx @tessl/cli install tessl/pypi-smart-open@7.3.0
```

Smart Open is a Python library for efficient streaming of very large files to and from a variety of storage systems, including S3, Google Cloud Storage, Azure Blob Storage, HDFS, WebHDFS, HTTP/HTTPS, SFTP, and the local filesystem. It provides transparent, on-the-fly compression and decompression for multiple formats and is a drop-in replacement for Python's built-in open() function, with 100% compatibility.
```
pip install smart_open
```

```python
from smart_open import open
```

For URI parsing:

```python
from smart_open import parse_uri
```

For compression handling:

```python
from smart_open import register_compressor
```

For the legacy context manager:

```python
from smart_open import smart_open
```

```python
from smart_open import open

# Stream from S3
with open('s3://my-bucket/large-file.txt') as f:
    for line in f:
        print(line.strip())

# Stream with compression (automatic detection)
with open('s3://my-bucket/file.txt.gz') as f:
    content = f.read()

# Write to cloud storage
with open('gs://my-bucket/output.txt', 'w') as f:
    f.write('Hello, world!')

# Local files work too (drop-in replacement)
with open('./local-file.txt') as f:
    data = f.read()

# Binary operations with seeking
with open('s3://my-bucket/data.bin', 'rb') as f:
    f.seek(1000)          # Seek to position 1000
    chunk = f.read(100)   # Read 100 bytes
```

Smart Open uses a modular transport architecture:
- A top-level open() function interface compatible with the built-in open()
- Per-scheme transport modules, each providing consistent parse_uri(), open_uri(), and open() functions, with Reader/Writer classes implementing the standard Python I/O interfaces
Universal file operations that work across all supported storage systems with transparent compression support.
```python
def open(uri, mode='r', buffering=-1, encoding=None, errors=None, newline=None,
         closefd=True, opener=None, compression='infer_from_extension',
         transport_params=None): ...

def parse_uri(uri_as_string): ...
```
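For example, parse_uri() breaks a URI into its scheme-specific parts. The sketch below follows the S3 case; other schemes expose their own fields, so treat the exact attribute names as illustrative:

```python
from smart_open import parse_uri

uri = parse_uri('s3://my-bucket/path/to/file.txt')
print(uri.scheme)     # expected: 's3'
print(uri.bucket_id)  # expected: 'my-bucket'
print(uri.key_id)     # expected: 'path/to/file.txt'
```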
Access to major cloud storage platforms with native client optimizations and streaming capabilities.

```python
# S3 operations
from smart_open.s3 import open, iter_bucket, Reader, MultipartWriter

# GCS operations
from smart_open.gcs import open, Reader, Writer

# Azure operations
from smart_open.azure import open, Reader, Writer
```
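The cloud transports accept pre-configured native clients through transport_params. A minimal sketch for S3, assuming a boto3 session configured with a hypothetical named profile:

```python
import boto3
from smart_open import open

# Reuse an existing boto3 session, e.g. for a named profile or custom endpoint
session = boto3.Session(profile_name='my-profile')  # hypothetical profile name
client = session.client('s3')

# Pass the client to the S3 transport via transport_params
with open('s3://my-bucket/large-file.txt', 'rb', transport_params={'client': client}) as f:
    first_chunk = f.read(1024)
```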
HTTP/HTTPS, FTP, and SSH-based file access with authentication and secure connection support.

```python
# HTTP operations
from smart_open.http import open

# FTP operations
from smart_open.ftp import open

# SSH/SFTP operations
from smart_open.ssh import open
```
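A minimal sketch of reading over HTTPS and SFTP; the hosts, paths, and credentials are placeholders, and the SFTP transport requires paramiko to be installed:

```python
from smart_open import open

# Stream a file over HTTPS line by line
with open('https://example.com/file.txt') as f:
    for line in f:
        print(line.strip())

# Read over SFTP; credentials may be embedded in the URI
with open('sftp://user:password@host/path/file.txt') as f:
    data = f.read()
```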
Integration with the Hadoop ecosystem (HDFS, WebHDFS) for big data processing workflows.

```python
# HDFS operations
from smart_open.hdfs import open

# WebHDFS operations
from smart_open.webhdfs import open
```
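Reading from HDFS or WebHDFS uses the same pattern as any other scheme. A sketch with placeholder hosts and paths, assuming a working Hadoop client environment:

```python
from smart_open import open

# Stream a file from HDFS (assumes a working Hadoop client setup)
with open('hdfs://namenode:8020/user/hadoop/input.txt') as f:
    for line in f:
        print(line.strip())

# The same file over WebHDFS (assumes the WebHDFS REST API is enabled)
with open('webhdfs://namenode:50070/user/hadoop/input.txt') as f:
    content = f.read()
```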
Automatic and explicit compression handling for multiple formats with streaming support.

```python
def register_compressor(ext, callback): ...
def get_supported_compression_types(): ...
def get_supported_extensions(): ...
```
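For example, the upstream documentation registers a handler for .xz files backed by the standard-library lzma module (recent releases may already handle .xz out of the box when lzma is available):

```python
import lzma

from smart_open import open, register_compressor

def _handle_xz(file_obj, mode):
    # Wrap the underlying file object in an LZMA stream
    return lzma.LZMAFile(filename=file_obj, mode=mode, format=lzma.FORMAT_XZ)

register_compressor('.xz', _handle_xz)

# Once registered, .xz files are compressed and decompressed transparently
with open('example.txt.xz', 'w') as f:
    f.write('hello, xz\n')
```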
Helper functions for URI handling, byte ranges, parallel processing, and custom transport development.

```python
# URI utilities
from smart_open.utils import safe_urlsplit, make_range_string

# Concurrency utilities
from smart_open.concurrency import create_pool

# Transport registration
from smart_open.transport import register_transport, get_transport
```
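A rough sketch of two of these helpers, treating anything beyond the names listed above as an assumption: get_transport() looks up the module registered for a scheme, and safe_urlsplit() splits a URI without mangling characters that are legal in object-store keys:

```python
from smart_open.transport import get_transport
from smart_open.utils import safe_urlsplit

# Look up the transport module registered for the 's3' scheme
# (raises if the corresponding extra dependencies are not installed)
s3_transport = get_transport('s3')

# Split a URI into scheme/netloc/path components
parts = safe_urlsplit('s3://my-bucket/some/key')
print(parts.scheme, parts.netloc, parts.path)
```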
Smart Open supports a wide variety of URL formats:

- S3: s3://bucket/key, s3://key:secret@bucket/key
- GCS: gs://bucket/blob
- Azure Blob Storage: azure://container/blob
- HTTP/HTTPS: http://example.com/file, https://example.com/file
- FTP/FTPS: ftp://host/path, ftps://host/path
- SSH/SFTP: ssh://user@host/path, sftp://user@host/path
- HDFS: hdfs:///path/file, hdfs://namenode:port/path/file
- WebHDFS: webhdfs://host:port/path/file
- Local filesystem: ./path/file, file:///absolute/path, ~/path/file

All formats support transparent compression based on file extensions (.gz, .bz2, .zst, etc.).
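To illustrate the extension-based behavior with purely local files, and how to override it with the compression argument:

```python
from smart_open import open

# Writing to a .gz path compresses on the fly
with open('./example.txt.gz', 'w') as f:
    f.write('compressed transparently\n')

# Reading it back decompresses automatically
with open('./example.txt.gz') as f:
    print(f.read())

# compression='disable' skips decompression and returns the raw bytes
with open('./example.txt.gz', 'rb', compression='disable') as f:
    magic = f.read(2)  # b'\x1f\x8b', the gzip magic number
```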