Python package for retrieving and managing data from the Internet Movie Database (IMDb) about movies, people, characters and companies
Primary functions for creating IMDb access instances and retrieving basic system information. These form the foundation for all IMDb data operations across different access methods.
Creates IMDb access system instances with configurable data sources and parameters. The factory function automatically selects appropriate parsers based on the specified access system.
def IMDb(accessSystem=None, *arguments, **keywords):
    """
    Create an instance of the appropriate IMDb access system.

    Parameters:
    - accessSystem: str, optional - Access method ('http', 'sql', 's3', 'auto', 'config')
    - results: int - Default number of search results (default: 20)
    - keywordsResults: int - Default number of keyword results (default: 100)
    - reraiseExceptions: bool - Whether to re-raise exceptions (default: True)
    - loggingLevel: int - Logging level
    - loggingConfig: str - Path to logging configuration file
    - imdbURL_base: str - Base IMDb URL (default: 'https://www.imdb.com/')

    Returns:
    IMDbBase subclass instance (IMDbHTTPAccessSystem, IMDbSqlAccessSystem, or IMDbS3AccessSystem)
    """

Usage Example:
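from imdb import IMDb

# Default HTTP access
ia = IMDb()

# Explicit HTTP access with custom settings
ia = IMDb('http', results=50, reraiseExceptions=False)

# SQL database access (the SQL access system expects a database URI;
# the credentials and database name here are placeholders)
ia = IMDb('sql', uri='postgresql://user:password@localhost/imdb')

# S3 dataset access (the dataset must already be imported into a database;
# the URI is a placeholder)
ia = IMDb('s3', 'postgresql://user:password@localhost/imdb')

# Configuration file-based access
ia = IMDb('config')

Cinemagoer is an alias for the IMDb function. It provides identical functionality under the project's current name.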
Cinemagoer = IMDb

Usage Example:
from imdb import Cinemagoer

# Identical to the IMDb() function
ia = Cinemagoer()

available_access_systems() returns the list of currently available data access systems, based on installed dependencies and system configuration.
def available_access_systems():
    """
    Return the list of available data access systems.

    Returns:
    list: Available access system names (e.g., ['http', 'sql'])
    """

Usage Example:
from imdb import available_access_systems

# Check what access systems are available
systems = available_access_systems()
print(f"Available systems: {systems}")
# Prints "Available systems: ['http']" or "Available systems: ['http', 'sql']", depending on installation

Each access system can be selected through any of its aliases:
- HTTP Access Methods: 'http', 'https', 'web', 'html'
- SQL Access Methods: 'sql', 'db', 'database'
- S3 Access Methods: 's3', 's3dataset', 'imdbws'
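The alias groups listed above can be pictured as a simple lookup from alias to canonical system name. The following is a minimal sketch under that assumption: `ACCESS_SYSTEM_ALIASES` and `canonical_access_system` are hypothetical names introduced for illustration and are not part of the cinemagoer API.

```python
# Hypothetical lookup table built from the alias groups listed above.
# Neither this table nor the helper below is part of the cinemagoer API.
ACCESS_SYSTEM_ALIASES = {
    "http": ("http", "https", "web", "html"),
    "sql": ("sql", "db", "database"),
    "s3": ("s3", "s3dataset", "imdbws"),
}


def canonical_access_system(name):
    """Return the canonical system name for an alias, or None if unknown."""
    for canonical, aliases in ACCESS_SYSTEM_ALIASES.items():
        if name in aliases:
            return canonical
    return None


print(canonical_access_system("web"))     # alias of the HTTP system
print(canonical_access_system("imdbws"))  # alias of the S3 system
print(canonical_access_system("ftp"))     # not a known alias
```

A helper like this could be used to validate user input before calling IMDb(), so that an unknown name is reported early rather than surfacing as an exception from the factory.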
The IMDb function can automatically load configuration from files when accessSystem='config' or accessSystem='auto'.
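An in-order lookup of this kind can be sketched with the standard library alone. The helper name `load_first_config` and the temporary file paths are hypothetical, and the code is an illustration of the idea rather than the library's actual loader. It assumes an INI-style file with an `[imdbpy]` section and keeps option names case-sensitive, since keys such as `keywordsResults` are mixed-case.

```python
import configparser
import os
import tempfile


def load_first_config(candidates, section="imdbpy"):
    """Return the options of `section` from the first candidate file that has it.

    Hypothetical helper for illustration; not part of the cinemagoer API.
    """
    for path in candidates:
        path = os.path.expanduser(path)
        if not os.path.isfile(path):
            continue  # skip locations where no file exists
        parser = configparser.ConfigParser()
        parser.optionxform = str  # preserve the case of option names
        parser.read(path)
        if parser.has_section(section):
            return dict(parser.items(section))
    return {}


# Demo: a temporary file stands in for a real configuration file.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "imdbpy.cfg")      # never created
    cfg_path = os.path.join(tmp, "cinemagoer.cfg")
    with open(cfg_path, "w") as handle:
        handle.write("[imdbpy]\naccessSystem = http\nresults = 30\n")
    settings = load_first_config([missing, cfg_path])
    print(settings)
```

Note that values read this way are strings, so a real loader would still need to convert entries such as `results` to integers before using them.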
Configuration File Locations (searched in order):
- ./cinemagoer.cfg or ./imdbpy.cfg (current directory)
- ./.cinemagoer.cfg or ./.imdbpy.cfg (current directory, hidden)
- ~/cinemagoer.cfg or ~/imdbpy.cfg (home directory)
- ~/.cinemagoer.cfg or ~/.imdbpy.cfg (home directory, hidden)
- /etc/cinemagoer.cfg or /etc/imdbpy.cfg (Unix systems)
- /etc/conf.d/cinemagoer.cfg or /etc/conf.d/imdbpy.cfg (Unix systems)

Configuration File Format:
[imdbpy]
accessSystem = http
results = 30
keywordsResults = 150
reraiseExceptions = true
imdbURL_base = https://www.imdb.com/

class ConfigParserWithCase:
    """
    Case-sensitive configuration parser for IMDb settings.

    Methods:
    - get(section, option, *args, **kwds): Get configuration value
    - getDict(section): Get section as dictionary
    - items(section, *args, **kwds): Get section items as list
    """

All core access functions can raise IMDb-specific exceptions:
from imdb import IMDb, IMDbError, IMDbDataAccessError

try:
    ia = IMDb('invalid_system')
except IMDbError as e:
    print(f"IMDb error: {e}")

try:
    ia = IMDb('sql')  # If SQL system not available
except IMDbError as e:
    print(f"SQL access not available: {e}")

Optimize performance for different use cases and access patterns:
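HTTP Access (Default):
# HTTP access - good for most use cases
ia = IMDb()  # Default HTTP access

SQL Access:
# SQL access - optimal for large-scale applications
# (the SQL access system expects a database URI; the database name is a placeholder)
ia = IMDb('sql', uri='postgresql://imdb:password@localhost/imdb')

S3 Access:
# S3 access - good for bulk processing of the S3 dataset
# (the dataset must already be imported into a database; the URI is a placeholder)
ia = IMDb('s3', 'postgresql://imdb:password@localhost/imdb')

Selective Information Loading: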
# Efficient - only load needed information
movie = ia.get_movie('0133093', info=['main', 'plot'])

# Inefficient - loads all available information
movie = ia.get_movie('0133093', info='all')

Batch Updates:
# Efficient - batch processing
movies = ia.search_movie('Matrix')
for movie in movies[:5]:  # Limit results
    ia.update(movie, info=['main'])  # Minimal info for listings

# Inefficient - individual detailed updates
for movie in movies:
    ia.update(movie, info='all')  # Excessive information

Large Dataset Handling:
# Process results in batches to manage memory
def process_large_chart():
    top_movies = ia.get_top250_movies()
    # Process in smaller chunks
    chunk_size = 50
    for i in range(0, len(top_movies), chunk_size):
        chunk = top_movies[i:i + chunk_size]
        # Process chunk
        for movie in chunk:
            # Minimal processing to conserve memory
            print(f"{movie['title']} ({movie['year']})")

Results Caching:
from functools import lru_cache

# Cache expensive operations
@lru_cache(maxsize=100)
def cached_movie_search(title):
    return ia.search_movie(title, results=5)

# Reuse cached results
movies1 = cached_movie_search('Matrix')  # Network call
movies2 = cached_movie_search('Matrix')  # Cached result

Install with Tessl CLI
npx tessl i tessl/pypi-cinemagoer