# Core Data Access

Primary functions for creating IMDb access instances and retrieving basic system information. These form the foundation for all IMDb data operations across different access methods.

## Capabilities

### IMDb Instance Creation

Creates IMDb access system instances with configurable data sources and parameters. The factory function automatically selects appropriate parsers based on the specified access system.

```python { .api }
def IMDb(accessSystem=None, *arguments, **keywords):
    """
    Create an instance of the appropriate IMDb access system.

    Parameters:
    - accessSystem: str, optional - Access method ('http', 'sql', 's3', 'auto', 'config')
    - results: int - Default number of search results (default: 20)
    - keywordsResults: int - Default number of keyword results (default: 100)
    - reraiseExceptions: bool - Whether to re-raise exceptions (default: True)
    - loggingLevel: int - Logging level
    - loggingConfig: str - Path to logging configuration file
    - imdbURL_base: str - Base IMDb URL (default: 'https://www.imdb.com/')

    Returns:
    IMDbBase subclass instance (IMDbHTTPAccessSystem, IMDbSqlAccessSystem, or IMDbS3AccessSystem)
    """
```

**Usage Example:**

```python
from imdb import IMDb

# Default HTTP access
ia = IMDb()

# Explicit HTTP access with custom settings
ia = IMDb('http', results=50, reraiseExceptions=False)

# SQL database access (the SQL system takes a database URI;
# user, password, host and database name here are placeholders)
ia = IMDb('sql', uri='mysql://user:password@localhost/imdb')

# S3 dataset access
ia = IMDb('s3')

# Configuration file-based access
ia = IMDb('config')
```

### Cinemagoer Alias

Alias for the IMDb function providing identical functionality with updated branding.

```python { .api }
Cinemagoer = IMDb
```

**Usage Example:**

```python
from imdb import Cinemagoer

# Identical to IMDb() function
ia = Cinemagoer()
```

### Available Access Systems

Returns the list of currently available data access systems based on installed dependencies and system configuration.

```python { .api }
def available_access_systems():
    """
    Return the list of available data access systems.

    Returns:
    list: Available access system names (e.g., ['http', 'sql'])
    """
```

**Usage Example:**

```python
from imdb import available_access_systems

# Check what access systems are available
systems = available_access_systems()
print(f"Available systems: {systems}")
# Example output, depending on installation:
# Available systems: ['http'] or Available systems: ['http', 'sql']
```
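The returned list can drive a fallback choice at startup. The following is a minimal sketch rather than part of the library: `pick_access_system` and its `preferred` order are hypothetical, and the helper takes the list as an argument so it runs without the `imdb` package installed.

```python
def pick_access_system(available, preferred=("sql", "http")):
    """Return the first preferred access system that is available.

    `available` is expected to look like the list returned by
    available_access_systems(); `preferred` is an assumed priority order.
    """
    for name in preferred:
        if name in available:
            return name
    raise LookupError(f"none of {list(preferred)} is available: {list(available)}")


# Hard-coded lists stand in for available_access_systems() here.
print(pick_access_system(["http"]))         # http
print(pick_access_system(["http", "sql"]))  # sql
```

In real use, the result could be passed on as `IMDb(pick_access_system(available_access_systems()))`, together with whatever extra arguments the chosen system needs.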

## Access System Types

### HTTP Access System

**Access Methods**: `'http'`, `'https'`, `'web'`, `'html'`
- Web scraping access to IMDb website
- Default access method
- No additional dependencies beyond base requirements
- Rate-limited by IMDb's website policies

### SQL Database Access System

**Access Methods**: `'sql'`, `'db'`, `'database'`
- Direct SQL database access to local IMDb data
- Requires separate IMDb database setup
- Fastest access for bulk operations
- Requires additional SQL database dependencies

### S3 Dataset Access System

**Access Methods**: `'s3'`, `'s3dataset'`, `'imdbws'`
- Access to IMDb S3 datasets and web services
- Official IMDb data source
- Requires AWS credentials and network access
- Most up-to-date and authoritative data
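Because each access system answers to several names, it can help to normalize user input before logging or comparing it. The sketch below is illustrative only: the alias groups are copied from the lists above, while `ACCESS_SYSTEM_ALIASES` and `canonical_access_system` are hypothetical names, not library API.

```python
# Alias groups taken from the lists above; the mapping itself is illustrative.
ACCESS_SYSTEM_ALIASES = {
    "http": ("http", "https", "web", "html"),
    "sql": ("sql", "db", "database"),
    "s3": ("s3", "s3dataset", "imdbws"),
}


def canonical_access_system(name):
    """Map any documented alias to its canonical access system name."""
    for canonical, aliases in ACCESS_SYSTEM_ALIASES.items():
        if name in aliases:
            return canonical
    raise ValueError(f"unknown access system: {name!r}")


print(canonical_access_system("web"))       # http
print(canonical_access_system("database"))  # sql
print(canonical_access_system("imdbws"))    # s3
```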

## Configuration System

### Automatic Configuration

The IMDb function can automatically load configuration from files when `accessSystem='config'` or `accessSystem='auto'`.

**Configuration File Locations** (searched in order):
1. `./cinemagoer.cfg` or `./imdbpy.cfg` (current directory)
2. `./.cinemagoer.cfg` or `./.imdbpy.cfg` (current directory, hidden)
3. `~/cinemagoer.cfg` or `~/imdbpy.cfg` (home directory)
4. `~/.cinemagoer.cfg` or `~/.imdbpy.cfg` (home directory, hidden)
5. `/etc/cinemagoer.cfg` or `/etc/imdbpy.cfg` (Unix systems)
6. `/etc/conf.d/cinemagoer.cfg` or `/etc/conf.d/imdbpy.cfg` (Unix systems)
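The search order above can be written out as a list of candidate paths, which is handy when debugging which file is being picked up. This is a sketch of the documented order, not the library's own lookup code; `candidate_config_paths`, `CONFIG_NAMES`, and the `include_etc` flag are hypothetical, and trying `cinemagoer.cfg` before `imdbpy.cfg` within each location is an assumption.

```python
import os

CONFIG_NAMES = ("cinemagoer.cfg", "imdbpy.cfg")


def candidate_config_paths(cwd, home, include_etc=True):
    """List candidate configuration paths in the documented search order."""
    paths = []
    # Current directory first, then home; plain file names before dot-files.
    for directory in (cwd, home):
        for prefix in ("", "."):
            for name in CONFIG_NAMES:
                paths.append(os.path.join(directory, prefix + name))
    # System-wide locations, only meaningful on Unix-like systems.
    if include_etc:
        etc = os.path.join(os.sep, "etc")
        for directory in (etc, os.path.join(etc, "conf.d")):
            for name in CONFIG_NAMES:
                paths.append(os.path.join(directory, name))
    return paths


for path in candidate_config_paths(os.getcwd(), os.path.expanduser("~")):
    marker = "found" if os.path.isfile(path) else "missing"
    print(f"{marker:8} {path}")
```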

**Configuration File Format:**

```ini
[imdbpy]
accessSystem = http
results = 30
keywordsResults = 150
reraiseExceptions = true
imdbURL_base = https://www.imdb.com/
```
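Values in an INI file are plain strings, so numeric and boolean settings need converting before use. The sketch below reads the sample above with Python's standard `configparser` and coerces each value; `load_settings` is a hypothetical helper, the fallback defaults are the ones listed for `IMDb`, and none of this is claimed to be how the library itself reads the file.

```python
import configparser

SAMPLE = """\
[imdbpy]
accessSystem = http
results = 30
keywordsResults = 150
reraiseExceptions = true
imdbURL_base = https://www.imdb.com/
"""


def load_settings(text):
    """Parse the [imdbpy] section and convert values to Python types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["imdbpy"]
    # The stdlib parser lowercases option names on both store and lookup,
    # so the mixed-case lookups below still find their values.
    return {
        "accessSystem": section.get("accessSystem", "http"),
        "results": section.getint("results", fallback=20),
        "keywordsResults": section.getint("keywordsResults", fallback=100),
        "reraiseExceptions": section.getboolean("reraiseExceptions", fallback=True),
        "imdbURL_base": section.get("imdbURL_base", "https://www.imdb.com/"),
    }


settings = load_settings(SAMPLE)
print(settings["results"], settings["reraiseExceptions"])  # 30 True
```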

### Custom Configuration

```python { .api }
class ConfigParserWithCase:
    """
    Case-sensitive configuration parser for IMDb settings.

    Methods:
    - get(section, option, *args, **kwds): Get configuration value
    - getDict(section): Get section as dictionary
    - items(section, *args, **kwds): Get section items as list
    """
```
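Case sensitivity matters because Python's standard `configparser.ConfigParser` lowercases option names, so `accessSystem` would come back as `accesssystem` and no longer match the keyword names accepted by `IMDb`. The sketch below is an illustrative stand-in, not the library's class: `CaseSensitiveParser` is a hypothetical name, and it only mirrors the `getDict` method listed above.

```python
import configparser


class CaseSensitiveParser(configparser.ConfigParser):
    """Hypothetical stand-in that keeps option names exactly as written."""

    def optionxform(self, optionstr):
        # The stdlib default returns optionstr.lower(); keep the case instead.
        return optionstr

    def getDict(self, section):
        """Return the section as a plain dictionary."""
        return dict(self.items(section))


TEXT = "[imdbpy]\naccessSystem = http\nkeywordsResults = 150\n"

plain = configparser.ConfigParser()
plain.read_string(TEXT)

sensitive = CaseSensitiveParser()
sensitive.read_string(TEXT)

print(sorted(plain["imdbpy"]))              # ['accesssystem', 'keywordsresults']
print(sorted(sensitive.getDict("imdbpy")))  # ['accessSystem', 'keywordsResults']
```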

## Error Handling

All core access functions can raise IMDb-specific exceptions:

```python
from imdb import IMDb, IMDbError, IMDbDataAccessError

try:
    ia = IMDb('invalid_system')
except IMDbError as e:
    print(f"IMDb error: {e}")

try:
    ia = IMDb('sql')  # If SQL system not available
except IMDbError as e:
    print(f"SQL access not available: {e}")
```
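The same pattern extends to trying several access systems in turn. The helper below is a generic sketch: `first_working` is a hypothetical name, and the factory and exception type are passed in so the example runs without the `imdb` package (a stand-in factory is used instead).

```python
def first_working(factory, names, error_type=Exception):
    """Return (name, instance) for the first name the factory can build."""
    failures = []
    for name in names:
        try:
            return name, factory(name)
        except error_type as exc:
            failures.append(f"{name}: {exc}")
    raise RuntimeError("no access system could be created: " + "; ".join(failures))


# Stand-in factory so the sketch runs without the imdb package installed.
def fake_factory(name):
    if name != "http":
        raise ValueError(f"{name} is not installed")
    return f"<{name} access>"


print(first_working(fake_factory, ["sql", "http"], ValueError))
# ('http', '<http access>')
```

With the library installed, `factory` could be `IMDb` and `error_type` could be `IMDbError`; systems that need extra arguments would have to be wrapped in a small function first.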

## Performance Best Practices

Optimize performance for different use cases and access patterns:

### Access System Selection

**HTTP Access (Default):**
- Best for: Small to medium applications, one-off scripts, development
- Performance: Moderate, dependent on network latency
- Rate limiting: Subject to IMDb's rate limits
- Best practices: Cache results, use batch operations when possible

```python
# HTTP access - good for most use cases
ia = IMDb()  # Default HTTP access
```

**SQL Access:**
- Best for: Large-scale applications, high-volume queries, analytics
- Performance: Excellent for complex queries and bulk operations
- Setup required: Local IMDb database installation
- Best practices: Use for production applications with heavy usage

```python
# SQL access - optimal for large-scale applications
# (connection details go in a database URI; the values here are placeholders)
ia = IMDb('sql', uri='mysql://imdb:password@localhost/imdb')
```

**S3 Access:**
- Best for: Cloud applications, AWS-integrated systems
- Performance: Good for bulk data processing
- Requirements: AWS credentials and S3 dataset access
- Best practices: Use for batch processing and analytics

```python
# S3 access - good for cloud-based bulk processing
ia = IMDb('s3')
```
213
214
### Information Set Optimization
215
216
**Selective Information Loading:**
217
```python
218
# Efficient - only load needed information
219
movie = ia.get_movie('0133093', info=['main', 'plot'])
220
221
# Inefficient - loads all available information
222
movie = ia.get_movie('0133093', info='all')
223
```

**Batch Updates:**
```python
# Efficient - batch processing
movies = ia.search_movie('Matrix')
for movie in movies[:5]:  # Limit results
    ia.update(movie, info=['main'])  # Minimal info for listings

# Inefficient - individual detailed updates
for movie in movies:
    ia.update(movie, info='all')  # Excessive information
```

### Memory Management

**Large Dataset Handling:**
```python
# Process results in batches to manage memory
def process_large_chart():
    top_movies = ia.get_top250_movies()

    # Process in smaller chunks
    chunk_size = 50
    for i in range(0, len(top_movies), chunk_size):
        chunk = top_movies[i:i + chunk_size]
        # Process chunk
        for movie in chunk:
            # Minimal processing to conserve memory
            print(f"{movie['title']} ({movie['year']})")
```

### Caching Strategies

**Results Caching:**
```python
from functools import lru_cache

from imdb import IMDb

ia = IMDb()

# Cache expensive operations (arguments must be hashable, e.g. strings)
@lru_cache(maxsize=100)
def cached_movie_search(title):
    return ia.search_movie(title, results=5)

# Reuse cached results
movies1 = cached_movie_search('Matrix')  # Network call
movies2 = cached_movie_search('Matrix')  # Cached result
```
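`lru_cache` never expires entries, so long-running processes may serve stale search results. One possible alternative is a small time-bounded cache, sketched below under stated assumptions: `ttl_cache` is a hypothetical decorator (not part of the library or of `functools`), it supports only hashable positional arguments, and the `clock` parameter exists so the expiry logic can be exercised without waiting.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache results per positional-argument tuple for ttl_seconds."""
    def decorator(func):
        store = {}  # args tuple -> (expiry time, cached value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # Entry is still fresh
            value = func(*args)
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator


# Stand-in for a slow lookup such as ia.search_movie(title).
calls = []

@ttl_cache(ttl_seconds=600)
def slow_lookup(title):
    calls.append(title)
    return title.upper()


slow_lookup("matrix")
slow_lookup("matrix")
print(len(calls))  # 1
```

The cache grows without bound in this form; a production version would also need eviction, which is left out to keep the sketch short.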