# Core Data Access

Primary functions for creating IMDb access instances and retrieving basic system information. These form the foundation for all IMDb data operations across different access methods.

## Capabilities

### IMDb Instance Creation

Creates IMDb access system instances with configurable data sources and parameters. The factory function automatically selects appropriate parsers based on the specified access system.

```python { .api }
def IMDb(accessSystem=None, *arguments, **keywords):
    """
    Create an instance of the appropriate IMDb access system.

    Parameters:
    - accessSystem: str, optional - Access method ('http', 'sql', 's3', 'auto', 'config')
    - results: int - Default number of search results (default: 20)
    - keywordsResults: int - Default number of keyword results (default: 100)
    - reraiseExceptions: bool - Whether to re-raise exceptions (default: True)
    - loggingLevel: int - Logging level
    - loggingConfig: str - Path to logging configuration file
    - imdbURL_base: str - Base IMDb URL (default: 'https://www.imdb.com/')

    Returns:
    IMDbBase subclass instance (IMDbHTTPAccessSystem, IMDbSqlAccessSystem, or IMDbS3AccessSystem)
    """
```

**Usage Example:**

```python
from imdb import IMDb

# Default HTTP access
ia = IMDb()

# Explicit HTTP access with custom settings
ia = IMDb('http', results=50, reraiseExceptions=False)

# SQL database access (needs a database URI; this one is a placeholder)
ia = IMDb('sql', uri='sqlite:///path/to/imdb.db')

# S3 dataset access (needs the URI of the database holding the imported
# dataset; this one is a placeholder)
ia = IMDb('s3', uri='postgresql://user:password@localhost/imdb')

# Configuration file-based access
ia = IMDb('config')
```

### Cinemagoer Alias

`Cinemagoer` is an alias for the `IMDb` function. It provides identical functionality under the project's updated name.

```python { .api }
Cinemagoer = IMDb
```

**Usage Example:**

```python
from imdb import Cinemagoer

# Identical to IMDb() function
ia = Cinemagoer()
```

### Available Access Systems

Returns the list of currently available data access systems based on installed dependencies and system configuration.

```python { .api }
def available_access_systems():
    """
    Return the list of available data access systems.

    Returns:
    list: Available access system names (e.g., ['http', 'sql'])
    """
```

**Usage Example:**

```python
from imdb import available_access_systems

# Check what access systems are available
systems = available_access_systems()
print(f"Available systems: {systems}")
# Prints e.g. "Available systems: ['http']" or "Available systems: ['http', 'sql']",
# depending on installation
```
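
One way to use this list is to fall back gracefully when an optional backend is missing. The sketch below is illustrative only: `pick_access_system` and its preference order are hypothetical and not part of the library, and the lists passed in stand in for the result of `available_access_systems()`.

```python
def pick_access_system(available, preferred=("sql", "s3", "http")):
    """Return the first preferred access system present in `available`.

    Hypothetical helper: `available` is expected to look like the list
    returned by available_access_systems(), e.g. ['http', 'sql'].
    """
    for name in preferred:
        if name in available:
            return name
    raise LookupError(f"none of {preferred!r} is available: {available!r}")


# Stand-in lists; in real code pass available_access_systems() instead.
print(pick_access_system(["http"]))         # http
print(pick_access_system(["http", "sql"]))  # sql
```

Raising `LookupError` when nothing matches keeps a missing backend from going unnoticed, which a `None` return value would allow.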

## Access System Types

### HTTP Access System

**Access Methods**: `'http'`, `'https'`, `'web'`, `'html'`
- Web scraping access to the IMDb website
- Default access method
- No additional dependencies beyond base requirements
- Rate-limited by IMDb's website policies

### SQL Database Access System

**Access Methods**: `'sql'`, `'db'`, `'database'`
- Direct SQL database access to local IMDb data
- Requires separate IMDb database setup
- Fastest access for bulk operations
- Requires additional SQL database dependencies

### S3 Dataset Access System

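**Access Methods**: `'s3'`, `'s3dataset'`, `'imdbws'`
- Access to the official IMDb dataset files (the "S3 dataset") once they have been imported into a local SQL database
- Official IMDb data source
- Requires downloading and importing the dataset files; AWS credentials are not needed, and queries then run against the local database
- Authoritative data, current as of the downloaded snapshot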
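
The aliases listed in this section can be summarised as a lookup table. The following is a minimal sketch of that mapping, not the library's own code; `ALIASES` and `canonical_access_system` are names made up for illustration.

```python
# Alias groups as listed in this section (illustrative table, not library code).
ALIASES = {
    "http": ("http", "https", "web", "html"),
    "sql": ("sql", "db", "database"),
    "s3": ("s3", "s3dataset", "imdbws"),
}

# Invert the table so every alias points at its canonical name.
_CANONICAL = {alias: name for name, group in ALIASES.items() for alias in group}


def canonical_access_system(name):
    """Map an alias such as 'web' or 'db' to 'http', 'sql' or 's3'."""
    try:
        return _CANONICAL[name]
    except KeyError:
        raise ValueError(f"unknown access system: {name!r}") from None


print(canonical_access_system("web"))     # http
print(canonical_access_system("imdbws"))  # s3
```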

## Configuration System

### Automatic Configuration

The `IMDb` function loads its settings from a configuration file when called with `accessSystem='config'` or `accessSystem='auto'`.

**Configuration File Locations** (searched in order):

1. `./cinemagoer.cfg` or `./imdbpy.cfg` (current directory)
2. `./.cinemagoer.cfg` or `./.imdbpy.cfg` (current directory, hidden)
3. `~/cinemagoer.cfg` or `~/imdbpy.cfg` (home directory)
4. `~/.cinemagoer.cfg` or `~/.imdbpy.cfg` (home directory, hidden)
5. `/etc/cinemagoer.cfg` or `/etc/imdbpy.cfg` (Unix systems)
6. `/etc/conf.d/cinemagoer.cfg` or `/etc/conf.d/imdbpy.cfg` (Unix systems)
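
The search order above can be approximated as a list of candidate paths. This sketch only mirrors the documented order; `candidate_config_paths` and `first_existing` are hypothetical helpers, and the library's actual lookup may differ in details such as platform checks.

```python
import os

CONFIG_NAMES = ("cinemagoer.cfg", "imdbpy.cfg")


def candidate_config_paths(cwd=".", home="~", system_dirs=("/etc", "/etc/conf.d")):
    """Return config file candidates in the documented search order."""
    home = os.path.expanduser(home)
    paths = []
    # Current directory first, then home; plain names before hidden ones.
    for directory in (cwd, home):
        for prefix in ("", "."):
            for name in CONFIG_NAMES:
                paths.append(os.path.join(directory, prefix + name))
    # System-wide locations (Unix) come last and have no hidden variant.
    for directory in system_dirs:
        for name in CONFIG_NAMES:
            paths.append(os.path.join(directory, name))
    return paths


def first_existing(paths):
    """Return the first path that exists on disk, or None."""
    return next((p for p in paths if os.path.isfile(p)), None)


for path in candidate_config_paths():
    print(path)
```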

**Configuration File Format:**

```ini
[imdbpy]
accessSystem = http
results = 30
keywordsResults = 150
reraiseExceptions = true
imdbURL_base = https://www.imdb.com/
```

### Custom Configuration

```python { .api }
class ConfigParserWithCase:
    """
    Case-sensitive configuration parser for IMDb settings.

    Methods:
    - get(section, option, *args, **kwds): Get configuration value
    - getDict(section): Get section as dictionary
    - items(section, *args, **kwds): Get section items as list
    """
```
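
To illustrate why case sensitivity matters for option names such as `keywordsResults`, here is a minimal sketch built on the standard library's `configparser`. It is not the library's `ConfigParserWithCase` implementation; `CaseSensitiveParser`, `section_as_dict`, and the simple type conversion are assumptions made for the example.

```python
import configparser

SAMPLE = """
[imdbpy]
accessSystem = http
results = 30
keywordsResults = 150
reraiseExceptions = true
"""


class CaseSensitiveParser(configparser.ConfigParser):
    """ConfigParser that keeps option names exactly as written."""

    def optionxform(self, optionstr):
        # The default implementation lowercases option names.
        return optionstr


def _convert(value):
    """Turn 'true'/'false' into bools and digit strings into ints."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return int(value) if value.isdigit() else value


def section_as_dict(parser, section):
    """Return a section's options as a dict with converted values."""
    return {key: _convert(val) for key, val in parser.items(section)}


parser = CaseSensitiveParser()
parser.read_string(SAMPLE)
settings = section_as_dict(parser, "imdbpy")
print(settings)
```

With the stock `ConfigParser`, the key would come back as `keywordsresults`, which would no longer match the keyword argument name expected by `IMDb`.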

## Error Handling

All core access functions can raise IMDb-specific exceptions:

```python
from imdb import IMDb, IMDbError, IMDbDataAccessError

try:
    ia = IMDb('invalid_system')
except IMDbError as e:
    print(f"IMDb error: {e}")

try:
    # Raises IMDbError if the SQL access system is not installed;
    # the URI is a placeholder
    ia = IMDb('sql', uri='sqlite:///path/to/imdb.db')
except IMDbError as e:
    print(f"SQL access not available: {e}")
```

## Performance Best Practices

The following practices help optimize performance for different use cases and access patterns.

### Access System Selection

**HTTP Access (Default):**
- Best for: Small to medium applications, one-off scripts, development
- Performance: Moderate, dependent on network latency
- Rate limiting: Subject to IMDb's rate limits
- Best practices: Cache results, use batch operations when possible

```python
from imdb import IMDb

# HTTP access - good for most use cases
ia = IMDb()  # Default HTTP access
```

**SQL Access:**
- Best for: Large-scale applications, high-volume queries, analytics
- Performance: Excellent for complex queries and bulk operations
- Setup required: Local IMDb database installation
- Best practices: Use for production applications with heavy usage

```python
# SQL access - optimal for large-scale applications
# (connection details go in a database URI; this one is a placeholder)
ia = IMDb('sql', uri='mysql://imdb:password@localhost/imdb')
```

**S3 Access:**
- Best for: Bulk analysis of the official IMDb datasets
- Performance: Good for bulk data processing
- Requirements: The IMDb dataset files imported into a SQL database
- Best practices: Use for batch processing and analytics

```python
# S3 access - good for bulk processing of the imported dataset
# (the URI is a placeholder for the database holding the import)
ia = IMDb('s3', uri='postgresql://user:password@localhost/imdb')
```

### Information Set Optimization

**Selective Information Loading:**
```python
# Efficient - only load needed information
movie = ia.get_movie('0133093', info=['main', 'plot'])

# Inefficient - loads all available information
movie = ia.get_movie('0133093', info='all')
```

**Batch Updates:**
```python
# Efficient - batch processing
movies = ia.search_movie('Matrix')
for movie in movies[:5]:  # Limit results
    ia.update(movie, info=['main'])  # Minimal info for listings

# Inefficient - individual detailed updates
for movie in movies:
    ia.update(movie, info='all')  # Excessive information
```

### Memory Management

**Large Dataset Handling:**
```python
# Process results in batches to manage memory
def process_large_chart():
    top_movies = ia.get_top250_movies()

    # Process in smaller chunks
    chunk_size = 50
    for i in range(0, len(top_movies), chunk_size):
        chunk = top_movies[i:i + chunk_size]
        # Process chunk
        for movie in chunk:
            # Minimal processing to conserve memory;
            # .get() avoids a KeyError when the year is missing
            print(f"{movie['title']} ({movie.get('year', 'n/a')})")
```
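
Slicing a list that is already fully in memory mainly bounds the work done per step. When the source is an iterator, a generator-based helper keeps only one chunk alive at a time. The sketch below uses a hypothetical `chunked` helper and a plain `range` as stand-in data rather than real IMDb results.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Stand-in data: 7 fake IDs processed three at a time.
for batch in chunked(range(7), 3):
    print(batch)
```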

### Caching Strategies

**Results Caching:**
```python
from functools import lru_cache

# Cache expensive operations
@lru_cache(maxsize=100)
def cached_movie_search(title):
    return ia.search_movie(title, results=5)

# Reuse cached results
movies1 = cached_movie_search('Matrix')  # Network call
movies2 = cached_movie_search('Matrix')  # Cached result
```
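
Two details of `functools.lru_cache` are worth keeping in mind here: arguments must be hashable, and every cache hit returns the same object, so mutating a cached list affects later callers. The sketch below demonstrates this with a stand-in `fake_search` function (hypothetical, no network access) and returns a tuple to guard against accidental mutation.

```python
from functools import lru_cache

calls = []  # records every time the underlying "search" actually runs


def fake_search(title):
    """Stand-in for a slow lookup such as ia.search_movie(title)."""
    calls.append(title)
    return [f"{title} result {i}" for i in range(3)]


@lru_cache(maxsize=100)
def cached_search(title):
    # Return an immutable tuple so callers cannot alter the cached value.
    return tuple(fake_search(title))


first = cached_search("Matrix")   # runs fake_search
second = cached_search("Matrix")  # served from the cache
print(len(calls))                        # 1
print(cached_search.cache_info().hits)   # 1
```

Note that `lru_cache` never expires entries by age, so long-running processes that need fresh data would have to call `cached_search.cache_clear()` periodically or use a time-aware cache instead.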