# Core Scraper Functions

The main CloudScraper class and convenience functions provide the primary interface for creating scraper instances and making requests with automatic Cloudflare challenge solving.

## Capabilities

### Creating Scraper Instances

Factory function that creates ready-to-use CloudScraper instances, with comprehensive configuration options for every aspect of challenge solving and stealth operation.

```python { .api }
def create_scraper(sess=None, **kwargs) -> CloudScraper:
    """
    Create a configured CloudScraper instance.

    Parameters:
    - sess: Optional existing requests.Session to extend
    - debug: bool = False, enable debug logging
    - disableCloudflareV1: bool = False, disable v1 challenge handling
    - disableCloudflareV2: bool = False, disable v2 challenge handling
    - disableCloudflareV3: bool = False, disable v3 challenge handling
    - disableTurnstile: bool = False, disable Turnstile challenge handling
    - delay: float = None, custom delay between challenge attempts
    - captcha: dict = {}, captcha solver configuration
    - interpreter: str = 'js2py', JavaScript interpreter to use
    - browser: str|dict = None, browser fingerprinting configuration
    - allow_brotli: bool = True, enable Brotli compression support
    - enable_stealth: bool = True, enable stealth mode features
    - rotating_proxies: list|dict = None, proxy rotation configuration
    - proxy_options: dict = {}, proxy rotation strategy and settings
    - stealth_options: dict = {}, stealth mode behavior configuration
    - session_refresh_interval: int = 3600, session refresh interval in seconds
    - auto_refresh_on_403: bool = True, auto-refresh session on 403 errors
    - max_403_retries: int = 3, maximum 403 error retry attempts
    - cipherSuite: str|list = None, custom TLS cipher suite
    - ecdhCurve: str = 'prime256v1', ECDH curve for TLS negotiation
    - server_hostname: str = None, custom server hostname for SNI
    - source_address: str|tuple = None, source IP address for connections
    - ssl_context: ssl.SSLContext = None, custom SSL context
    - doubleDown: bool = True, enable double-down challenge solving
    - solveDepth: int = 3, maximum challenge solving attempts
    - requestPreHook: callable = None, function called before each request
    - requestPostHook: callable = None, function called after each request
    - min_request_interval: float = 1.0, minimum seconds between requests
    - max_concurrent_requests: int = 1, maximum concurrent requests
    - rotate_tls_ciphers: bool = True, enable TLS cipher rotation

    Returns:
    CloudScraper instance ready for making requests
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic scraper with default settings
scraper = cloudscraper.create_scraper()

# Debug mode enabled
scraper = cloudscraper.create_scraper(debug=True)

# With proxy rotation
scraper = cloudscraper.create_scraper(
    rotating_proxies=[
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    }
)

# Advanced stealth configuration
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 6.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    },
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'mobile': False
    }
)

# With CAPTCHA solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
```
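
For cross-cutting concerns such as logging, the `requestPreHook` and `requestPostHook` parameters take callables that run around every request. The hook signatures below are an assumption (hooks modeled as receiving the scraper plus the request details); verify them against the library source before relying on them:

```python
import logging
import cloudscraper

# Assumed hook signatures -- confirm against the library source.
def log_before(scraper, method, url, *args, **kwargs):
    logging.info("-> %s %s", method, url)
    return method, url, args, kwargs  # pass the request through unchanged

def log_after(scraper, response):
    logging.info("<- %s %s", response.status_code, response.url)
    return response

scraper = cloudscraper.create_scraper(
    requestPreHook=log_before,
    requestPostHook=log_after
)
```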

### Token Extraction

Extract Cloudflare authentication tokens and the user agent for integration with external tools and applications.

```python { .api }
def get_tokens(url: str, **kwargs) -> tuple[dict[str, str], str]:
    """
    Get Cloudflare tokens for a URL.

    Parameters:
    - url: str, target URL to get tokens for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (tokens_dict, user_agent_string)
    - tokens_dict: Dictionary of Cloudflare cookies
    - user_agent_string: User agent string used for requests

    Raises:
    - CloudflareIUAMError: If unable to find Cloudflare cookies
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic token extraction
tokens, user_agent = cloudscraper.get_tokens('https://example.com')
print(tokens)
# {'cf_clearance': 'abc123...', 'cf_chl_2': 'xyz789...'}

# With proxy
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    proxies={'http': 'http://proxy.example.com:8080'}
)

# With stealth mode
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    enable_stealth=True,
    stealth_options={'min_delay': 2.0, 'max_delay': 5.0}
)
```
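
The extracted pair can seed any other HTTP client. A minimal sketch using the requests library (which cookie names appear in `tokens` depends on the challenge the site serves):

```python
import requests
import cloudscraper

tokens, user_agent = cloudscraper.get_tokens('https://example.com')

# Reuse the clearance cookies and the matching user agent in a plain session.
session = requests.Session()
session.headers['User-Agent'] = user_agent
for name, value in tokens.items():
    session.cookies.set(name, value)

response = session.get('https://example.com')
```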

### Cookie String Generation

Generate cookie header strings for use with external HTTP clients and tools.

```python { .api }
def get_cookie_string(url: str, **kwargs) -> tuple[str, str]:
    """
    Generate cookie string and user agent for HTTP headers.

    Parameters:
    - url: str, target URL to get cookies for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (cookie_string, user_agent_string)
    - cookie_string: Formatted cookie header value
    - user_agent_string: User agent string used for requests
    """
```

#### Usage Examples

```python
import subprocess
import cloudscraper

# Generate cookie header
cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')
print(f"Cookie: {cookie_string}")
print(f"User-Agent: {user_agent}")

# Use with a curl command
cookie_arg, user_agent = cloudscraper.get_cookie_string('https://example.com')
result = subprocess.check_output([
    'curl',
    '--cookie', cookie_arg,
    '-A', user_agent,
    'https://example.com'
])
```
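
The same pair works with any client that accepts raw headers, such as the standard-library urllib. A minimal sketch:

```python
import urllib.request
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

# Send the clearance cookies and matching user agent as plain headers.
request = urllib.request.Request(
    'https://example.com',
    headers={'Cookie': cookie_string, 'User-Agent': user_agent}
)
with urllib.request.urlopen(request) as response:
    body = response.read()
```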

### CipherSuiteAdapter Class

Custom HTTPAdapter for requests that provides TLS cipher suite control and source address binding for enhanced anti-detection capabilities.

```python { .api }
class CipherSuiteAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        """
        Initialize TLS adapter with custom cipher suite configuration.

        Parameters:
        - ssl_context: ssl.SSLContext = None, custom SSL context
        - cipherSuite: str|list = None, TLS cipher suite specification
        - source_address: str|tuple = None, source IP address for connections
        - server_hostname: str = None, custom server hostname for SNI
        - ecdhCurve: str = 'prime256v1', ECDH curve for key exchange
        """

    def wrap_socket(self, *args, **kwargs):
        """
        Wrap socket with SSL context and custom hostname handling.
        """

    def init_poolmanager(self, *args, **kwargs):
        """
        Initialize connection pool manager with SSL context.
        """

    def proxy_manager_for(self, *args, **kwargs):
        """
        Create proxy manager with SSL context configuration.
        """
```

#### Usage Examples

```python
import requests
import cloudscraper

# Custom cipher suite adapter
adapter = cloudscraper.CipherSuiteAdapter(
    cipherSuite='ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384',
    source_address=('192.168.1.100', 0),
    server_hostname='example.com'
)

# Mount on session
session = requests.Session()
session.mount('https://', adapter)
```
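
Mounting by hand is rarely necessary: `create_scraper()` accepts the same `cipherSuite`, `source_address`, `server_hostname`, and `ssl_context` options (see the parameter list above), and presumably routes them through this adapter internally. A sketch:

```python
import cloudscraper

# Same TLS configuration via the factory function; assumes create_scraper()
# wires these options into a CipherSuiteAdapter under the hood.
scraper = cloudscraper.create_scraper(
    cipherSuite='ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384',
    source_address=('192.168.1.100', 0),
    server_hostname='example.com'
)
response = scraper.get('https://example.com')
```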

### CloudScraper Class

Main scraper class that extends requests.Session with automatic Cloudflare challenge detection and solving capabilities.

```python { .api }
class CloudScraper(Session):
    def __init__(self, **kwargs):
        """
        Initialize CloudScraper with configuration options.

        Parameters: Same as create_scraper() function
        """

    def request(self, method: str, url: str, *args, **kwargs):
        """
        Make HTTP request with automatic challenge solving.

        Parameters:
        - method: str, HTTP method (GET, POST, etc.)
        - url: str, target URL
        - *args, **kwargs: standard requests arguments

        Returns:
        requests.Response object

        Raises:
        - CloudflareLoopProtection: If too many challenge attempts
        - CloudflareChallengeError: If unknown challenge type detected
        - Various challenge-specific exceptions
        """

    def perform_request(self, method: str, url: str, *args, **kwargs):
        """
        Make raw HTTP request without challenge solving.

        Parameters: Same as request()
        Returns: requests.Response object
        """

    @staticmethod
    def debugRequest(req):
        """
        Debug request/response details.

        Parameters:
        - req: requests.Response object to debug
        """

    def decodeBrotli(self, resp):
        """
        Decode Brotli compressed response content.

        Parameters:
        - resp: requests.Response object

        Returns:
        Modified response object with decoded content
        """

    def __getstate__(self):
        """
        Support for pickle serialization of scraper instances.

        Returns:
        Dictionary of instance state for serialization
        """

    def simpleException(self, exception, msg):
        """
        Raise exception with no stack trace and reset depth counter.

        Parameters:
        - exception: Exception class to raise
        - msg: str, error message
        """

    def _should_refresh_session(self):
        """
        Check if session should be refreshed based on age and error patterns.

        Returns:
        bool, True if session needs refresh
        """

    def _refresh_session(self, url):
        """
        Refresh session by clearing cookies and re-establishing connection.

        Parameters:
        - url: str, URL to test connection against

        Returns:
        bool, True if refresh succeeded
        """

    def _clear_cloudflare_cookies(self):
        """
        Clear Cloudflare-specific cookies to force re-authentication.
        """

    def _apply_request_throttling(self):
        """
        Apply request throttling to prevent TLS blocking from concurrent requests.
        """

    def _rotate_tls_cipher_suite(self):
        """
        Rotate TLS cipher suites to avoid detection patterns.
        """
```

#### Usage Examples

```python
import cloudscraper

# Direct class instantiation
scraper = cloudscraper.CloudScraper(debug=True)

# Make various types of requests
response = scraper.get('https://example.com')
response = scraper.post('https://example.com/api', json={'key': 'value'})
response = scraper.put('https://example.com/update', data='content')

# Access response data
print(response.status_code)
print(response.headers)
print(response.text)
print(response.json())

# Use session features
scraper.headers.update({'Custom-Header': 'value'})
scraper.cookies.set('session_id', 'abc123')

# Raw request without challenge solving
raw_response = scraper.perform_request('GET', 'https://example.com')
```
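
Since `__getstate__` is implemented, scraper instances support pickling, for example to move a warmed-up session between processes. A minimal sketch; whether solved-challenge state survives the round trip for a given site is worth verifying:

```python
import pickle
import cloudscraper

scraper = cloudscraper.create_scraper()
scraper.get('https://example.com')  # warm up: solve the challenge once

# Serialize the scraper state and restore it elsewhere.
blob = pickle.dumps(scraper)
restored = pickle.loads(blob)
response = restored.get('https://example.com')
```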

### Session Aliases

Alternative names for creating scraper instances, kept for backward compatibility.

```python { .api }
# Alias for create_scraper()
session = create_scraper
```

#### Usage Examples

```python
import cloudscraper

# Alternative session creation
scraper = cloudscraper.session()  # Same as create_scraper()
```

## Error Handling

Core scraper functions can raise various exceptions:

```python
import cloudscraper

try:
    scraper = cloudscraper.create_scraper()
    response = scraper.get('https://protected-site.com')
except cloudscraper.CloudflareLoopProtection:
    print("Too many challenge attempts - possible infinite loop")
except cloudscraper.CloudflareIUAMError:
    print("Could not extract challenge parameters")
except cloudscraper.CloudflareChallengeError:
    print("Unknown challenge type detected")
except Exception as e:
    print(f"Unexpected error: {e}")
```
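
For transient failures it is often enough to retry with a fresh scraper. A sketch using the exception names above; `fetch_with_retries` and its backoff policy are illustrative, not part of the API:

```python
import time
import cloudscraper

def fetch_with_retries(url, attempts=3, backoff=5.0):
    """Retry transient challenge failures with a fresh scraper each time."""
    for attempt in range(attempts):
        scraper = cloudscraper.create_scraper()
        try:
            return scraper.get(url)
        except cloudscraper.CloudflareChallengeError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            time.sleep(backoff * (attempt + 1))  # simple linear backoff

response = fetch_with_retries('https://protected-site.com')
```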

## Integration with Requests

CloudScraper is fully compatible with the requests library API:

```python
import cloudscraper

# All requests features work
scraper = cloudscraper.create_scraper()

# Authentication
scraper.auth = ('username', 'password')

# Custom headers
scraper.headers.update({'Authorization': 'Bearer token'})

# Session cookies
scraper.cookies.set('session', 'value')

# Response hooks
def log_request(response, *args, **kwargs):
    print(f"Request to {response.url} returned {response.status_code}")

scraper.hooks['response'] = log_request

# Timeouts and other standard keyword arguments work as expected
response = scraper.get('https://example.com', timeout=30)
```