# Core Scraper Functions

The main CloudScraper class and the convenience functions that provide the primary interface for creating scraper instances and making requests with automatic Cloudflare challenge solving.

## Capabilities

### Creating Scraper Instances

Factory function for creating ready-to-use CloudScraper instances, with configuration options covering every aspect of challenge solving and stealth operation.

```python { .api }
def create_scraper(sess=None, **kwargs) -> CloudScraper:
    """
    Create a configured CloudScraper instance.

    Parameters:
    - sess: Optional existing requests.Session to extend
    - debug: bool = False, enable debug logging
    - disableCloudflareV1: bool = False, disable v1 challenge handling
    - disableCloudflareV2: bool = False, disable v2 challenge handling
    - disableCloudflareV3: bool = False, disable v3 challenge handling
    - disableTurnstile: bool = False, disable Turnstile challenge handling
    - delay: float = None, custom delay between challenge attempts
    - captcha: dict = {}, captcha solver configuration
    - interpreter: str = 'js2py', JavaScript interpreter to use
    - browser: str|dict = None, browser fingerprinting configuration
    - allow_brotli: bool = True, enable Brotli compression support
    - enable_stealth: bool = True, enable stealth mode features
    - rotating_proxies: list|dict = None, proxy rotation configuration
    - proxy_options: dict = {}, proxy rotation strategy and settings
    - stealth_options: dict = {}, stealth mode behavior configuration
    - session_refresh_interval: int = 3600, session refresh interval in seconds
    - auto_refresh_on_403: bool = True, auto-refresh session on 403 errors
    - max_403_retries: int = 3, maximum 403 error retry attempts
    - cipherSuite: str|list = None, custom TLS cipher suite
    - ecdhCurve: str = 'prime256v1', ECDH curve for TLS negotiation
    - server_hostname: str = None, custom server hostname for SNI
    - source_address: str|tuple = None, source IP address for connections
    - ssl_context: ssl.SSLContext = None, custom SSL context
    - doubleDown: bool = True, enable double-down challenge solving
    - solveDepth: int = 3, maximum challenge solving attempts
    - requestPreHook: callable = None, function called before each request
    - requestPostHook: callable = None, function called after each request
    - min_request_interval: float = 1.0, minimum seconds between requests
    - max_concurrent_requests: int = 1, maximum concurrent requests
    - rotate_tls_ciphers: bool = True, enable TLS cipher rotation

    Returns:
    CloudScraper instance ready for making requests
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic scraper with default settings
scraper = cloudscraper.create_scraper()

# Debug mode enabled
scraper = cloudscraper.create_scraper(debug=True)

# With proxy rotation
scraper = cloudscraper.create_scraper(
    rotating_proxies=[
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    }
)

# Advanced stealth configuration
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 6.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    },
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'mobile': False
    }
)

# With CAPTCHA solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
```

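The `sess` and session-refresh parameters from the signature above can be combined to wrap an existing `requests.Session`. A minimal sketch using only parameters documented above; the header value is illustrative:

```python
import requests
import cloudscraper

# Wrap an existing session so its headers, cookies, and adapters carry over
base = requests.Session()
base.headers.update({'Accept-Language': 'en-US,en;q=0.9'})

scraper = cloudscraper.create_scraper(
    sess=base,
    session_refresh_interval=1800,  # refresh the session every 30 minutes
    auto_refresh_on_403=True,       # re-establish clearance after 403 responses
    max_403_retries=5
)
```
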
### Token Extraction

Extract Cloudflare clearance tokens and the matching user-agent string for integration with external tools and applications.

```python { .api }
def get_tokens(url: str, **kwargs) -> tuple[dict[str, str], str]:
    """
    Get Cloudflare tokens for a URL.

    Parameters:
    - url: str, target URL to get tokens for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (tokens_dict, user_agent_string)
    - tokens_dict: Dictionary of Cloudflare cookies
    - user_agent_string: User agent string used for requests

    Raises:
    - CloudflareIUAMError: If unable to find Cloudflare cookies
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic token extraction
tokens, user_agent = cloudscraper.get_tokens('https://example.com')
print(tokens)
# {'cf_clearance': 'abc123...', 'cf_chl_2': 'xyz789...'}

# With proxy
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    proxies={'http': 'http://proxy.example.com:8080'}
)

# With stealth mode
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    enable_stealth=True,
    stealth_options={'min_delay': 2.0, 'max_delay': 5.0}
)
```

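The returned cookie dictionary and user-agent string can be replayed from any HTTP client. A minimal sketch using plain `requests`; Cloudflare typically ties its clearance cookies to the user agent, so the same UA must be sent:

```python
import requests
import cloudscraper

tokens, user_agent = cloudscraper.get_tokens('https://example.com')

# Replay the clearance cookies outside of cloudscraper
response = requests.get(
    'https://example.com',
    cookies=tokens,
    headers={'User-Agent': user_agent}  # must match the UA that earned the cookies
)
print(response.status_code)
```
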
### Cookie String Generation

Generate cookie header strings for use with external HTTP clients and tools.

```python { .api }
def get_cookie_string(url: str, **kwargs) -> tuple[str, str]:
    """
    Generate cookie string and user agent for HTTP headers.

    Parameters:
    - url: str, target URL to get cookies for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (cookie_string, user_agent_string)
    - cookie_string: Formatted cookie header value
    - user_agent_string: User agent string used for requests
    """
```

#### Usage Examples

```python
import subprocess
import cloudscraper

# Generate cookie header
cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')
print(f"Cookie: {cookie_string}")
print(f"User-Agent: {user_agent}")

# Use with a curl command
cookie_arg, user_agent = cloudscraper.get_cookie_string('https://example.com')
result = subprocess.check_output([
    'curl',
    '--cookie', cookie_arg,
    '-A', user_agent,
    'https://example.com'
])
```

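The same pair of header values drops into any client. A minimal sketch with the standard library's `urllib`, built entirely from the two returned values:

```python
import urllib.request
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

# Send the generated Cookie and User-Agent headers without requests installed
req = urllib.request.Request(
    'https://example.com',
    headers={'Cookie': cookie_string, 'User-Agent': user_agent}
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```
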
### CipherSuiteAdapter Class

Custom HTTPAdapter for requests that provides TLS cipher suite control and source address binding for enhanced anti-detection capabilities.

```python { .api }
class CipherSuiteAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        """
        Initialize TLS adapter with custom cipher suite configuration.

        Parameters:
        - ssl_context: ssl.SSLContext = None, custom SSL context
        - cipherSuite: str|list = None, TLS cipher suite specification
        - source_address: str|tuple = None, source IP address for connections
        - server_hostname: str = None, custom server hostname for SNI
        - ecdhCurve: str = 'prime256v1', ECDH curve for key exchange
        """

    def wrap_socket(self, *args, **kwargs):
        """
        Wrap socket with SSL context and custom hostname handling.
        """

    def init_poolmanager(self, *args, **kwargs):
        """
        Initialize connection pool manager with SSL context.
        """

    def proxy_manager_for(self, *args, **kwargs):
        """
        Create proxy manager with SSL context configuration.
        """
```

#### Usage Examples

```python
import requests
import cloudscraper

# Custom cipher suite adapter
adapter = cloudscraper.CipherSuiteAdapter(
    cipherSuite='ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384',
    source_address=('192.168.1.100', 0),
    server_hostname='example.com'
)

# Mount on a session
session = requests.Session()
session.mount('https://', adapter)
```

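The `ssl_context` parameter accepts a pre-built `ssl.SSLContext` when cipher strings alone are not enough. A minimal sketch, assuming the adapter defaults are otherwise acceptable:

```python
import ssl
import requests
import cloudscraper

# Build a context that refuses anything older than TLS 1.2
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

adapter = cloudscraper.CipherSuiteAdapter(ssl_context=ctx)
session = requests.Session()
session.mount('https://', adapter)
```
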
### CloudScraper Class

Main scraper class that extends requests.Session with automatic Cloudflare challenge detection and solving capabilities.

```python { .api }
class CloudScraper(Session):
    def __init__(self, **kwargs):
        """
        Initialize CloudScraper with configuration options.

        Parameters: Same as the create_scraper() function
        """

    def request(self, method: str, url: str, *args, **kwargs):
        """
        Make HTTP request with automatic challenge solving.

        Parameters:
        - method: str, HTTP method (GET, POST, etc.)
        - url: str, target URL
        - *args, **kwargs: standard requests arguments

        Returns:
        requests.Response object

        Raises:
        - CloudflareLoopProtection: If too many challenge attempts
        - CloudflareChallengeError: If unknown challenge type detected
        - Various challenge-specific exceptions
        """

    def perform_request(self, method: str, url: str, *args, **kwargs):
        """
        Make raw HTTP request without challenge solving.

        Parameters: Same as request()
        Returns: requests.Response object
        """

    @staticmethod
    def debugRequest(req):
        """
        Debug request/response details.

        Parameters:
        - req: requests.Response object to debug
        """

    def decodeBrotli(self, resp):
        """
        Decode Brotli-compressed response content.

        Parameters:
        - resp: requests.Response object

        Returns:
        Modified response object with decoded content
        """

    def __getstate__(self):
        """
        Support pickle serialization of scraper instances.

        Returns:
        Dictionary of instance state for serialization
        """

    def simpleException(self, exception, msg):
        """
        Raise an exception without a stack trace and reset the solve-depth counter.

        Parameters:
        - exception: Exception class to raise
        - msg: str, error message
        """

    def _should_refresh_session(self):
        """
        Check if the session should be refreshed based on age and error patterns.

        Returns:
        bool, True if session needs refresh
        """

    def _refresh_session(self, url):
        """
        Refresh the session by clearing cookies and re-establishing the connection.

        Parameters:
        - url: str, URL to test connection against

        Returns:
        bool, True if refresh succeeded
        """

    def _clear_cloudflare_cookies(self):
        """
        Clear Cloudflare-specific cookies to force re-authentication.
        """

    def _apply_request_throttling(self):
        """
        Apply request throttling to prevent TLS blocking from concurrent requests.
        """

    def _rotate_tls_cipher_suite(self):
        """
        Rotate TLS cipher suites to avoid detection patterns.
        """
```

#### Usage Examples

```python
import cloudscraper

# Direct class instantiation
scraper = cloudscraper.CloudScraper(debug=True)

# Make various types of requests
response = scraper.get('https://example.com')
response = scraper.post('https://example.com/api', json={'key': 'value'})
response = scraper.put('https://example.com/update', data='content')

# Access response data
print(response.status_code)
print(response.headers)
print(response.text)
print(response.json())

# Use session features
scraper.headers.update({'Custom-Header': 'value'})
scraper.cookies.set('session_id', 'abc123')

# Raw request without challenge solving
raw_response = scraper.perform_request('GET', 'https://example.com')
```

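Because `__getstate__` is implemented (see above), scraper instances can be pickled and restored later. A minimal sketch:

```python
import pickle
import cloudscraper

scraper = cloudscraper.create_scraper()
scraper.get('https://example.com')  # establish clearance cookies

# Serialize the scraper, then restore it in a later run or another process
blob = pickle.dumps(scraper)
restored = pickle.loads(blob)
response = restored.get('https://example.com')
```
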
### Session Aliases

An alternative name for creating scraper instances, kept for backward compatibility.

```python { .api }
# Alias for create_scraper()
session = create_scraper
```

#### Usage Examples

```python
import cloudscraper

# Alternative session creation
scraper = cloudscraper.session()  # Same as create_scraper()
```

## Error Handling

Core scraper functions can raise various exceptions:

```python
import cloudscraper

try:
    scraper = cloudscraper.create_scraper()
    response = scraper.get('https://protected-site.com')
except cloudscraper.CloudflareLoopProtection:
    print("Too many challenge attempts - possible infinite loop")
except cloudscraper.CloudflareIUAMError:
    print("Could not extract challenge parameters")
except cloudscraper.CloudflareChallengeError:
    print("Unknown challenge type detected")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Integration with Requests

CloudScraper is fully compatible with the requests library API:

```python
import cloudscraper

# All requests features work
scraper = cloudscraper.create_scraper()

# Authentication
scraper.auth = ('username', 'password')

# Custom headers
scraper.headers.update({'Authorization': 'Bearer token'})

# Session cookies
scraper.cookies.set('session', 'value')

# Response hooks
def log_request(response, *args, **kwargs):
    print(f"Request to {response.url} returned {response.status_code}")

scraper.hooks['response'].append(log_request)

# Timeouts and retries work as expected
response = scraper.get('https://example.com', timeout=30)
```

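Since the scraper is a `requests.Session` subclass, standard urllib3 retry policies can be mounted as on any session. A minimal sketch; note that mounting a plain `HTTPAdapter` replaces whatever adapter was previously mounted for that prefix, so this trades the library's own TLS adapter for retry behavior:

```python
import cloudscraper
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

scraper = cloudscraper.create_scraper()

# Retry transient server errors with exponential backoff
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503])
scraper.mount('https://', HTTPAdapter(max_retries=retry))

response = scraper.get('https://example.com', timeout=30)
```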