# Core Scraper Functions

The main CloudScraper class and the convenience functions that provide the primary interface for creating scraper instances and making requests with automatic Cloudflare challenge solving.

## Capabilities

### Creating Scraper Instances

Factory function for creating ready-to-use CloudScraper instances, with configuration options covering every aspect of challenge solving and stealth operation.

```python { .api }
def create_scraper(sess=None, **kwargs) -> CloudScraper:
    """
    Create a configured CloudScraper instance.

    Parameters:
    - sess: Optional existing requests.Session to extend
    - debug: bool = False, enable debug logging
    - disableCloudflareV1: bool = False, disable v1 challenge handling
    - disableCloudflareV2: bool = False, disable v2 challenge handling
    - disableCloudflareV3: bool = False, disable v3 challenge handling
    - disableTurnstile: bool = False, disable Turnstile challenge handling
    - delay: float = None, custom delay between challenge attempts
    - captcha: dict = {}, captcha solver configuration
    - interpreter: str = 'js2py', JavaScript interpreter to use
    - browser: str|dict = None, browser fingerprinting configuration
    - allow_brotli: bool = True, enable Brotli compression support
    - enable_stealth: bool = True, enable stealth mode features
    - rotating_proxies: list|dict = None, proxy rotation configuration
    - proxy_options: dict = {}, proxy rotation strategy and settings
    - stealth_options: dict = {}, stealth mode behavior configuration
    - session_refresh_interval: int = 3600, session refresh interval in seconds
    - auto_refresh_on_403: bool = True, auto-refresh session on 403 errors
    - max_403_retries: int = 3, maximum 403 error retry attempts
    - cipherSuite: str|list = None, custom TLS cipher suite
    - ecdhCurve: str = 'prime256v1', ECDH curve for TLS negotiation
    - server_hostname: str = None, custom server hostname for SNI
    - source_address: str|tuple = None, source IP address for connections
    - ssl_context: ssl.SSLContext = None, custom SSL context
    - doubleDown: bool = True, enable double-down challenge solving
    - solveDepth: int = 3, maximum challenge solving attempts
    - requestPreHook: callable = None, function called before each request
    - requestPostHook: callable = None, function called after each request
    - min_request_interval: float = 1.0, minimum seconds between requests
    - max_concurrent_requests: int = 1, maximum concurrent requests
    - rotate_tls_ciphers: bool = True, enable TLS cipher rotation

    Returns:
    CloudScraper instance ready for making requests
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic scraper with default settings
scraper = cloudscraper.create_scraper()

# Debug mode enabled
scraper = cloudscraper.create_scraper(debug=True)

# With proxy rotation
scraper = cloudscraper.create_scraper(
    rotating_proxies=[
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    }
)

# Advanced stealth configuration
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 6.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    },
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'mobile': False
    }
)

# With CAPTCHA solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
```

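The `sess` and session-refresh parameters from the signature above can be combined to wrap an existing `requests.Session`. A minimal sketch using only parameters documented above; the header value is illustrative:

```python
import requests
import cloudscraper

# Wrap an existing session so its headers, cookies, and adapters carry over
base = requests.Session()
base.headers.update({'Accept-Language': 'en-US,en;q=0.9'})

scraper = cloudscraper.create_scraper(
    sess=base,
    session_refresh_interval=1800,  # refresh the session every 30 minutes
    auto_refresh_on_403=True,       # re-establish clearance after 403 responses
    max_403_retries=5
)
```
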
### Token Extraction

Extract Cloudflare clearance tokens and the matching user-agent string for integration with external tools and applications.

```python { .api }
def get_tokens(url: str, **kwargs) -> tuple[dict[str, str], str]:
    """
    Get Cloudflare tokens for a URL.

    Parameters:
    - url: str, target URL to get tokens for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (tokens_dict, user_agent_string)
    - tokens_dict: Dictionary of Cloudflare cookies
    - user_agent_string: User agent string used for requests

    Raises:
    - CloudflareIUAMError: If unable to find Cloudflare cookies
    """
```

#### Usage Examples

```python
import cloudscraper

# Basic token extraction
tokens, user_agent = cloudscraper.get_tokens('https://example.com')
print(tokens)
# {'cf_clearance': 'abc123...', 'cf_chl_2': 'xyz789...'}

# With proxy
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    proxies={'http': 'http://proxy.example.com:8080'}
)

# With stealth mode
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    enable_stealth=True,
    stealth_options={'min_delay': 2.0, 'max_delay': 5.0}
)
```

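The returned cookie dictionary and user-agent string can be replayed from any HTTP client. A minimal sketch using plain `requests`; Cloudflare typically ties its clearance cookies to the user agent, so the same UA must be sent:

```python
import requests
import cloudscraper

tokens, user_agent = cloudscraper.get_tokens('https://example.com')

# Replay the clearance cookies outside of cloudscraper
response = requests.get(
    'https://example.com',
    cookies=tokens,
    headers={'User-Agent': user_agent}  # must match the UA that earned the cookies
)
print(response.status_code)
```
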
### Cookie String Generation

Generate cookie header strings for use with external HTTP clients and tools.

```python { .api }
def get_cookie_string(url: str, **kwargs) -> tuple[str, str]:
    """
    Generate cookie string and user agent for HTTP headers.

    Parameters:
    - url: str, target URL to get cookies for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (cookie_string, user_agent_string)
    - cookie_string: Formatted cookie header value
    - user_agent_string: User agent string used for requests
    """
```

#### Usage Examples

```python
import subprocess
import cloudscraper

# Generate cookie header
cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')
print(f"Cookie: {cookie_string}")
print(f"User-Agent: {user_agent}")

# Use with a curl command
cookie_arg, user_agent = cloudscraper.get_cookie_string('https://example.com')
result = subprocess.check_output([
    'curl',
    '--cookie', cookie_arg,
    '-A', user_agent,
    'https://example.com'
])
```

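The same pair of header values drops into any client. A minimal sketch with the standard library's `urllib`, built entirely from the two returned values:

```python
import urllib.request
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

# Send the generated Cookie and User-Agent headers without requests installed
req = urllib.request.Request(
    'https://example.com',
    headers={'Cookie': cookie_string, 'User-Agent': user_agent}
)
with urllib.request.urlopen(req) as resp:
    print(resp.status)
```
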
### CipherSuiteAdapter Class

Custom HTTPAdapter for requests that provides TLS cipher suite control and source address binding for enhanced anti-detection capabilities.

```python { .api }
class CipherSuiteAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        """
        Initialize TLS adapter with custom cipher suite configuration.

        Parameters:
        - ssl_context: ssl.SSLContext = None, custom SSL context
        - cipherSuite: str|list = None, TLS cipher suite specification
        - source_address: str|tuple = None, source IP address for connections
        - server_hostname: str = None, custom server hostname for SNI
        - ecdhCurve: str = 'prime256v1', ECDH curve for key exchange
        """

    def wrap_socket(self, *args, **kwargs):
        """
        Wrap socket with SSL context and custom hostname handling.
        """

    def init_poolmanager(self, *args, **kwargs):
        """
        Initialize connection pool manager with SSL context.
        """

    def proxy_manager_for(self, *args, **kwargs):
        """
        Create proxy manager with SSL context configuration.
        """
```

#### Usage Examples

```python
import requests
import cloudscraper

# Custom cipher suite adapter
adapter = cloudscraper.CipherSuiteAdapter(
    cipherSuite='ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384',
    source_address=('192.168.1.100', 0),
    server_hostname='example.com'
)

# Mount on a session
session = requests.Session()
session.mount('https://', adapter)
```

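The `ssl_context` parameter accepts a pre-built `ssl.SSLContext` when cipher strings alone are not enough. A minimal sketch, assuming the adapter defaults are otherwise acceptable:

```python
import ssl
import requests
import cloudscraper

# Build a context that refuses anything older than TLS 1.2
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

adapter = cloudscraper.CipherSuiteAdapter(ssl_context=ctx)
session = requests.Session()
session.mount('https://', adapter)
```
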
### CloudScraper Class

Main scraper class that extends requests.Session with automatic Cloudflare challenge detection and solving capabilities.

```python { .api }
class CloudScraper(Session):
    def __init__(self, **kwargs):
        """
        Initialize CloudScraper with configuration options.

        Parameters: Same as the create_scraper() function
        """

    def request(self, method: str, url: str, *args, **kwargs):
        """
        Make HTTP request with automatic challenge solving.

        Parameters:
        - method: str, HTTP method (GET, POST, etc.)
        - url: str, target URL
        - *args, **kwargs: standard requests arguments

        Returns:
        requests.Response object

        Raises:
        - CloudflareLoopProtection: If too many challenge attempts
        - CloudflareChallengeError: If unknown challenge type detected
        - Various challenge-specific exceptions
        """

    def perform_request(self, method: str, url: str, *args, **kwargs):
        """
        Make raw HTTP request without challenge solving.

        Parameters: Same as request()
        Returns: requests.Response object
        """

    @staticmethod
    def debugRequest(req):
        """
        Debug request/response details.

        Parameters:
        - req: requests.Response object to debug
        """

    def decodeBrotli(self, resp):
        """
        Decode Brotli-compressed response content.

        Parameters:
        - resp: requests.Response object

        Returns:
        Modified response object with decoded content
        """

    def __getstate__(self):
        """
        Support pickle serialization of scraper instances.

        Returns:
        Dictionary of instance state for serialization
        """

    def simpleException(self, exception, msg):
        """
        Raise an exception without a stack trace and reset the solve-depth counter.

        Parameters:
        - exception: Exception class to raise
        - msg: str, error message
        """

    def _should_refresh_session(self):
        """
        Check if the session should be refreshed based on age and error patterns.

        Returns:
        bool, True if session needs refresh
        """

    def _refresh_session(self, url):
        """
        Refresh the session by clearing cookies and re-establishing the connection.

        Parameters:
        - url: str, URL to test connection against

        Returns:
        bool, True if refresh succeeded
        """

    def _clear_cloudflare_cookies(self):
        """
        Clear Cloudflare-specific cookies to force re-authentication.
        """

    def _apply_request_throttling(self):
        """
        Apply request throttling to prevent TLS blocking from concurrent requests.
        """

    def _rotate_tls_cipher_suite(self):
        """
        Rotate TLS cipher suites to avoid detection patterns.
        """
```

#### Usage Examples

```python
import cloudscraper

# Direct class instantiation
scraper = cloudscraper.CloudScraper(debug=True)

# Make various types of requests
response = scraper.get('https://example.com')
response = scraper.post('https://example.com/api', json={'key': 'value'})
response = scraper.put('https://example.com/update', data='content')

# Access response data
print(response.status_code)
print(response.headers)
print(response.text)
print(response.json())

# Use session features
scraper.headers.update({'Custom-Header': 'value'})
scraper.cookies.set('session_id', 'abc123')

# Raw request without challenge solving
raw_response = scraper.perform_request('GET', 'https://example.com')
```

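Because `__getstate__` is implemented (see above), scraper instances can be pickled and restored later. A minimal sketch:

```python
import pickle
import cloudscraper

scraper = cloudscraper.create_scraper()
scraper.get('https://example.com')  # establish clearance cookies

# Serialize the scraper, then restore it in a later run or another process
blob = pickle.dumps(scraper)
restored = pickle.loads(blob)
response = restored.get('https://example.com')
```
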
### Session Aliases

An alternative name for creating scraper instances, kept for backward compatibility.

```python { .api }
# Alias for create_scraper()
session = create_scraper
```

#### Usage Examples

```python
import cloudscraper

# Alternative session creation
scraper = cloudscraper.session()  # Same as create_scraper()
```

## Error Handling

Core scraper functions can raise various exceptions:

```python
import cloudscraper

try:
    scraper = cloudscraper.create_scraper()
    response = scraper.get('https://protected-site.com')
except cloudscraper.CloudflareLoopProtection:
    print("Too many challenge attempts - possible infinite loop")
except cloudscraper.CloudflareIUAMError:
    print("Could not extract challenge parameters")
except cloudscraper.CloudflareChallengeError:
    print("Unknown challenge type detected")
except Exception as e:
    print(f"Unexpected error: {e}")
```

## Integration with Requests

CloudScraper is fully compatible with the requests library API:

```python
import cloudscraper

# All requests features work
scraper = cloudscraper.create_scraper()

# Authentication
scraper.auth = ('username', 'password')

# Custom headers
scraper.headers.update({'Authorization': 'Bearer token'})

# Session cookies
scraper.cookies.set('session', 'value')

# Response hooks
def log_request(response, *args, **kwargs):
    print(f"Request to {response.url} returned {response.status_code}")

scraper.hooks['response'].append(log_request)

# Timeouts and retries work as expected
response = scraper.get('https://example.com', timeout=30)
```

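Since the scraper is a `requests.Session` subclass, standard urllib3 retry policies can be mounted as on any session. A minimal sketch; note that mounting a plain `HTTPAdapter` replaces whatever adapter was previously mounted for that prefix, so this trades the library's own TLS adapter for retry behavior:

```python
import cloudscraper
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

scraper = cloudscraper.create_scraper()

# Retry transient server errors with exponential backoff
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[500, 502, 503])
scraper.mount('https://', HTTPAdapter(max_retries=retry))

response = scraper.get('https://example.com', timeout=30)
```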