tessl/pypi-cloudscraper

Enhanced Python module to bypass Cloudflare's anti-bot page with support for v1, v2, v3 challenges, Turnstile, proxy rotation, and stealth mode.


Core Scraper Functions

The main CloudScraper class and convenience functions that provide the primary interface for creating scraper instances and making requests with automatic Cloudflare challenge solving.

Capabilities

Creating Scraper Instances

Factory function that creates a ready-to-use CloudScraper instance, with configuration options covering every aspect of challenge solving and stealth operation.

def create_scraper(sess=None, **kwargs) -> CloudScraper:
    """
    Create a configured CloudScraper instance.

    Parameters:
    - sess: Optional existing requests.Session to extend
    - debug: bool = False, enable debug logging
    - disableCloudflareV1: bool = False, disable v1 challenge handling
    - disableCloudflareV2: bool = False, disable v2 challenge handling  
    - disableCloudflareV3: bool = False, disable v3 challenge handling
    - disableTurnstile: bool = False, disable Turnstile challenge handling
    - delay: float = None, custom delay between challenge attempts
    - captcha: dict = {}, captcha solver configuration
    - interpreter: str = 'js2py', JavaScript interpreter to use
    - browser: str|dict = None, browser fingerprinting configuration
    - allow_brotli: bool = True, enable Brotli compression support
    - enable_stealth: bool = True, enable stealth mode features
    - rotating_proxies: list|dict = None, proxy rotation configuration
    - proxy_options: dict = {}, proxy rotation strategy and settings
    - stealth_options: dict = {}, stealth mode behavior configuration
    - session_refresh_interval: int = 3600, session refresh interval in seconds
    - auto_refresh_on_403: bool = True, auto-refresh session on 403 errors
    - max_403_retries: int = 3, maximum 403 error retry attempts
    - cipherSuite: str|list = None, custom TLS cipher suite
    - ecdhCurve: str = 'prime256v1', ECDH curve for TLS negotiation
    - server_hostname: str = None, custom server hostname for SNI
    - source_address: str|tuple = None, source IP address for connections
    - ssl_context: ssl.SSLContext = None, custom SSL context
    - doubleDown: bool = True, enable double-down challenge solving
    - solveDepth: int = 3, maximum challenge solving attempts
    - requestPreHook: callable = None, function called before each request
    - requestPostHook: callable = None, function called after each request
    - min_request_interval: float = 1.0, minimum seconds between requests
    - max_concurrent_requests: int = 1, maximum concurrent requests
    - rotate_tls_ciphers: bool = True, enable TLS cipher rotation

    Returns:
    CloudScraper instance ready for making requests
    """

Usage Examples

import cloudscraper

# Basic scraper with default settings
scraper = cloudscraper.create_scraper()

# Debug mode enabled
scraper = cloudscraper.create_scraper(debug=True)

# With proxy rotation
scraper = cloudscraper.create_scraper(
    rotating_proxies=[
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    }
)

# Advanced stealth configuration
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 6.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    },
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'mobile': False
    }
)

# With CAPTCHA solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
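The session-refresh parameters documented in the signature above can be combined in a single call. A minimal sketch using only keyword arguments listed there (the interval and retry counts are arbitrary example values):

```python
# Automatic session refresh and 403 recovery
scraper = cloudscraper.create_scraper(
    session_refresh_interval=1800,  # refresh the session every 30 minutes
    auto_refresh_on_403=True,       # rebuild the session when a 403 is returned
    max_403_retries=5,              # retry a 403 response at most five times
)
```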

Token Extraction

Extract Cloudflare authentication tokens and user agent for integration with external tools and applications.

def get_tokens(url: str, **kwargs) -> tuple[dict[str, str], str]:
    """
    Get Cloudflare tokens for a URL.

    Parameters:
    - url: str, target URL to get tokens for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (tokens_dict, user_agent_string)
    - tokens_dict: Dictionary of Cloudflare cookies
    - user_agent_string: User agent string used for requests

    Raises:
    - CloudflareIUAMError: If unable to find Cloudflare cookies
    """

Usage Examples

# Basic token extraction
tokens, user_agent = cloudscraper.get_tokens('https://example.com')
print(tokens)
# {'cf_clearance': 'abc123...', 'cf_chl_2': 'xyz789...'}

# With proxy
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    proxies={'http': 'http://proxy.example.com:8080'}
)

# With stealth mode
tokens, user_agent = cloudscraper.get_tokens(
    'https://example.com',
    enable_stealth=True,
    stealth_options={'min_delay': 2.0, 'max_delay': 5.0}
)
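When the tokens dict needs to be handed to a tool that expects a Cookie header value rather than a dict, it can be flattened manually. The token values below are placeholders, not real clearance cookies; note that get_cookie_string() performs this same transformation for you:

```python
# tokens as returned by get_tokens(); placeholder values for illustration
tokens = {'cf_clearance': 'abc123', 'cf_chl_2': 'xyz789'}

# Flatten the dict into a Cookie header value
cookie_header = '; '.join(f'{k}={v}' for k, v in tokens.items())
print(cookie_header)
# cf_clearance=abc123; cf_chl_2=xyz789
```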

Cookie String Generation

Generate cookie header strings for use with external HTTP clients and tools.

def get_cookie_string(url: str, **kwargs) -> tuple[str, str]:
    """
    Generate cookie string and user agent for HTTP headers.

    Parameters:
    - url: str, target URL to get cookies for
    - **kwargs: same configuration options as create_scraper()

    Returns:
    Tuple of (cookie_string, user_agent_string)
    - cookie_string: Formatted cookie header value
    - user_agent_string: User agent string used for requests
    """

Usage Examples

# Generate cookie header
cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')
print(f"Cookie: {cookie_string}")
print(f"User-Agent: {user_agent}")

# Use with curl command
import subprocess
cookie_arg, user_agent = cloudscraper.get_cookie_string('https://example.com')
result = subprocess.check_output([
    'curl',
    '--cookie', cookie_arg,
    '-A', user_agent,
    'https://example.com'
])
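The same values work with the standard library instead of curl. Here cookie_string and user_agent are placeholders standing in for the values get_cookie_string() would return:

```python
import urllib.request

# Placeholders for values returned by cloudscraper.get_cookie_string()
cookie_string = 'cf_clearance=abc123; cf_chl_2=xyz789'
user_agent = 'Mozilla/5.0'

# Build a request that carries the Cloudflare cookies and matching user agent
req = urllib.request.Request(
    'https://example.com',
    headers={'Cookie': cookie_string, 'User-Agent': user_agent},
)
# urllib.request.urlopen(req) would then send the request
```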

CipherSuiteAdapter Class

Custom HTTPAdapter for requests that provides TLS cipher suite control and source address binding for enhanced anti-detection capabilities.

class CipherSuiteAdapter(HTTPAdapter):
    def __init__(self, *args, **kwargs):
        """
        Initialize TLS adapter with custom cipher suite configuration.
        
        Parameters:
        - ssl_context: ssl.SSLContext = None, custom SSL context
        - cipherSuite: str|list = None, TLS cipher suite specification
        - source_address: str|tuple = None, source IP address for connections
        - server_hostname: str = None, custom server hostname for SNI
        - ecdhCurve: str = 'prime256v1', ECDH curve for key exchange
        """
    
    def wrap_socket(self, *args, **kwargs):
        """
        Wrap socket with SSL context and custom hostname handling.
        """
    
    def init_poolmanager(self, *args, **kwargs):
        """
        Initialize connection pool manager with SSL context.
        """
    
    def proxy_manager_for(self, *args, **kwargs):
        """
        Create proxy manager with SSL context configuration.
        """

Usage Examples

import requests
import cloudscraper

# Custom cipher suite adapter
adapter = cloudscraper.CipherSuiteAdapter(
    cipherSuite='ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384',
    source_address=('192.168.1.100', 0),
    server_hostname='example.com'
)

# Mount on session
session = requests.Session()
session.mount('https://', adapter)

CloudScraper Class

Main scraper class that extends requests.Session with automatic Cloudflare challenge detection and solving capabilities.

class CloudScraper(requests.Session):
    def __init__(self, **kwargs):
        """
        Initialize CloudScraper with configuration options.
        
        Parameters: Same as create_scraper() function
        """

    def request(self, method: str, url: str, *args, **kwargs):
        """
        Make HTTP request with automatic challenge solving.

        Parameters:
        - method: str, HTTP method (GET, POST, etc.)
        - url: str, target URL
        - *args, **kwargs: standard requests arguments

        Returns:
        requests.Response object

        Raises:
        - CloudflareLoopProtection: If too many challenge attempts
        - CloudflareChallengeError: If unknown challenge type detected
        - Various challenge-specific exceptions
        """

    def perform_request(self, method: str, url: str, *args, **kwargs):
        """
        Make raw HTTP request without challenge solving.
        
        Parameters: Same as request()
        Returns: requests.Response object
        """

    @staticmethod
    def debugRequest(req):
        """
        Debug request/response details.
        
        Parameters:
        - req: requests.Response object to debug
        """

    def decodeBrotli(self, resp):
        """
        Decode Brotli compressed response content.
        
        Parameters:
        - resp: requests.Response object
        
        Returns:
        Modified response object with decoded content
        """
    
    def __getstate__(self):
        """
        Support for pickle serialization of scraper instances.
        
        Returns:
        Dictionary of instance state for serialization
        """
    
    def simpleException(self, exception, msg):
        """
        Raise exception with no stack trace and reset depth counter.
        
        Parameters:
        - exception: Exception class to raise
        - msg: str, error message
        """
    
    def _should_refresh_session(self):
        """
        Check if session should be refreshed based on age and error patterns.
        
        Returns:
        bool, True if session needs refresh
        """
    
    def _refresh_session(self, url):
        """
        Refresh session by clearing cookies and re-establishing connection.
        
        Parameters:
        - url: str, URL to test connection against
        
        Returns:
        bool, True if refresh succeeded
        """
    
    def _clear_cloudflare_cookies(self):
        """
        Clear Cloudflare-specific cookies to force re-authentication.
        """
    
    def _apply_request_throttling(self):
        """
        Apply request throttling to prevent TLS blocking from concurrent requests.
        """
    
    def _rotate_tls_cipher_suite(self):
        """
        Rotate TLS cipher suites to avoid detection patterns.
        """

Usage Examples

# Direct class instantiation
scraper = cloudscraper.CloudScraper(debug=True)

# Make various types of requests
response = scraper.get('https://example.com')
response = scraper.post('https://example.com/api', json={'key': 'value'})
response = scraper.put('https://example.com/update', data='content')

# Access response data
print(response.status_code)
print(response.headers)
print(response.text)
print(response.json())

# Use session features
scraper.headers.update({'Custom-Header': 'value'})
scraper.cookies.set('session_id', 'abc123')

# Raw request without challenge solving
raw_response = scraper.perform_request('GET', 'https://example.com')

Session Aliases

An alternative name for create_scraper(), kept for backward compatibility.

# Alias for create_scraper()
session = create_scraper

Usage Examples

# Alternative session creation
scraper = cloudscraper.session()  # Same as create_scraper()

Error Handling

Core scraper functions can raise various exceptions:

try:
    scraper = cloudscraper.create_scraper()
    response = scraper.get('https://protected-site.com')
except cloudscraper.CloudflareLoopProtection:
    print("Too many challenge attempts - possible infinite loop")
except cloudscraper.CloudflareIUAMError:
    print("Could not extract challenge parameters")
except cloudscraper.CloudflareChallengeError:
    print("Unknown challenge type detected")
except Exception as e:
    print(f"Unexpected error: {e}")
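A bounded-retry wrapper is a common pattern around these exceptions. The helper below is illustrative, not part of cloudscraper; it accepts any zero-argument callable, e.g. `lambda: scraper.get('https://protected-site.com')`:

```python
import time

def fetch_with_retries(fetch, attempts=3, backoff=2.0):
    """Call fetch() until it succeeds or attempts run out, backing off between tries."""
    last_exc = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # narrow this to the cloudscraper exceptions above
            last_exc = exc
            time.sleep(backoff * (attempt + 1))  # linear backoff between attempts
    raise last_exc
```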

Integration with Requests

CloudScraper is fully compatible with the requests library API:

# All requests features work
scraper = cloudscraper.create_scraper()

# Authentication
scraper.auth = ('username', 'password')

# Custom headers
scraper.headers.update({'Authorization': 'Bearer token'})

# Session cookies
scraper.cookies.set('session', 'value')

# Request hooks
def log_request(response, *args, **kwargs):
    print(f"Request to {response.url} returned {response.status_code}")

scraper.hooks['response'] = log_request

# Timeouts and retries work as expected
response = scraper.get('https://example.com', timeout=30)

Install with Tessl CLI

npx tessl i tessl/pypi-cloudscraper
