Every major website today runs some form of bot detection. Cloudflare, Akamai, PerimeterX, DataDome - they all operate on the same fundamental principles, but the implementation details are where it gets interesting.

In this post, I'll walk through the core detection layers, how they fingerprint your requests, and the mental model I use when approaching a new target.

The Detection Stack

Most anti-bot systems operate across three layers:

  1. Network-level fingerprinting - TLS/JA3 signatures, HTTP/2 settings, cipher suites
  2. Browser-level challenges - JavaScript execution, canvas fingerprinting, WebGL hashes
  3. Behavioral analysis - Mouse movements, scroll patterns, timing between actions

The mistake most people make is focusing on layer 2 (browser automation) while completely ignoring layer 1. Your Playwright script can execute every JS challenge perfectly, but if your TLS handshake screams "I'm a Python script," you're blocked before the page even loads.

TLS Fingerprinting: The Silent Gatekeeper

When your client initiates a TLS handshake, it sends a ClientHello message containing:

  • Supported cipher suites (and their order)
  • TLS extensions
  • Supported groups (elliptic curves)
  • Signature algorithms

These values combine into a fingerprint. A given Chrome version sends a consistent, recognizable set (recent Chrome builds randomize TLS extension order, which newer fingerprinting schemes normalize away), while Python's requests library, which goes through OpenSSL's defaults, sends a completely different one.
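JA3, the most widely used TLS fingerprint format, reduces exactly these fields to an MD5 hash. Here is a minimal sketch of the construction; the numeric values are illustrative, not a real Chrome capture:

```python
import hashlib

# JA3 joins five ClientHello fields - TLS version, ciphers, extensions,
# elliptic curves, and point formats - as dash-separated decimal values,
# comma-separates the fields, then MD5-hashes the resulting string.
def ja3_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode()).hexdigest()

# Reordering the cipher list changes the hash - order is part of the identity.
a = ja3_fingerprint(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
b = ja3_fingerprint(771, [4867, 4866, 4865], [0, 23, 65281], [29, 23, 24], [0])
assert a != b
```

This is why cipher ordering matters as much as the cipher list itself: two clients supporting identical suites in different orders hash to different fingerprints.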

# This is what gets you blocked - the default requests fingerprint
import requests
response = requests.get("https://protected-site.com")
# Result: 403 Forbidden

# The fix: impersonate a real browser's TLS stack
from curl_cffi import requests as cffi_requests
response = cffi_requests.get(
    "https://protected-site.com",
    impersonate="chrome"
)
# Result: 200 OK

The curl_cffi library wraps curl-impersonate, a patched libcurl that mimics real browser TLS signatures. In practice, this single change gets you past a large share of network-level bot checks.

HTTP/2 Settings Matter

Beyond TLS, HTTP/2 connection settings are another fingerprinting vector:

  • SETTINGS_HEADER_TABLE_SIZE
  • SETTINGS_ENABLE_PUSH
  • SETTINGS_MAX_CONCURRENT_STREAMS
  • SETTINGS_INITIAL_WINDOW_SIZE
  • SETTINGS_MAX_FRAME_SIZE
  • SETTINGS_MAX_HEADER_LIST_SIZE

Each browser sends a specific combination of these values, and Chrome, Firefox, and Safari all differ. Your scraping library's HTTP/2 stack (if it negotiates HTTP/2 at all; many default to HTTP/1.1) sends its own combination, which rarely matches any browser's.
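To make this concrete, here is a sketch that encodes a raw SETTINGS frame per RFC 7540: each entry is a 16-bit identifier plus a 32-bit value, and both which settings appear and their order are part of the signature. The Chrome-like values below are illustrative; capture a real session to get exact numbers for your target browser version.

```python
import struct

# HTTP/2 SETTINGS identifiers from RFC 7540, section 6.5.2.
SETTINGS_IDS = {
    "HEADER_TABLE_SIZE": 0x1,
    "ENABLE_PUSH": 0x2,
    "MAX_CONCURRENT_STREAMS": 0x3,
    "INITIAL_WINDOW_SIZE": 0x4,
    "MAX_FRAME_SIZE": 0x5,
    "MAX_HEADER_LIST_SIZE": 0x6,
}

def settings_frame(settings):
    # Each setting is a 16-bit identifier followed by a 32-bit value.
    payload = b"".join(
        struct.pack(">HI", SETTINGS_IDS[name], value)
        for name, value in settings
    )
    # Frame header: 24-bit length, type 0x4 (SETTINGS), flags 0, stream 0.
    header = struct.pack(">I", len(payload))[1:] + b"\x04\x00" + struct.pack(">I", 0)
    return header + payload

# Illustrative Chrome-like values - not an exact capture.
chrome_like = [
    ("HEADER_TABLE_SIZE", 65536),
    ("ENABLE_PUSH", 0),
    ("INITIAL_WINDOW_SIZE", 6291456),
    ("MAX_HEADER_LIST_SIZE", 262144),
]
frame = settings_frame(chrome_like)
```

A detector on the other end sees this frame before any request headers arrive, which is why the SETTINGS combination is such a cheap early signal.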

The Mental Model

When I approach a new anti-bot system, I follow this process:

  1. Capture a real browser session - Use Chrome DevTools or mitmproxy to record every request
  2. Identify the challenge endpoints - Look for JavaScript that generates tokens, cookies, or headers
  3. Compare fingerprints - Diff the TLS, HTTP/2, and header signatures between your script and the real browser
  4. Fix from the bottom up - Start with TLS, then HTTP/2, then headers, then JS challenges

Working bottom-up is critical because each layer depends on the one below it. No amount of JavaScript execution will help if your network fingerprint is wrong.
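Step 3 of the process above can start as simply as diffing ordered header lists between the captured browser session and your script. A minimal sketch, with illustrative header values:

```python
# Compare an ordered header list captured from a real browser
# (e.g. via mitmproxy) against what your script actually sends.
def diff_headers(browser, script):
    findings = []
    browser_names = [name.lower() for name, _ in browser]
    script_names = [name.lower() for name, _ in script]
    for name in browser_names:
        if name not in script_names:
            findings.append(f"missing header: {name}")
    for name in script_names:
        if name not in browser_names:
            findings.append(f"extra header: {name}")
    # Check whether the shared headers appear in the same relative order.
    shared_browser = [n for n in browser_names if n in script_names]
    shared_script = [n for n in script_names if n in browser_names]
    if shared_browser != shared_script:
        findings.append("header order differs")
    return findings

browser = [("user-agent", "Mozilla/5.0 ..."), ("accept", "text/html"), ("accept-language", "en-US")]
script = [("accept", "text/html"), ("user-agent", "python-requests/2.31")]
print(diff_headers(browser, script))
# → ['missing header: accept-language', 'header order differs']
```

The same comparison logic extends to TLS and HTTP/2 fields once you have them captured; the point is to make every mismatch explicit before touching the JS layer.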

Key Takeaways

  • Bot detection is layered - you need to match at every level
  • TLS fingerprinting is the most overlooked and most effective detection method
  • Tools like curl_cffi solve much of the problem by impersonating browser network stacks
  • Always capture and compare real browser traffic before writing a single line of scraping code
  • Behavioral analysis is the final frontier - timing, mouse movements, and scroll patterns

The best scrapers don't fight the detection system. They become indistinguishable from a real user at every protocol layer.


This is an educational overview for security research and authorized testing. Always ensure you have permission before testing any system.