Ask HN: Scaling a targeted web crawler beyond 500M pages/day

14 points
20 hours ago
by honungsburk

Comments


faangguyindia

If you want to access data from websites that prevent it, you have to use a headless browser with a residential proxy network like Bright Data (formerly Luminati).

4 hours ago

4lx87

I'm curious: how do you deal with Cloudflare and similar anti-bot systems? Do you just keep shopping the job around to different proxies?

12 hours ago
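A minimal sketch of the proxy-rotation idea from this thread, in Python: retry a fetch through successive proxies from a pool until one is not blocked. Everything here is an assumption for illustration. The names `fetch_with_rotation` and `urllib_fetch` are hypothetical, the set of "blocked" status codes (403, 429, 503, plus 0 for a connection failure) is a guess, and the proxy URLs are placeholders. A plain HTTP client like `urllib` cannot execute JavaScript challenges, so this covers only the "shop the job around" loop, not the headless browser.

```python
import itertools
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional, Tuple

# Assumed: statuses meaning "this proxy was blocked or failed, try the next one".
# 0 is our own marker for a connection-level failure (proxy down, timeout).
RETRYABLE_STATUSES = {0, 403, 429, 503}

# A fetch callback takes (url, proxy_url) and returns (status, body).
Fetch = Callable[[str, str], Tuple[int, bytes]]


def urllib_fetch(url: str, proxy: str) -> Tuple[int, bytes]:
    """Fetch `url` through an HTTP(S) proxy using only the standard library."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError, which carries the status code.
        return err.code, b""
    except OSError:
        # URLError and socket timeouts are OSError subclasses.
        return 0, b""


def fetch_with_rotation(
    url: str,
    proxies: Iterable[str],
    fetch: Fetch = urllib_fetch,
    max_attempts: int = 5,
) -> Optional[Tuple[str, int, bytes]]:
    """Try `url` through proxies in round-robin order until one isn't blocked.

    Returns (proxy, status, body) for the first non-retryable response,
    or None if all `max_attempts` attempts were blocked or failed.
    """
    pool = list(proxies)
    if not pool:
        raise ValueError("proxy pool is empty")
    for proxy in itertools.islice(itertools.cycle(pool), max_attempts):
        status, body = fetch(url, proxy)
        if status not in RETRYABLE_STATUSES:
            return proxy, status, body
    return None
```

Injecting `fetch` as a parameter keeps the rotation logic separate from the transport, so the same loop could drive a headless-browser fetch instead of `urllib` without changes.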