How to fix HTTP 407 errors while scraping websites using Crawl4AI + Playwright + Apify with proxies #858
Replies: 2 comments
-
@prokhorenkomykhailo Quite a lot to unpack here 😁. Based on both the error codes and the logs, this doesn't appear to be a Crawl4AI or proxy issue. I checked one of the links here: it has a self-signed certificate, and the browser blocked it immediately. When I tried the http:// protocol instead, the firewall at my office blocked it. So this page seems to raise genuine trust issues that prompt networks to block it (even your proxy provider will have some kind of protection on its infrastructure).
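To confirm a certificate problem like this independently of Crawl4AI or the proxy, a minimal diagnostic using only Python's standard `ssl` module might look like the sketch below. The function name `check_tls` is hypothetical; it attempts an ordinary verified TLS handshake and reports why it fails.

```python
import socket
import ssl


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Hypothetical helper: report whether host:port presents a trusted certificate."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"valid ({tls.version()})"
    except ssl.SSLCertVerificationError as exc:
        # e.g. "self-signed certificate" or "certificate has expired"
        return f"untrusted: {exc.verify_message}"
    except ssl.SSLError as exc:
        # handshake failed for a reason other than certificate verification
        return f"tls-error: {exc}"
    except OSError as exc:
        # DNS failure, refused connection, timeout, firewall drop, ...
        return f"unreachable: {exc}"
```

Usage: `print(check_tls("your-failing-host.example"))`, substituting the host that fails. An `untrusted: ...` result points at the site's certificate; `unreachable: ...` from one network but not another points at a firewall or proxy in between.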
-
Hi @aravindkarnam, could you please guide me on what exactly needs to be done to fix this issue? Specifically:

- If I need to bypass SSL verification for the self-signed certificate, how can I configure my crawler (Crawl4AI + Playwright) to do this safely?
- If the website owner needs to fix the SSL certificate, what steps should they take to get a valid certificate and enable HTTPS?
- Are there any alternative solutions or workarounds I can use to crawl this website successfully without compromising security?

I'd really appreciate your help in resolving this. Thank you in advance!
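On bypassing certificate checks "safely": in plain Playwright, `browser.new_context(ignore_https_errors=True)` disables certificate errors for the whole context. A minimal sketch, assuming you keep a hand-maintained allowlist of hosts you have inspected (the allowlist and both function names are hypothetical), enables the bypass only in a context dedicated to such a host, so every other site keeps normal verification.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts you have inspected and accept despite a bad certificate.
INSECURE_OK_HOSTS = {"self-signed.example.org"}


def allows_insecure(url: str, allowlist=INSECURE_OK_HOSTS) -> bool:
    """True only when the URL's hostname exactly matches an allowlisted host."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    return host in allowlist


async def open_context(browser, url: str, allowlist=INSECURE_OK_HOSTS):
    """Create a Playwright BrowserContext for `url`.

    `ignore_https_errors` applies to the entire context, so this is meant to be
    called once per target host; non-allowlisted hosts get normal TLS checks.
    """
    return await browser.new_context(ignore_https_errors=allows_insecure(url, allowlist))
```

Exact matching (rather than suffix matching) is deliberate: it keeps look-alike hosts such as `self-signed.example.org.evil.com` out of the bypass.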
-
Hello community members, I hope you are doing great.
Recently I have been working with Crawl4AI and Playwright, using proxies through Apify, to crawl some web pages.
The main goal is to get the HTML content, a screenshot, and the status code of each web page, for both the desktop and mobile versions.
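For reference, the same goal (status code, HTML and screenshot per desktop/mobile profile) can be sketched with Playwright alone, without Crawl4AI. This is a minimal sketch, not the original source code: the profile names, viewport sizes, and the mobile user-agent string are illustrative assumptions.

```python
from typing import Any, Dict, Optional

# Assumed profiles; the mobile values roughly approximate a modern phone.
PROFILES: Dict[str, Dict[str, Any]] = {
    "desktop": {"viewport": {"width": 1920, "height": 1080}},
    "mobile": {
        "viewport": {"width": 390, "height": 844},
        "device_scale_factor": 3,
        "is_mobile": True,
        "has_touch": True,
        "user_agent": (
            "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
            "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
        ),
    },
}


async def capture(url: str, profile: str, proxy: Optional[dict] = None) -> Dict[str, Any]:
    """Return the status code, HTML and a full-page PNG screenshot for one profile."""
    from playwright.async_api import async_playwright  # imported lazily

    async with async_playwright() as p:
        browser = await p.chromium.launch(proxy=proxy)
        try:
            context = await browser.new_context(**PROFILES[profile])
            page = await context.new_page()
            response = await page.goto(url, wait_until="domcontentloaded", timeout=60_000)
            return {
                "profile": profile,
                "status": response.status if response else None,
                "html": await page.content(),
                "screenshot": await page.screenshot(full_page=True),  # PNG bytes
            }
        finally:
            await browser.close()
```

Usage: `asyncio.run(capture("https://example.com", "mobile"))`. Launching one browser per call keeps the sketch simple; a real crawler would reuse a browser across pages.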
Here is the source code:
It generally works fine.
However, there are several issues:

- Cloudflare challenge pages often appear
- Proxy authentication errors (HTTP 407)
- No HTML and no screenshot results from the crawl
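Cloudflare challenge pages tend to get saved as if they were real content. A heuristic sketch for flagging them follows. Cloudflare has documented a `cf-mitigated: challenge` response header for challenged requests; the body markers and the 403/503 check are assumptions based on commonly seen challenge pages, and the function name is hypothetical.

```python
from typing import Mapping

# Assumed markers commonly seen in Cloudflare challenge HTML (heuristic, not exhaustive).
CHALLENGE_MARKERS = ("<title>just a moment...</title>", "cf-chl-", "challenge-platform")


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str], html: str) -> bool:
    """Heuristic: does this response look like a Cloudflare challenge rather than content?"""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Explicit signal: Cloudflare marks challenged responses with this header.
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Fallback: served via Cloudflare, blocked-style status, and challenge markers in the body.
    served_by_cloudflare = lowered.get("server", "") == "cloudflare" or "cf-ray" in lowered
    body = html.lower()
    return served_by_cloudflare and status in (403, 503) and any(m in body for m in CHALLENGE_MARKERS)
```

With Playwright, the inputs would be `response.status`, `response.headers` and `await page.content()`. A flagged page is better recorded as a failed capture than stored as a result.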



This is the log from Apify:
I’m currently using Crawl4AI with Apify to crawl web pages, but I’m getting a proxy authentication error (HTTP 407) when trying to access web pages through the proxy. I've checked my proxy settings, but I'm not sure what else to try.
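One frequent cause of a 407 with Playwright (which Crawl4AI drives) that is worth ruling out: passing the proxy as a single `http://user:pass@host:port` string. Chromium does not pick up credentials embedded in the server URL, so they should go into Playwright's separate `username` and `password` fields. A minimal sketch, assuming the proxy URL (Apify proxy URLs typically have this form) is available as a string; the function name is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def playwright_proxy_settings(proxy_url: str) -> dict:
    """Split 'http://user:pass@host:port' into the dict Playwright's `proxy=` option expects."""
    parts = urlsplit(proxy_url)  # expects an explicit scheme such as http://
    if not parts.hostname or not parts.port:
        raise ValueError(f"proxy URL needs a host and port: {proxy_url!r}")
    settings = {"server": f"{parts.scheme or 'http'}://{parts.hostname}:{parts.port}"}
    if parts.username is not None:
        # Credentials may be percent-encoded inside a URL; Playwright wants them raw.
        settings["username"] = unquote(parts.username)
        settings["password"] = unquote(parts.password or "")
    return settings
```

Usage: `browser = await p.chromium.launch(proxy=playwright_proxy_settings(proxy_url))`. If the 407 persists with credentials passed this way, the credentials themselves (or the proxy plan's access rules) are the next thing to check.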
Additionally, after running the crawl, I’m not getting any HTML or screenshot results.
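When a capture fails partway through, it is easy to end up with nothing stored and no record of why. A minimal sketch of one way to make failures visible: wrap each capture in a retry loop with exponential backoff that records every error. It assumes `attempt(n)` is any coroutine returning a dict with `status` and `html` (for example, a Playwright or Crawl4AI call); the retryable status set and the function name are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

# Statuses treated as transient here (an assumption): blocks, proxy auth, rate limits, outages.
RETRYABLE = {403, 407, 429, 502, 503}


async def capture_with_retries(
    attempt: Callable[[int], Awaitable[Dict[str, Any]]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Dict[str, Any]:
    """Call `attempt(n)` until it returns a non-retryable status, recording every failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    errors: List[str] = []
    result: Dict[str, Any] = {"status": None, "html": None}
    for n in range(1, max_attempts + 1):
        try:
            result = await attempt(n)
            status = result.get("status")
            if status is not None and status not in RETRYABLE:
                break  # a definitive answer: stop retrying
            errors.append(f"attempt {n}: HTTP {status}")
        except Exception as exc:  # navigation timeouts, proxy/TLS errors, ...
            result = {"status": None, "html": None}
            errors.append(f"attempt {n}: {type(exc).__name__}: {exc}")
        if n < max_attempts:
            await asyncio.sleep(base_delay * 2 ** (n - 1))  # exponential backoff
    return {**result, "errors": errors, "attempts": n}
```

Storing `errors` alongside each result means an empty HTML field always comes with an explanation in the dataset.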
Could anyone provide advice on how to resolve these issues? Any help would be greatly appreciated!
Thank you!