[Bug]: Unable to crawl https://www.behance.net #881
Replies: 2 comments
@Mng-dev-ai I tried your code snippet on behance.net as well. The issue is that Behance loads all of the page data through a JavaScript bundle and AJAX requests. Since you haven't explicitly added any wait condition, scraping occurs as soon as the DOM content is loaded (which is just a script tag). Try the following; it should get you all the data from the page:

    import asyncio

    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig


    async def main():
        # Run with a visible browser window so the page load can be observed.
        browser_config = BrowserConfig(headless=False)
        # Wait before returning the HTML, and scroll through the full page
        # so content loaded by JavaScript has a chance to render.
        run_config = CrawlerRunConfig(
            delay_before_return_html=3, scan_full_page=True, scroll_delay=1
        )
        async with AsyncWebCrawler(config=browser_config) as crawler:
            result = await crawler.arun(url="https://www.behance.net/", config=run_config)
            print(result.markdown)


    if __name__ == "__main__":
        asyncio.run(main())

Compared to Firecrawl, we offer a lot more customisation and extensibility in the way pages are scraped, cleaned, and data extracted. Our default crawler is optimised for static sites like Wikipedia. What I have done in the script above is set delay_before_return_html=3, scan_full_page=True, and scroll_delay=1, so the crawler waits before returning the HTML and scrolls through the full page, pausing between scrolls.

This should work fine for most single-page applications (SPAs); however, I see that we can improve the user experience by toggling the run configs based on the nature of the page.
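One way the idea of toggling run settings by page type could look is sketched below. This is only an illustration and not part of crawl4ai: the helper names `looks_like_js_shell` and `pick_run_options` and the 200-character threshold are assumptions made up for the example. The sketch inspects the initial HTML and, if it looks like an empty shell that only pulls in scripts, suggests the same three settings used in the script above.

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Counts <script> tags and visible text characters outside script/style."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_like_js_shell(html, min_text_chars=200):
    """Hypothetical heuristic: scripts present but almost no visible text."""
    parser = _TextAndScriptCounter()
    parser.feed(html)
    parser.close()
    return parser.script_tags > 0 and parser.text_chars < min_text_chars


def pick_run_options(html):
    """Return keyword arguments for a run config (values taken from the reply above)."""
    if looks_like_js_shell(html):
        return {"delay_before_return_html": 3, "scan_full_page": True, "scroll_delay": 1}
    return {}  # static page: keep the defaults
```

The returned dictionary could then be unpacked into a run config, for example `CrawlerRunConfig(**pick_run_options(initial_html))`, where `initial_html` is whatever a first, quick fetch returned. The threshold would need tuning against real pages.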
@aravindkarnam Thank you. I think the problem was related to the browser: changing browser_type to firefox fixed the issue. Here's an example:
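The snippet itself is not included in the text above. As a stand-in, here is a minimal sketch of what switching the browser might look like. It is not the original poster's code. It assumes `BrowserConfig` accepts `browser_type="firefox"` (as the reply describes) and that the Firefox browser build is installed for the underlying browser automation. The function name `crawl_with_firefox` and the `FIREFOX_BROWSER_KWARGS` dictionary are made up for this example.

```python
import asyncio

# Assumed settings: only browser_type comes from the reply above;
# headless=True is an illustrative choice.
FIREFOX_BROWSER_KWARGS = {"browser_type": "firefox", "headless": True}


async def crawl_with_firefox(url):
    """Crawl one URL using Firefox instead of the default browser."""
    # Imported lazily so this module can be loaded without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

    browser_config = BrowserConfig(**FIREFOX_BROWSER_KWARGS)
    run_config = CrawlerRunConfig()  # defaults; add wait/scroll options if needed
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=run_config)
        return result.markdown
```

It could be run with `print(asyncio.run(crawl_with_firefox("https://www.behance.net/")))`.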
crawl4ai version
0.5.0.post4
Expected Behavior
It should return some results. I tested the same URL using Firecrawl and it worked.
Current Behavior
no results
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
macOS
Python version
3.10.2
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)