User data crawling opens two windows; unable to control the correct user browser #545
Replies: 12 comments
-
I'm also facing something similar. After I log in to an application within the on_browser_created hook, the crawler opens a different window, but it seems to miss the browser context: it is unable to crawl the internal page and shows the login page again. However, it does work when I open multiple pages in the same context after logging in on one of the pages within on_browser_created. Does this work for you, if you have tried it? Thanks.
-
```python
import asyncio
from crawl4ai import AsyncWebCrawler
async def main():
    async with AsyncWebCrawler(
        headless=False,  # Set to False to see what is happening
        use_managed_browser=True,
        browser_type="chromium",
    ) as crawler:
        result = await crawler.arun(
            url="https://www.youtube.com/",  # -> breakpoint
            # ... (rest of the call not shown in the original post)
        )
asyncio.run(main())
```

I placed a breakpoint here, and it opened only one browser window. After logging in and resuming execution, it successfully scraped the content displayed after logging in. This is a potential solution.
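The breakpoint trick can also be done without a debugger. A minimal sketch, assuming (as the breakpoint experiment suggests) that the managed browser window is already open once the `async with` block is entered: the hypothetical helper `wait_for_manual_login` pauses on `input()` in a worker thread so you can log in by hand, and the crawl then continues in the same window.

```python
import asyncio
from typing import Callable


async def wait_for_manual_login(
    prompt: str = "Log in in the browser window, then press Enter... ",
    reader: Callable[[str], str] = input,
) -> str:
    """Pause until the user confirms the login.

    The blocking reader runs in a worker thread so the event loop,
    and the browser it drives, stays responsive while we wait.
    """
    return await asyncio.to_thread(reader, prompt)


async def main() -> None:
    # Lazy import so this sketch can be loaded without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler(
        headless=False,  # a visible window is needed to log in by hand
        use_managed_browser=True,
        browser_type="chromium",
    ) as crawler:
        # Stand-in for the breakpoint: assumes the window is already open here.
        await wait_for_manual_login()
        result = await crawler.arun(url="https://www.youtube.com/")
        print(result.success)
```

Run it with `asyncio.run(main())`. The constructor arguments are the ones from the snippet in this post; only the pause replaces the debugger.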
-
I think this is a different issue regarding hooks and authentication. However, I am currently facing the exact same issue as @mukulchaudhary. There are no means to grab … The documentation stated to use the … The image shows two browser contexts after following the documentation's Hooks & Auth for AsyncWebCrawler guide with Google Sign-In. The authenticated browser was ignored, and a new browser requiring another login appeared. Is this the expected behavior when using …?
-
Exactly. That's what is missing.
-
@unclecode Please share your views/guidance on this when you get time. Thanks.
-
The new version seems to have resolved this issue. Please try installing the latest version from the main branch to see if the problem persists.
-
The issue is still there. The documentation states that I can put my … The closest I can see here is …
-
Hello everybody, I am here :)) Let me go through this in detail.

First of all, @pttodv, there's no session ID … Instead, you pass the session ID to the …

@BZBY Regarding the managed browser setting, I recorded this video to make it clearer. First, make sure to set the …

issue_236.mp4

I will show two different cases: when passing a user data directory and when not.

How I created the folder: …

1/ With user data directory: In the first part of the video, I pass this folder and attempt to access the YouTube website, which opens perfectly. It logs in to YouTube with my account and extracts the data.

2/ Without user data directory: In the second part of the video, I do the same without passing the user data directory. You'll see it's a fresh account that doesn't show my personal YouTube data. I can still put a breakpoint or use a hook to wait, go back to the browser, log in, and perform actions before letting the crawl continue. In this scenario, a new user directory is created and all the data is stored there.

I prefer the first option, as it allows me to create multiple profiles for multiple purposes. Let me know if this helps. I don't face the issue of multiple browsers opening at the same time.
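The "one profile per purpose" setup could look roughly like the sketch below. It assumes that `AsyncWebCrawler` accepts `user_data_dir` alongside `use_managed_browser`, as discussed in this thread for 0.3.74; the base folder `~/.crawl4ai_profiles` and the helper names are hypothetical. A freshly created folder starts empty, so the assumption is that you log in once in that profile and later runs reuse the stored session.

```python
from pathlib import Path


def prepare_profile_dir(base: str, profile_name: str) -> Path:
    """Create (if missing) a persistent browser-profile folder for one purpose."""
    profile_dir = Path(base).expanduser() / profile_name
    profile_dir.mkdir(parents=True, exist_ok=True)
    return profile_dir


def crawler_kwargs(profile_dir: Path) -> dict:
    """Keyword arguments for AsyncWebCrawler; `user_data_dir` is an assumed name."""
    return {
        "headless": False,
        "use_managed_browser": True,
        "browser_type": "chromium",
        "user_data_dir": str(profile_dir),
    }


async def crawl_with_profile(url: str, profile_name: str) -> bool:
    from crawl4ai import AsyncWebCrawler  # lazy import; requires crawl4ai

    # Hypothetical base folder; each profile name gets its own directory.
    profile_dir = prepare_profile_dir("~/.crawl4ai_profiles", profile_name)
    async with AsyncWebCrawler(**crawler_kwargs(profile_dir)) as crawler:
        result = await crawler.arun(url=url)
        return result.success
```

For example, `asyncio.run(crawl_with_profile("https://www.youtube.com/", "youtube"))` would use the `youtube` profile, and a different name gives a separate, independent login.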
-
It doesn't seem to work, at least in my case, where I perform login actions in the on_browser_created hook, set the hook on AsyncPlaywrightCrawlerStrategy, and use that strategy with AsyncWebCrawler to crawl an internal page as described in the documentation. It only takes me to the login page. @unclecode Please provide an example when you can.
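One possible explanation, not confirmed in this thread: if the login in `on_browser_created` happens in a context the hook creates itself (for example via `browser.new_context()`), its cookies stay in that context while the crawler navigates in another one. A workaround in plain Playwright, not a crawl4ai API, is to save the logged-in context's storage state to a file and load it into a new context. The file name and helper names below are hypothetical, and whether a given crawl4ai version can consume such a file is not covered here.

```python
import asyncio
import json
from pathlib import Path

STATE_FILE = Path("auth_state.json")  # hypothetical location of the saved login


def has_cookies_for(state_path: Path, domain: str) -> bool:
    """True if a Playwright storage-state JSON file holds cookies for `domain`."""
    if not state_path.exists():
        return False
    state = json.loads(state_path.read_text())
    return any(domain in cookie.get("domain", "") for cookie in state.get("cookies", []))


async def login_and_save_state(login_url: str) -> None:
    """Log in by hand once, then persist cookies and localStorage to STATE_FILE."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(login_url)
        await asyncio.to_thread(input, "Finish logging in, then press Enter... ")
        await context.storage_state(path=str(STATE_FILE))
        await browser.close()


async def fetch_title_logged_in(url: str) -> str:
    """Open a fresh context preloaded with the saved state and return the page title."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context = await browser.new_context(storage_state=str(STATE_FILE))
        page = await context.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title
```

`has_cookies_for` is a cheap check before crawling: if the saved state has no cookies for the target domain, the login step has to run again.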
-
To add, I used crawl() to get the … Is there a way to use the browser instance opened during …?
-
I used Claude to generate some code, and it worked for me. Maybe it can help:
-
@pttodv Sorry for the delayed response; I've been very busy preparing 0.3.74. I've been trying to reproduce your situation here, but I still haven't been successful. Could you try the new version? In the new version, use the managed browser and pass a user_data_dir.

Session ID: when you don't pass any session_id to your …

And remember, if you don't pass any user directory (user profile directory), it will open one browser and one pop-up window asking you to choose your profile. If you click on a different profile, you may get two browsers. I don't know if this is the case you're facing or not. Please try it and let me know.
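A minimal sketch of reusing one browser page across several `arun()` calls, assuming `arun()` accepts `session_id` and the strategy exposes an async `kill_session()`, as in crawl4ai's session-based crawling docs from the 0.3.x era. `crawl_in_one_session` is a hypothetical helper name.

```python
from typing import List


async def crawl_in_one_session(urls: List[str], session_id: str = "shared_session") -> List[bool]:
    """Crawl several URLs in order, reusing one crawl4ai session (assumed API)."""
    from crawl4ai import AsyncWebCrawler  # lazy import; requires crawl4ai

    successes: List[bool] = []
    async with AsyncWebCrawler(headless=False, use_managed_browser=True) as crawler:
        try:
            for url in urls:
                # Same session_id on every call: crawl4ai reuses the same page,
                # so cookies and login state from earlier calls carry over.
                result = await crawler.arun(url=url, session_id=session_id)
                successes.append(result.success)
        finally:
            # Assumed async API (0.3.x docs): release the page held by this session.
            await crawler.crawler_strategy.kill_session(session_id)
    return successes
```

Killing the session in `finally` keeps a failed crawl from leaving an orphaned page open in the managed browser.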
-
I've just tested the latest main branch on Windows 11 and on Ubuntu and encountered an issue. Here’s the test code I used:
When running this code, two browser windows open up: one displays Chrome's login screen, and the other loads the URL I specified. All subsequent operations happen in the second browser window, but closing it also causes the first browser window to close. This suggests that the two windows are instances of the same browser. However, when I add user data as follows:
The issue becomes apparent. The first window is my real browser instance, but the second window lacks my user data—it only has bookmark information and doesn’t display the user profile icon in the top right corner of Chrome. This means that the second window cannot access sites I’m already logged into, so I have to log in again.
Ideally, I should be able to open a browser with my actual user profile or use a command like:
This command allows me to open a browser that I can access directly using playwright.chromium.connect_over_cdp(cdp_url) to interact with my existing open browser instance.
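The command itself isn't shown here, so the sketch below assumes the standard Chrome switches `--remote-debugging-port` and `--user-data-dir`, port 9222, and hypothetical helper names. It attaches with Playwright's `connect_over_cdp` and works inside the running browser's existing context instead of creating a new one. Recent Chrome releases are reported to ignore remote debugging when pointed at the default profile directory, so a dedicated profile directory is the safer assumption.

```python
from typing import List

DEBUG_PORT = 9222  # assumed free local port


def chrome_command(chrome_path: str, profile_dir: str, port: int = DEBUG_PORT) -> List[str]:
    """argv for a Chrome instance that accepts CDP connections on `port`."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
    ]


def cdp_url(port: int = DEBUG_PORT) -> str:
    # 127.0.0.1 rather than localhost avoids IPv6 resolution surprises.
    return f"http://127.0.0.1:{port}"


async def attach_and_read_title(url: str, port: int = DEBUG_PORT) -> str:
    """Attach to an already-running Chrome and work inside its existing context."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url(port))
        # The running browser's default context holds the profile's cookies.
        context = browser.contexts[0]
        page = await context.new_page()
        await page.goto(url)
        title = await page.title()
        await page.close()  # close only our tab; the user's browser keeps running
        return title
```

Start Chrome first, for example with `subprocess.Popen(chrome_command(path_to_chrome, profile_dir))`, give it a moment to open the debugging port, then call `asyncio.run(attach_and_read_title(url))`. Because Chrome was not launched by Playwright, it stays open after the script finishes.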