Table information not extracted #1380
Unanswered
hboen1990 asked this question in Forums - Q&A
Replies: 0 comments
Hi,
I don't see the table information when I scrape the following site: url="https://www.farmacotherapeutischkompas.nl/bladeren/preparaatteksten/f/faricimab"
I checked the page source and the content is defined as a table.
Screen capture:

By contrast, for the site "https://www.tilburg.nl/gemeente/stad-en-dorpen/cultuur-vrije-tijd/winkelen-en-markten/" I do see a table.
I tested with the following code.
import asyncio
from crawl4ai import *
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

config = CrawlerRunConfig(
    markdown_generator=DefaultMarkdownGenerator(
        options={"ignore_links": True}
    ),
    table_score_threshold=4
)

async def crawl():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://www.tilburg.nl/gemeente/stad-en-dorpen/cultuur-vrije-tijd/winkelen-en-markten/",
            # url="https://www.farmacotherapeutischkompas.nl/bladeren/preparaatteksten/f/faricimab",
            config=config
        )
        # Get tables
        tables = result.media.get("tables", [])
        print(f"Found {len(tables)} data tables in total.")
        # Save the raw markdown so the table output can be inspected
        with open('medicijn2.md', 'w', encoding='utf-8') as f:
            f.write(result.markdown.raw_markdown)

if __name__ == "__main__":
    asyncio.run(crawl())
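To narrow this down, it might help to check whether the HTML the crawler actually received contains any table elements at all, separately from the table scoring. Below is a minimal sketch using only the standard library. The names TableCounter and count_tables are hypothetical helpers, not part of crawl4ai, and passing result.html to it assumes the crawl result exposes the fetched HTML under that attribute.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> and <tr> start tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method
        if tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1


def count_tables(html: str) -> tuple[int, int]:
    """Return (number of tables, number of rows) found in the HTML."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables, parser.rows


if __name__ == "__main__":
    sample = "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>"
    print(count_tables(sample))
```

If count_tables(result.html) reports zero tables for the faricimab page, the table is probably not in the HTML that was fetched (for example because it is rendered later by JavaScript). If it reports one or more, the table is present and is more likely being dropped by the table_score_threshold filter.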