Changelog mentions "Support for complex JSONPath, JSON-CSS, and Microdata extraction"? #853
Replies: 1 comment
@sandermanneke Yes. The feature is called JsonCssExtractionStrategy. Here are the docs for it! We have also recently introduced a utility function that uses an LLM to generate the schema, mapping the target data fields to the CSS/XPath selectors in the HTML.

from crawl4ai import LLMConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# `result` is assumed to be the result object of an earlier crawl of the target page.
schema = JsonCssExtractionStrategy.generate_schema(
    result.html,
    llm_config=LLMConfig(
        provider="gemini/gemini-2.0-flash", api_token="env:GEMINI_API_KEY"
    ),
    # An example of the JSON you want back; the LLM maps each field to a selector.
    target_json_example={
        "name": "McDonald's",
        "outlet": "Malleshwaram",
        "delivery_time": "30-40 mins",
        "menu": [
            {
                "name": "Chicken Surprise Burger",
                "price": 76,
                "rating": 3.3,
                "rating_count": 3,
                "coupon": "Buy 1 Get 1",
                "description": "Introducing the new Chicken Surprise Burger which has the perfect balance of a crispy fried chicken patty, the crunch of onions and the richness of creamy sauce."
            },
            {
                "name": "Chicken Surprise Burger Combo",
                "price": 238,
                "rating": 3.3,
                "rating_count": 6,
                "coupon": "Buy 1 Get 1",
                "description": "Introducing the new Chicken Surprise Burger which has the perfect balance of a crispy fried chicken patty, the crunch of onions and the richness of creamy sauce."
            },
        ],
    },
)

I also recently put together a gist that shows how to get the McDonald's menu from an online food delivery site. Check out this example and run it to get a full understanding. Let us know how it goes, and if you need any additional help, post your code snippet and I'll help you.
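For orientation, here is a hand-written sketch of the kind of schema that JsonCssExtractionStrategy works with for the target JSON above. It is not the output of generate_schema: every CSS selector below is hypothetical and would have to match the real page. The key names (name, baseSelector, fields, and the nested_list type for repeated items) follow the schema format described in the crawl4ai docs.

```python
import json

# Illustration only: the selectors are made up, not taken from any real site.
schema = {
    "name": "Restaurant",
    "baseSelector": "div.restaurant",  # hypothetical container element
    "fields": [
        {"name": "name", "selector": "h1.restaurant-name", "type": "text"},
        {"name": "outlet", "selector": "p.outlet", "type": "text"},
        {"name": "delivery_time", "selector": "span.delivery-time", "type": "text"},
        {
            # One entry per matching element; each entry has its own fields.
            "name": "menu",
            "selector": "div.menu-item",
            "type": "nested_list",
            "fields": [
                {"name": "name", "selector": "h3.item-name", "type": "text"},
                {"name": "price", "selector": "span.price", "type": "text"},
                {"name": "rating", "selector": "span.rating", "type": "text"},
                {"name": "description", "selector": "p.description", "type": "text"},
            ],
        },
    ],
}

print(json.dumps(schema, indent=2))
```

A schema of this shape, whether generated or written by hand, would then be passed to JsonCssExtractionStrategy(schema) and set as the extraction strategy of the crawl configuration.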
I would like to extract JSON fields from JSON strings that are embedded in the HTML source, for example microdata in application/ld+json script blocks.
The changelog mentions "Improved JSON Extraction: Support for complex JSONPath, JSON-CSS, and Microdata extraction."
https://github.com/unclecode/crawl4ai/blob/79328e42925c9ce8c030a1cadfe68c88cbe02c36/CHANGELOG.md#0424x-2024-12-31
This suggests there is some way to parse a JSON string and then return values by their path.
However, I can't find any reference to that effect in the docs or in the source.
Is this an oversight in the changelog, or am I missing something?
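To make the request concrete, the behaviour described above could be sketched with the Python standard library alone. This is not a crawl4ai API: JsonLdCollector, extract_json_ld, and get_path are hypothetical helper names, the dotted-path syntax is a simplified stand-in for JSONPath, and the sample HTML is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed body of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content can arrive in several pieces, so buffer it.
        if self._in_json_ld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD block in the HTML."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    return parser.documents


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'offers.0.price' through dicts and lists."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


# Made-up sample page for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Chicken Surprise Burger",
 "offers": [{"price": 76}]}
</script>
</head><body></body></html>
"""

docs = extract_json_ld(SAMPLE_HTML)
print(get_path(docs[0], "name"))
print(get_path(docs[0], "offers.0.price"))
```

With a crawler, the same helpers would be applied to the HTML of the crawl result instead of SAMPLE_HTML.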