Hi @rdv0011, there are two small issues with the regex:

1/ It was looking for class="address-row" exactly, but the real HTML had other classes, too (like class="address-row even").
2/ It was trying to grab an id attribute from the row, but the rows didn't have one. They had a data-code attribute instead.

The fix is to tweak the regex to handle both of those things. I also combined the steps into a single arun() call, which is a bit cleaner.

You can try this regex:

regex_pattern = r'<tr[^>]*?class="[^"]*\baddress-row\b[^"]*"[^>]*?data-code="([^"]+)"[^>]*?>'
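As a quick sanity check, here is a minimal sketch that runs the pattern with Python's standard `re` module. The sample HTML is made up for illustration (the real page's markup may differ); it only mimics the two cases described above: rows with extra classes, and a `data-code` attribute instead of an `id`.

```python
import re

# Pattern from the answer above: match <tr> tags whose class list contains
# "address-row" and capture the value of their data-code attribute.
regex_pattern = r'<tr[^>]*?class="[^"]*\baddress-row\b[^"]*"[^>]*?data-code="([^"]+)"[^>]*?>'

# Hypothetical sample markup, not taken from the real page.
sample_html = """
<table>
  <tr class="address-row" data-code="A100"><td>1 Main St</td></tr>
  <tr class="address-row even" data-code="B200"><td>2 Main St</td></tr>
  <tr class="header-row" data-code="X999"><td>Address</td></tr>
  <tr class="address-row odd"><td>row without a data-code</td></tr>
</table>
"""

# findall returns the captured group (the data-code value) for each match.
codes = re.findall(regex_pattern, sample_html)
print(codes)  # ['A100', 'B200']
```

Note that the pattern expects `class` to appear before `data-code` inside the tag, so rows that list the attributes in the other order would not match.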

You only need to pass the custom parameter to RegexExtractionStrategy; the other parameters are not needed:

extraction_strategy = RegexExtrac…

ntohidi
Jun 10, 2025
Collaborator Sponsor

Answer selected by ntohidi