fix: preserve tail text in Selector.get_all_text by tsubasakong · Pull Request #175 · D4Vinci/Scrapling

tsubasakong · 2026-03-07T22:47:01Z

Summary

- Preserve tail text nodes when collecting recursive text content
- Keep text order stable across nested children
- Add regression coverage for interleaved child/tail text and ignored-tag tails
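
For readers unfamiliar with the term, the sketch below shows what "tail text" means in the element-tree model. It uses the standard library's `xml.etree.ElementTree`, which shares the `.text`/`.tail` model with lxml. The sample markup and both helper functions are hypothetical illustrations, not code from this PR or from Scrapling.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: " world " is the tail of <b> and "!" is the tail of <i>;
# neither string is the .text of any element.
root = ET.fromstring("<p>Hello <b>big</b> world <i>again</i>!</p>")


def text_without_tails(element):
    """Collect only each element's .text, which silently drops tail text."""
    return [el.text for el in element.iter() if el.text]


def text_with_tails(element):
    """Collect text in document order, including the tail after each child."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.extend(text_with_tails(child))
        if child.tail:
            parts.append(child.tail)
    return parts


without_tails = text_without_tails(root)
with_tails = text_with_tails(root)
print(without_tails)
print(with_tails)
```

The first helper loses the interleaved strings, while the second keeps them in the position where they appear in the markup.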

Fixes #167

Validation

source .venv/bin/activate && pytest tests/parser/test_parser_advanced.py -q
source .venv/bin/activate && pytest tests/parser/test_general.py -q -k all_text
git diff --check

D4Vinci · 2026-03-08T14:43:34Z

This is a duplicate of #168, and the PR does not comply with the contribution rules.

Also, PR #168 is the best approach because:

- Correctness: The ancestor-walk approach for ignored tags is the only one that handles arbitrarily nested ignored elements correctly. This PR still recurses into ignored elements, potentially leaking their content.
- Performance: Using a pre-compiled XPath(".//text()") at the module level is consistent with the existing codebase patterns (_find_all_elements and _find_all_elements_with_spaces are already pre-compiled the same way).
- Simplification: It correctly removes the unnecessary _find_all_elements expansion from ignored_elements. Since it walks ancestors, it only needs the tag elements themselves in the set, which is simpler and faster.
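
The ancestor-walk idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from #168 or from Scrapling: the names `_find_all_text_nodes` and `get_all_text`, the `ignore_tags` parameter, and the sample markup are all hypothetical. It relies on lxml's XPath "smart strings", which expose `getparent()` and `is_tail`.

```python
from lxml import etree

# Compiled once at module level, mirroring the pattern the review describes.
# The name is hypothetical.
_find_all_text_nodes = etree.XPath(".//text()")


def get_all_text(element, ignore_tags=("script", "style")):
    """Return text and tail strings in document order, skipping ignored subtrees."""
    ignored = set(ignore_tags)
    parts = []
    for node in _find_all_text_nodes(element):
        owner = node.getparent()
        # A tail string sits *after* its owner, so the element that actually
        # contains it is the owner's parent.
        ancestor = owner.getparent() if node.is_tail else owner
        skip = False
        while ancestor is not None:
            if ancestor.tag in ignored:
                skip = True
                break
            if ancestor is element:
                break  # do not walk above the element we started from
            ancestor = ancestor.getparent()
        if not skip:
            parts.append(str(node))
    return parts


# Hypothetical sample: <nav> is ignored, including its nested <b> and the
# tail after <b>, but the tail after </nav> belongs to <div> and is kept.
sample = etree.fromstring("<div>a<nav>x<b>y</b>z</nav>b<span>c</span>d</div>")
result = get_all_text(sample, ignore_tags=("nav",))
print(result)
```

Because each text node is checked against its own ancestors, the set of ignored tags does not need to be expanded to every descendant element, and the tail that follows an ignored element is still attributed to the enclosing, non-ignored parent.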

tsubasakong and others added 4 commits March 7, 2026 14:46

fix: preserve tail text in get_all_text

50fded7

Merge remote-tracking branch 'upstream/main' into fix/167-get-all-tex…

270d232

…t-tail-nodes

Merge remote-tracking branch 'upstream/main' into ai-sync-175-1772935322

8a4c390

Merge remote-tracking branch 'upstream/main' into HEAD

c3f0ab2

D4Vinci added the PR-against-rules label ("This PR doesn't comply with one or more of the contribution rules.") Mar 8, 2026

D4Vinci closed this Mar 8, 2026

tsubasakong wants to merge 4 commits into D4Vinci:main from tsubasakong:fix/167-get-all-text-tail-nodes
