
@Magicbook1108 (Contributor) commented Sep 29, 2025

What problem does this PR solve?

Feature: add a TOC (table of contents) extraction method. NOT FINISHED!

Type of change

  • New Feature (non-breaking change which adds functionality)

Current issues:

1. Excessive Token Usage with img_url Input (Line 540)
When the input is passed in img_url format, the token count jumps to roughly 250,000–600,000 tokens. This suggests that the way image URLs are processed triggers massive, unnecessary tokenization, which severely hurts performance and cost.
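To illustrate how a count in that range could arise, here is a minimal sketch. It assumes (the PR does not confirm this) that the image is sent as a base64 data URL inside an OpenAI-style `image_url` content part, and that the whole message is counted as text somewhere. The helpers `rough_token_count` and `text_only_token_estimate` are hypothetical, and the 4-characters-per-token ratio is a crude heuristic, not a real tokenizer.

```python
import base64


def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token); NOT a real tokenizer.
    return max(1, len(text) // 4)


def text_only_token_estimate(content) -> int:
    """Estimate tokens for the text parts only, skipping image payloads."""
    if isinstance(content, str):
        return rough_token_count(content)
    total = 0
    for part in content:
        if part.get("type") == "text":
            total += rough_token_count(part.get("text", ""))
        # Parts of type "image_url" are deliberately skipped here: their
        # cost should not be derived from the length of the URL string.
    return total


# Hypothetical 300 KB image, encoded as a data URL (about 400,000 characters).
fake_image = base64.b64encode(b"\x00" * 300_000).decode("ascii")
data_url = "data:image/png;base64," + fake_image

content = [
    {"type": "text", "text": "Extract the table of contents from this page."},
    {"type": "image_url", "image_url": {"url": data_url}},
]

# Counting the whole message as text includes every base64 character.
naive = rough_token_count(str(content))
# Counting only the text parts ignores the image payload.
careful = text_only_token_estimate(content)
```

Under these assumptions `naive` lands above 100,000 while `careful` stays in the low tens, so checking whether the payload is ever stringified before counting or truncation may be a useful first diagnostic.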

2. Task Repetition with pending >= 1
When pending >= 1, the task is re-executed in a loop and never advances to the next stage: the system detects a pending task and keeps retrying it, so the pipeline stalls in a deadlock-like state.
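One common way to keep such a retry loop from stalling is to cap the number of attempts per task, so the pending set is guaranteed to drain. The sketch below is only an illustration of that idea and is not taken from this PR; `run_until_drained`, `execute`, and `max_attempts` are hypothetical names.

```python
def run_until_drained(tasks, execute, max_attempts=3):
    """Run every task; retry failures at most max_attempts times each.

    Returns (done, failed). The loop always terminates because each task
    is either completed or moved to `failed` after max_attempts tries.
    """
    attempts = {task: 0 for task in tasks}
    pending = list(tasks)
    done, failed = [], []
    while pending:  # equivalent to "pending >= 1"
        task = pending.pop(0)
        attempts[task] += 1
        if execute(task):
            done.append(task)          # finished, leaves the pending set
        elif attempts[task] < max_attempts:
            pending.append(task)       # retry later, behind other tasks
        else:
            failed.append(task)        # give up instead of looping forever
    return done, failed


# Usage: task "b" never succeeds, yet the loop still ends.
calls = []


def flaky_execute(task):
    calls.append(task)
    return task != "b"


done, failed = run_until_drained(["a", "b"], flaky_execute)
```

Once the loop returns, the caller can move on to the next stage with `done` and report `failed` separately.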

@dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) Sep 29, 2025
@dosubot (bot) added the 💞 feature label (Feature request, pull request that fullfill a new feature.) Sep 29, 2025
@dosubot (bot) added the size:XL label (This PR changes 500-999 lines, ignoring generated files.) and removed the size:L label Sep 29, 2025
@Magicbook1108 changed the title from "Feat: add toc extraction method NOT FINISHED!!!" to "Feat: add toc extraction method supported by llm" Sep 29, 2025
@Magicbook1108 (Contributor, Author)

All issues solved

@Magicbook1108 added the ci label (Continue Integration) Sep 30, 2025