Bug Report: Incorrect Title Extraction for Documents with Front Pages

# Description

ScienceBeam fails to correctly extract article titles from PDF manuscripts that have front pages (cover pages with journal branding, terms & conditions, etc.). Instead of extracting the actual article title, it extracts header/footer text and journal information from the front page.

# What are the Steps to Reproduce the issue?

1. Submit a manuscript PDF with a front page containing journal branding and terms/conditions (e.g., Taylor & Francis "Expert Review" journal format)
2. Process the document through ScienceBeam for XML conversion
3. Examine the resulting XML file's `<article-title>` element in the `<front>`/`<article-meta>`/`<title-group>` section
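
Step 3 can also be checked programmatically. The snippet below is a minimal sketch, not part of ScienceBeam: it assumes the output XML has an un-namespaced `<article>` root with the title at `front/article-meta/title-group/article-title`. The helper name `extract_article_title` and the sample document are hypothetical and used only for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_article_title(xml_text: str) -> Optional[str]:
    """Return the article title from a JATS-like XML string, or None if absent.

    Assumes an un-namespaced <article> root (an assumption, not confirmed
    against the attached output file).
    """
    root = ET.fromstring(xml_text)
    node = root.find("front/article-meta/title-group/article-title")
    if node is None:
        return None
    # itertext() also collects text nested inside inline markup such as <italic>;
    # split/join collapses line breaks and repeated whitespace.
    return " ".join("".join(node.itertext()).split())


# Hypothetical sample document, not the real ScienceBeam output.
sample = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An <italic>Example</italic>
          Title</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

print(extract_article_title(sample))  # prints "An Example Title"
```

Passing the contents of the converted XML file to the same helper would show which text ended up in the title element, which could then be compared against the expected title.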


# What is the Expected behaviour?

The `<article-title>` element should contain the actual manuscript title. For example, based on the content of 76_pdf-0.1.11.xml (see the attached file [76.pdf](https://github.com/user-attachments/files/24605919/76.pdf)), the title should be something like: "The Toxic Effects of Ethylene Glycol Tetraacetate Acid, Ferrum Lek and Methanol on the Glutathione System: correction Options"

Instead, the extracted title is the following (see screenshot):
<img width="1165" height="39" alt="Image" src="https://github.com/user-attachments/assets/1ae188de-1439-45aa-ab7c-653c0b8b204b" />

# Additional Context

Affected file: 

[76_pdf-normal.xml](https://github.com/user-attachments/files/24605983/76_pdf-normal.xml)

The actual article content (abstract, body) is correctly extracted.

This appears to be a pattern recognition issue: ScienceBeam does not properly identify and skip front page content when locating the title.

Bug Report: Incorrect Title Extraction for Documents with Front Pages #561
