Extracting Headings and paragraphs in the same order from ODT file #135

@ryocchin

I managed to solve this problem myself.
I created a temporary file in which each heading is replaced with a paragraph that has a special style name. By rereading that file, I was able to reproduce the original order.
Thank you to anyone who read this.


Hello. I am trying to convert an ODT file to an HTML file using odfpy.
Our files have headings placed between regular paragraphs to indicate chapter names. For example:

Chapter 1 xxxx
paragraph bbbb dddd eeee.
paragraph ddddd eeee ffff.
Chapter 2 yyyy
paragraph ggggggg
paragraph hhhhhh
Here "Chapter 1" and "Chapter 2" are headings in the ODT file, and the other lines are ordinary paragraphs. I tried to extract these lines as follows:

from odf import opendocument, text

doc = opendocument.load("somefile.odt")
headers = doc.getElementsByType(text.H)  # all headings
paras = doc.getElementsByType(text.P)    # all paragraphs

Headings and paragraphs are extracted into two separate lists, 'headers' and 'paras',
so I am unable to reproduce the order they have in the ODT file. Is there any way to get a single list of texts in the same order as the document?
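
For anyone landing here with the same question, one possible way to keep the order is to walk the document once instead of querying headings and paragraphs separately. The sketch below does not use odfpy at all: it assumes the ODT file is a ZIP archive containing content.xml (as the OpenDocument format specifies) and reads it with the Python standard library. The function name blocks_in_order and the "heading"/"paragraph" labels are made up for illustration, and paragraphs nested inside other paragraphs (for example in text boxes) are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument XML namespaces (defined by the ODF specification).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
HEADING = f"{{{TEXT_NS}}}h"
PARAGRAPH = f"{{{TEXT_NS}}}p"


def blocks_in_order(odt_path):
    """Return (kind, text) tuples for headings and paragraphs in document order.

    Hypothetical helper for illustration; not part of odfpy.
    """
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    # The visible document body lives under office:body/office:text.
    body = root.find(f"{{{OFFICE_NS}}}body/{{{OFFICE_NS}}}text")
    if body is None:
        return []

    blocks = []
    # Element.iter() visits elements in document order, so headings and
    # paragraphs come out interleaved exactly as they appear in the file.
    for element in body.iter():
        if element.tag == HEADING:
            blocks.append(("heading", "".join(element.itertext())))
        elif element.tag == PARAGRAPH:
            blocks.append(("paragraph", "".join(element.itertext())))
    return blocks
```

Calling blocks_in_order("somefile.odt") and looping over the result should give each heading followed by the paragraphs beneath it, which can then be mapped to HTML tags of your choice.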
