Extract all text if subelements are present #3
Are you sure this is desirable? It looks like <title> content may be a complex structure, in which case concatenating the texts doesn't produce anything meaningful. The third example at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-title.html shows <title> elements nested inside a <title> element: a main title and a sub-title. In this case joining the two texts sort of works, but it needs some separator. In other cases it wouldn't make sense at all. All we want here is to automatically give the converted dictionary a reasonable title, but if the source dictionary doesn't have a simple title there's no point in guessing.
I did the change mostly for the I could change the code to only include the content of specific sub tags (probably only |
In that case I think text() is the wrong thing to change. text() and attr() are simple utility functions; let them stay that way. Perhaps the XPaths for the license and copyright tags can be improved, or maybe it makes sense to look for license and copyright in several places. But honestly, this is just a best effort to populate tags automatically; in no way is it meant to absolve the conversion author from verifying the resulting dictionary.
I agree, and I created this ticket while doing exactly that. I've changed all FreeDict dictionaries so that the current implementation successfully finds license name and URL. Unfortunately, I didn't find a way to add the required
What about adding the new version as a separate |
The text function only extracts text via the element's text attribute. This returns only part of the text when subelements are present; see https://docs.python.org/2/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element.text
I've fixed this by iterating over all text elements in that element/subtree.
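To illustrate the difference, here is a minimal sketch using the standard library's xml.etree.ElementTree. The helper names text_only and full_text and the sample XML are made up for this example and are not taken from the project's code; the actual patch may differ.

```python
import xml.etree.ElementTree as ET


def text_only(elem):
    """Hypothetical stand-in for the old behaviour: Element.text holds
    only the text that precedes the first subelement."""
    return elem.text


def full_text(elem):
    """Hypothetical stand-in for the fix: join every text node in the
    subtree, in document order, using Element.itertext()."""
    return "".join(elem.itertext())


# Sample input invented for illustration: a licence note with an inline link.
elem = ET.fromstring(
    "<p>Available under <ref target='x'>GNU GPL</ref> version 3.</p>"
)

partial = text_only(elem)   # stops at the <ref> subelement
complete = full_text(elem)  # includes the <ref> text and the text after it
```

With this input, partial is "Available under " while complete is "Available under GNU GPL version 3.". Note that the plain join adds no separator between nodes, which is the concern raised above for nested <title> elements.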