Context
Following #341, all XML-variant formats (.xml, .xslt, .xsl, .xsd, .dtd, .wsdl) are now correctly routed to SourceCodePipeline and processed as source code. However, they are treated as flat text wrapped in code blocks, with no awareness of the XML document structure.
Proposal
Add hierarchical/structural parsing support for XML content, similar to how JsonPipeline handles JSON documents. This would allow the system to:
- Parse XML into a document tree and split content along meaningful structural boundaries (elements, namespaces, sections)
- Produce better chunks that preserve semantic context instead of splitting mid-element
- Extract metadata from XML declarations, root elements, or schema definitions
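A minimal Python sketch of the structural splitting and metadata extraction listed above follows. It uses only the standard library's xml.etree.ElementTree and assumes a size-bounded strategy: an element whose serialized form fits within a budget stays whole, and an oversized element is split along its child boundaries, with each chunk tagged by an XPath-like path so it keeps its place in the tree. The names XmlChunk, split_xml, extract_metadata and max_chars are hypothetical; they are not part of the existing JsonPipeline/JsonDocumentSplitter code, and the real pipeline's language, chunk type and size limits may differ.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Matches a leading <?xml ...?> declaration and captures its pseudo-attributes.
_DECL_RE = re.compile(r"^\s*<\?xml\s+([^?]*)\?>")
_ATTR_RE = re.compile(r'(\w+)\s*=\s*["\']([^"\']*)["\']')


@dataclass
class XmlChunk:
    """Hypothetical chunk type: element path plus its serialized subtree."""
    path: str
    text: str


def _local(tag: str) -> str:
    # ElementTree encodes namespaced tags as "{uri}local"; keep only "local".
    return tag.rsplit("}", 1)[-1]


def _serialize(elem: ET.Element) -> str:
    # ET.tostring also writes elem.tail, so detach it to keep sibling text out.
    tail, elem.tail = elem.tail, None
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail


def _split(elem: ET.Element, path: str, max_chars: int, out: list) -> None:
    text = _serialize(elem)
    children = list(elem)
    if len(text) <= max_chars or not children:
        # Fits the budget (or is an indivisible leaf): keep the element whole.
        out.append(XmlChunk(path, text))
        return
    # Too large: descend one level and split along child-element boundaries.
    # Assumption: mixed-content text directly under `elem` is dropped here.
    seen: dict[str, int] = {}
    for child in children:
        name = _local(child.tag)
        seen[name] = seen.get(name, 0) + 1
        _split(child, f"{path}/{name}[{seen[name]}]", max_chars, out)


def split_xml(source: str, max_chars: int = 500) -> list[XmlChunk]:
    """Hypothetical entry point: structure-aware chunks of an XML document."""
    root = ET.fromstring(source)
    out: list[XmlChunk] = []
    _split(root, "/" + _local(root.tag), max_chars, out)
    return out


def extract_metadata(source: str) -> dict:
    """Hypothetical helper: declaration, root element and schema metadata."""
    meta: dict[str, str] = {}
    decl = _DECL_RE.match(source)
    if decl:
        # e.g. {"version": "1.0", "encoding": "UTF-8"}
        meta.update(_ATTR_RE.findall(decl.group(1)))
    root = ET.fromstring(source)
    meta["root"] = _local(root.tag)
    if root.tag.startswith("{"):
        meta["namespace"] = root.tag[1:].split("}", 1)[0]
    schema = root.get(f"{{{XSI_NS}}}schemaLocation")
    if schema:
        meta["schema_location"] = schema
    return meta


if __name__ == "__main__":
    doc = ('<?xml version="1.0" encoding="UTF-8"?>\n'
           '<catalog><book id="1"><title>A</title></book>'
           '<book id="2"><title>B</title></book></catalog>')
    for chunk in split_xml(doc, max_chars=50):
        print(chunk.path, chunk.text)
    print(extract_metadata(doc))
```

With max_chars=50 the catalog element is too large, so the sketch emits one chunk per book, each carrying a path such as /catalog/book[1]. Two limitations are deliberate simplifications: ElementTree re-serializes namespaced subtrees with generated prefixes (ns0:, ns1:, ...), and a standalone .dtd file is not a well-formed XML document, so it would need a separate parsing path.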
References
JsonPipeline / JsonDocumentSplitter: existing implementation of structured JSON parsing that could serve as a model