@jorants jorants commented Jun 16, 2025

Currently, collection types are not allowed for attributes. That makes sense for the default serializers, but a user may override the default behavior with a custom validator that returns a collection. For example:

from typing import Annotated, Any
import pydantic_xml as pxml
from pydantic import BeforeValidator
from pydantic.functional_serializers import PlainSerializer

def validate_space_separated_attr(value: str) -> list[str]:  
    return value.split(" ")

def serialize_space_separated_attr(values: list[Any]) -> str:  
    return " ".join(str(x) for x in values)

SpaceSeparatedValueListAttr = Annotated[
    list[str],
    BeforeValidator(validate_space_separated_attr),
    PlainSerializer(serialize_space_separated_attr),
]

class Person(pxml.BaseXmlModel):
    children: SpaceSeparatedValueListAttr = pxml.attr()
    name: str = pxml.element()
    
doc = """
<Person children="Bob Eve">
  <name>Alice</name>
</Person>
"""
    
alice = Person.from_xml(doc)
print(alice.children) # prints ['Bob', 'Eve']
print(alice.to_xml())

Instead of disallowing collection-typed attributes outright, this change parses them as a string and leaves the rest up to the user.
This might not be a good final solution. It would be nicer to check whether custom validator logic is present and raise an error when it is not.
However, I do not see an easy way to add such a check.

What do you think? I at least got stuck parsing XML documents that contain space-separated lists in attributes.
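
For context, the attribute shape in question can be reproduced with nothing but the standard library. The sketch below is not pydantic-xml code; it only uses `xml.etree.ElementTree` to show the split-on-parse and join-on-serialize round trip that the custom validator and serializer above perform. The helper names `read_children` and `write_children` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<Person children="Bob Eve">
  <name>Alice</name>
</Person>
"""


def read_children(root: ET.Element) -> list[str]:
    # Hypothetical helper: split the raw attribute string into tokens.
    # str.split() with no argument tolerates repeated or surrounding whitespace.
    return root.get("children", "").split()


def write_children(root: ET.Element, children: list[str]) -> None:
    # Hypothetical helper: join the tokens back into one attribute string.
    root.set("children", " ".join(children))


root = ET.fromstring(DOC.strip())
children = read_children(root)
print(children)

children.append("Mallory")
write_children(root, children)
print(root.get("children"))
```

Note that this sketch uses `split()` rather than `split(" ")`: the latter yields empty strings when tokens are separated by more than one space, which may or may not be what you want.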

@dapper91
Owner

@jorants Hi,

For now, you can define a custom schema:

from typing import Annotated

import pydantic_xml as pxml
from pydantic_core import core_schema as cs


class SpaceSeparatedValueListSchema:
    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler):
        schema = cs.no_info_after_validator_function(lambda val: val.split(' '), cs.str_schema())
        serialization = cs.plain_serializer_function_ser_schema(lambda lst: ' '.join(lst))
        return cs.json_or_python_schema(json_schema=schema, python_schema=schema, serialization=serialization)


class Person(pxml.BaseXmlModel):
    children: Annotated[list[str], SpaceSeparatedValueListSchema] = pxml.attr()
    name: str = pxml.element()


doc = """
<Person children="Bob Eve">
  <name>Alice</name>
</Person>
"""

alice = Person.from_xml(doc)
print(alice.children)  # prints ['Bob', 'Eve']
print(alice.to_xml())  # prints b'<Person children="Bob Eve"><name>Alice</name></Person>'

@dapper91
Owner

I suppose this collection attribute implementation is not very intuitive, since it only works if both a validator and a serializer are provided.
In my opinion, code like this should work too:

class Person(pxml.BaseXmlModel):
    children: list[str] = pxml.attr()
    name: str = pxml.element()

but it fails with a misleading error.

        return ElementSerializer.from_core_schema(schema, ctx)
    elif ctx.entity_location is EntityLocation.ATTRIBUTE:
-        raise errors.ModelFieldError(ctx.model_name, ctx.field_name, "attributes of collection types are not supported")
+        return primitive.from_core_schema(pcs.StringSchema(type="str"), ctx)
Owner

I suppose it should work with any primitive type, not only with strings.
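
One way to read this suggestion: the attribute string is split into tokens, and each token is then handled by whatever primitive parser matches the item type, rather than always being treated as `str`. The following is a standalone sketch of that idea only. `parse_list_attr` and `dump_list_attr` are hypothetical helpers, not part of pydantic-xml, and plain callables such as `int` stand in for the library's primitive serializers.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def parse_list_attr(raw: str, item_type: Callable[[str], T]) -> list[T]:
    """Split a space-separated attribute value and convert each token."""
    # item_type is an assumption standing in for a per-item primitive parser.
    return [item_type(token) for token in raw.split()]


def dump_list_attr(values: list[T]) -> str:
    """Render the items back into a single space-separated attribute value."""
    return " ".join(str(value) for value in values)


print(parse_list_attr("1 2 3", int))
print(parse_list_attr("0.5 1.25", float))
print(dump_list_attr([1, 2, 3]))
```

With this shape, an invalid token such as `"x"` in a list of `int` raises `ValueError` from the item converter, so item-level validation stays with the primitive type instead of the list wrapper.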
