
[Bug] Potential Billion Laughs Attack Vector via Unrestricted XML Parsing in ZeepSchemaHelper #558

@ShangzhiXu


Hello Google Ads API Team,

Firstly, thanks so much for your great work!
While using and reviewing googleads-python-lib, I came across a potential XML parsing issue in the `ZeepSchemaHelper` class that I'd like to raise for discussion.

I understand that the library is designed to work with trusted WSDL endpoints provided by Google, and this issue is unlikely to be exploitable under normal use. However, for defense-in-depth and potential future-proofing, I wanted to share the finding.

# Affected Source Code: `googleads/common.py`

```python
class ZeepSchemaHelper(GoogleSchemaHelper):
  def __init__(self, endpoint, timeout, proxy_config, namespace_override, cache):
    ...
    transport = _ZeepProxyTransport(timeout, proxy_config, cache)

    try:
      data = transport.load(endpoint)  # [Untrusted Input Source: XML from user-supplied endpoint]
    except requests.exceptions.HTTPError as e:
      raise googleads.errors.GoogleAdsSoapTransportError(str(e))

    self.schema = zeep.xsd.Schema(
        lxml.etree.fromstring(data)  # [VULNERABILITY SINK: unsafe XML parsing]
    )
```

This is the pattern exploited by the Billion Laughs attack: nested entity declarations, each expanding to several copies of the previous one, make the parser's memory use grow exponentially with the nesting depth.
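To make the growth concrete, here is a minimal sketch that builds a scaled-down payload of this shape and parses it. It uses Python's standard-library `xml.etree.ElementTree` only so that it runs without extra dependencies (it is not the lxml parser used by `ZeepSchemaHelper`), and the helper names `build_payload` and `expanded_size` are hypothetical. With a fan-out of 10, each extra nesting level multiplies the expanded text by 10, so a payload of a few hundred characters at depth 3 already expands to 3,000 characters, and the classic nine-level version expands to about 3 GB.

```python
import xml.etree.ElementTree as ET


def build_payload(depth: int, fanout: int = 10) -> str:
    """Hypothetical helper: a Billion-Laughs-style document with `depth` nested entity levels."""
    decls = ['<!ENTITY lol0 "lol">']  # innermost entity: the literal string "lol"
    for i in range(1, depth + 1):
        # Each level references the previous level `fanout` times.
        body = f"&lol{i - 1};" * fanout
        decls.append(f'<!ENTITY lol{i} "{body}">')
    dtd = "\n  ".join(decls)
    return f'<?xml version="1.0"?>\n<!DOCTYPE lolz [\n  {dtd}\n]>\n<lolz>&lol{depth};</lolz>'


def expanded_size(depth: int, fanout: int = 10, leaf: str = "lol") -> int:
    """Hypothetical helper: characters produced by fully expanding the outermost entity."""
    return len(leaf) * fanout ** depth


# Depth 3 is small enough to expand safely: 10**3 copies of "lol".
payload = build_payload(depth=3)
root = ET.fromstring(payload)  # the stdlib parser expands internal entities
print(f"{len(payload)} characters of input -> {len(root.text)} characters of text")

# The classic attack uses nine levels: 3 * 10**9 characters (about 3 GB).
print(expanded_size(depth=9))
```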

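One possible mitigation is to parse the response with a hardened parser that does not expand entities, load external DTDs, or access the network:

```python
import lxml.etree

# Parser that leaves entity references unexpanded and never fetches DTDs or network resources
parser = lxml.etree.XMLParser(
    resolve_entities=False,
    load_dtd=False,
    no_network=True,
)
```

and to pass it to the sink as `lxml.etree.fromstring(data, parser=parser)`.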
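A minimal sketch of how the sink in `ZeepSchemaHelper.__init__` could use such a parser. `parse_untrusted_xml` is a hypothetical helper, not part of googleads, and the sketch assumes lxml is installed. A fresh parser is built per call, which sidesteps any question of sharing one parser object across threads. The small payload at the end checks that the entity reference is kept rather than expanded.

```python
import lxml.etree


def parse_untrusted_xml(data: bytes):
    """Hypothetical helper: parse XML without expanding entities or loading external resources."""
    parser = lxml.etree.XMLParser(
        resolve_entities=False,  # keep entity references as unexpanded nodes
        load_dtd=False,          # do not load external DTDs
        no_network=True,         # never fetch documents over the network
    )
    return lxml.etree.fromstring(data, parser=parser)


# In ZeepSchemaHelper.__init__, the sink would then become:
#     self.schema = zeep.xsd.Schema(parse_untrusted_xml(data))

payload = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE lolz [<!ENTITY lol "lol">'
    b'<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">]>'
    b"<lolz>&lol1;</lolz>"
)
root = parse_untrusted_xml(payload)
print(lxml.etree.tostring(root))  # the entity reference is serialized, not its expansion
```

Note that `load_dtd=False` and `no_network=True` are already lxml's defaults for `XMLParser`, so `resolve_entities=False` is the setting that actually changes behavior here; spelling all three out mainly documents the intent.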
