fix: unbounded inventory tree query OOM-kills container on large GAM networks #1154

@benminer

Severity

Medium

Status

Confirmed

Description

The inventory tree endpoint fetches every active ad unit for a tenant in a single unbounded query. For large GAM networks this blows up process memory and gets the container OOM-killed before a response is ever returned.

We hit this on 2026-03-24 with a tenant that has ~230k active ad units in gam_inventory. The process peaked at ~3.74 GB RSS before the kernel killed it:

Out of memory: Killed process 680 (python) total-vm:4811004kB, anon-rss:3738544kB

The machine had 4 GB RAM. We bumped the instance to 12 GB as a short-term workaround, but GAM publishers can have far more than 230k units, so this needs a real fix.

The unbounded query is in src/admin/blueprints/inventory.py at lines 924-939:

stmt = select(GAMInventory).where(
    GAMInventory.tenant_id == tenant_id,
    GAMInventory.inventory_type == "ad_unit",
    GAMInventory.status == "ACTIVE",
)
matching_units = db_session.scalars(stmt).all()  # no limit, ever

Each row carries a full inventory_metadata JSONB column. At 230k rows that's ~3.74 GB loaded into Python memory in one shot.
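As a rough sanity check on those figures, the per-row cost can be estimated from the kernel log. This is a back-of-the-envelope sketch only: it attributes the whole anon-rss value to the loaded rows (so it slightly overstates the per-row cost), treats 1 GB as 1,000,000 kB to match the numbers above, and the 1,000,000-row case is a hypothetical network size, not a measured one.

```python
# Back-of-the-envelope estimate from the figures quoted in this issue.
ANON_RSS_KB = 3_738_544   # anon-rss from the kernel OOM line
ROW_COUNT = 230_000       # approximate active ad units for the tenant


def per_row_kb(rss_kb: int, rows: int) -> float:
    """Average resident memory per loaded row, in kB."""
    return rss_kb / rows


def projected_gb(rows: int, kb_per_row: float) -> float:
    """Projected memory for `rows` rows, in GB (1 GB = 1e6 kB here)."""
    return rows * kb_per_row / 1_000_000


kb = per_row_kb(ANON_RSS_KB, ROW_COUNT)
print(f"~{kb:.1f} kB per row")
print(f"5,000 rows     -> ~{projected_gb(5_000, kb):.2f} GB")
print(f"1,000,000 rows -> ~{projected_gb(1_000_000, kb):.1f} GB (hypothetical size)")
```

At roughly 16 kB per row, a capped query stays well under 100 MB, while the 12 GB workaround would be exhausted somewhere below a million units.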

Two things make this worse than it might look. First, SimpleCache only stores the result after a successful response, so on a fresh start the full query runs unconditionally. Since the process OOMs before anything is returned, the cache never gets populated and every restart hits the same crash. Second, GAMInventoryService.get_ad_unit_tree() already has a limit=1000 guard, but get_inventory_tree() in the blueprint doesn't call it. It runs its own raw query with no limit.
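The cache failure mode can be sketched as follows. This is an illustration, not the real code: a plain dict stands in for SimpleCache (whose API is not shown in this issue), get_tree and load_all_units are hypothetical names, and a MemoryError stands in for the kernel OOM kill, which in production terminates the process outright and discards the in-process cache with it.

```python
# Stand-in for the real cache; the actual SimpleCache API is an assumption.
cache: dict[str, list] = {}


def load_all_units(tenant_id: str) -> list:
    """Simulates the unbounded query exceeding available memory."""
    raise MemoryError("simulated OOM while loading every ad unit")


def get_tree(tenant_id: str) -> list:
    """Cache-after-success: the cache is written only once the load returns."""
    key = f"inventory_tree:{tenant_id}"
    if key in cache:
        return cache[key]
    units = load_all_units(tenant_id)  # fails before the next line runs
    cache[key] = units                 # only reached on success
    return units


attempts = 0
for _ in range(3):  # three simulated restarts followed by a first request
    attempts += 1
    try:
        get_tree("tenant_1")
    except MemoryError:
        pass

print(f"attempts={attempts}, cached keys={list(cache)}")
```

Every attempt takes the uncached path, so the cache offers no protection until the query itself is bounded.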

Impact

  • Any tenant with a large enough GAM network OOM-kills the container on the first request after startup
  • Cache never repopulates after an OOM, so the crash repeats on every restart
  • Affects all deployments; this is purely a function of inventory size

Steps to reproduce

  1. Sync a GAM network with >10k active ad units into a tenant
  2. Restart the container (clears the in-process cache)
  3. Open the Inventory page for that tenant in the Admin UI
  4. GET /api/tenant/<id>/inventory/tree loads the full result set and OOMs

Suggested fix

get_inventory_list() already handles this correctly. It caps at 500 and returns has_more (lines 1227 and 1251). The same pattern on get_inventory_tree() would fix this:

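MAX_TREE_UNITS = 5000  # cap on rows loaded per request (constant name is illustrative)

stmt = select(GAMInventory).where(
    GAMInventory.tenant_id == tenant_id,
    GAMInventory.inventory_type == "ad_unit",
    GAMInventory.status == "ACTIVE",
).limit(MAX_TREE_UNITS)

matching_units = db_session.scalars(stmt).all()
# True when the cap was hit, so the response can signal an incomplete tree
truncated = len(matching_units) == MAX_TREE_UNITS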
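One possible refinement: comparing the row count to the cap reports truncation even when the tenant has exactly that many units. Fetching one extra row and dropping it avoids the false positive. The sketch below shows the idea with the standard-library sqlite3 module so it runs on its own. The simplified table schema and the fetch_capped helper are assumptions for illustration, not code from the repository, and the same approach should carry over to the SQLAlchemy query by passing cap + 1 to .limit().

```python
import sqlite3


def fetch_capped(conn: sqlite3.Connection, tenant_id: str, cap: int):
    """Return (rows, truncated), reading at most cap + 1 rows from the DB."""
    rows = conn.execute(
        "SELECT id FROM gam_inventory "
        "WHERE tenant_id = ? AND inventory_type = 'ad_unit' "
        "AND status = 'ACTIVE' ORDER BY id LIMIT ?",
        (tenant_id, cap + 1),  # one extra row acts as the 'more exists' probe
    ).fetchall()
    truncated = len(rows) > cap
    return rows[:cap], truncated


# Hypothetical, simplified schema with five active ad units for tenant "t1".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gam_inventory ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT, inventory_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO gam_inventory (tenant_id, inventory_type, status) "
    "VALUES (?, 'ad_unit', 'ACTIVE')",
    [("t1",)] * 5,
)

for cap in (4, 5, 10):
    rows, truncated = fetch_capped(conn, "t1", cap)
    print(f"cap={cap}: returned={len(rows)}, truncated={truncated}")
```

The cost is a single additional row per request, which is negligible next to the memory saved by the cap.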

Affected files

  • src/admin/blueprints/inventory.py:924-939 — unbounded query
  • src/admin/blueprints/inventory.py:1227,1251 — existing has_more pattern to follow

Metadata

Type

No type

Projects

Status

Under Review

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
