[BUG] Event Storm when targeting * with Syndic #61845

@TheBirdsNest

Description
When running commands against all minions, e.g. `salt '*' saltutil.sync_all`, return events are triggered and loop endlessly, filling the event bus and crashing the infrastructure. This happens in our environment, where we run a master/syndic architecture with five syndic masters reporting to a single central master. Examples of the events that fill the event bus are below. We have 6000 minions, and these events were still arriving after 2 hours.

The minion returns the result of the action, but the master still fires a find_job event, and this cycle repeats endlessly.

```
salt/job/20220324134159405558/ret/74f13a0b-4134-489b-97b5-55ff7ac88482  {
    "_stamp": "2022-03-24T14:49:58.398897",
    "fun": "saltutil.sync_all",
    "fun_args": null,
    "id": "74f13a0b-4134-489b-97b5-55ff7ac88482",
    "jid": "20220324134159405558",
    "retcode": 0,
    "return": {
        "beacons": [],
        "clouds": [],
        "engines": [],
        "executors": [],
        "grains": [],
        "log_handlers": [],
        "matchers": [],
        "modules": [
            "modules.spectrum"
        ],
        "output": [],
        "proxymodules": [],
        "renderers": [],
        "returners": [],
        "sdb": [],
        "serializers": [],
        "states": [
            "states.spectrum"
        ],
        "thorium": [],
        "utils": []
    },
    "success": true
}
salt/job/20220324134242473594/ret/1bd1821d-7c23-41f2-bc8c-5b1171f68a14  {
    "_stamp": "2022-03-24T14:49:58.410478",
    "fun": "saltutil.find_job",
    "fun_args": null,
    "id": "1bd1821d-7c23-41f2-bc8c-5b1171f68a14",
    "jid": "20220324134242473594",
    "retcode": 0,
    "return": {},
    "success": true
}
```
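
Salt job IDs encode the time the job was created (YYYYMMDDhhmmss followed by microseconds). Comparing a JID with the event's `_stamp` therefore gives a rough idea of how long after publication a return arrived. The sketch below applies this to the two events above. It assumes the JID and `_stamp` use the same timezone (for example, a master running in UTC), and it needs Python 3.7+ for `datetime.fromisoformat`.

```python
from datetime import datetime, timedelta
from typing import Mapping


def jid_to_datetime(jid: str) -> datetime:
    """Convert a 20-digit Salt JID (YYYYMMDDhhmmss + microseconds) to a datetime."""
    return datetime.strptime(jid, "%Y%m%d%H%M%S%f")


def return_lag(data: Mapping[str, str]) -> timedelta:
    """Time between job creation (taken from the JID) and the event's _stamp.

    Assumption: the JID and _stamp share a timezone; otherwise the result
    is shifted by the master's UTC offset.
    """
    return datetime.fromisoformat(data["_stamp"]) - jid_to_datetime(data["jid"])


# Values copied from the two events in the dump above.
sync_all_ret = {"jid": "20220324134159405558", "_stamp": "2022-03-24T14:49:58.398897"}
find_job_ret = {"jid": "20220324134242473594", "_stamp": "2022-03-24T14:49:58.410478"}

print(return_lag(sync_all_ret))  # 1:07:58.993339
print(return_lag(find_job_ret))  # 1:07:15.936884
```

Under that timezone assumption, both returns arrived more than an hour after their jobs were created. This is consistent with events still circulating long after the original command.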

Setup
1x Salt master running on a cloud VM with 8 GB RAM and 4 vCPUs
5x syndic masters running on cloud VMs with 256 GB RAM and 64 vCPUs each, each serving 1500 proxy minions

Steps to Reproduce the behavior
Run any command that targets all minions with the '*' glob, for example `salt '*' saltutil.sync_all`.

Expected behavior
When targeting all minions, job returns should arrive in a timely manner and the events should not loop endlessly.
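
One way to check a capture against this expectation is to count `saltutil.find_job` returns per minion. With the loop described above, the count for the same minion keeps growing. The sketch below is a minimal illustration. It assumes the events have already been parsed into (tag, data) pairs, e.g. from a dump like the one above. `looping_minions` and its `threshold` of 10 are hypothetical and are not part of Salt.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# A captured event: (tag, data), e.g. ("salt/job/<jid>/ret/<minion_id>", {...}).
Event = Tuple[str, Mapping[str, object]]


def find_job_counts(events: Iterable[Event]) -> Counter:
    """Count saltutil.find_job return events per minion ID."""
    counts: Counter = Counter()
    for tag, data in events:
        # Return tags have the shape salt/job/<jid>/ret/<minion_id>.
        parts = tag.split("/", 4)
        if len(parts) == 5 and parts[0] == "salt" and parts[1] == "job" and parts[3] == "ret":
            if data.get("fun") == "saltutil.find_job":
                counts[parts[4]] += 1
    return counts


def looping_minions(events: Iterable[Event], threshold: int = 10) -> List[str]:
    """Minions with more find_job returns than `threshold` (hypothetical cutoff)."""
    return sorted(m for m, n in find_job_counts(events).items() if n > threshold)


# Tags copied from the dump above; the find_job event is repeated to mimic the loop.
sync = ("salt/job/20220324134159405558/ret/74f13a0b-4134-489b-97b5-55ff7ac88482",
        {"fun": "saltutil.sync_all"})
find = ("salt/job/20220324134242473594/ret/1bd1821d-7c23-41f2-bc8c-5b1171f68a14",
        {"fun": "saltutil.find_job"})

print(looping_minions([sync] + [find] * 12))  # ['1bd1821d-7c23-41f2-bc8c-5b1171f68a14']
```

In a real capture, each find_job return carries a new JID. Counting by minion ID rather than by JID is what exposes the repetition.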

Versions Report

```
Salt Version:
          Salt: 3002.5

Dependency Versions:
          cffi: 1.14.4
      cherrypy: unknown
      dateutil: 2.8.1
     docker-py: 4.4.4
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.2
       libgit2: Not Installed
      M2Crypto: 0.35.2
          Mako: Not Installed
       msgpack: 0.6.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: 2.6.1
  pycryptodome: 3.9.9
        pygit2: Not Installed
        Python: 3.6.8 (default, Nov 16 2020, 16:55:22)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 17.0.0
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.1.4

System Versions:
          dist: centos 7 Core
        locale: UTF-8
       machine: x86_64
       release: 3.10.0-1160.49.1.el7.x86_64
        system: Linux
       version: CentOS Linux 7 Core
```

Labels: Salt-Syndic, bug, severity-critical