ZygmaCore Incident Simulator

A Python CLI game that simulates real-world incident response for on-call engineers.


📘 About The Project

ZygmaCore Incident Simulator is an interactive Python command-line simulator that puts you in the role of an on-call engineer during a production incident.

The system presents a scenario:

INCIDENT: SERVER DOWN
The main API server is unresponsive, users are seeing errors, and metrics are spiking.

Your job is to choose what to do next — check logs, ping the server, restart it blindly, or investigate deeper.
Each decision leads to different outcomes: ✅ success, 😐 neutral, or ❌ disaster.

This project is designed as both:

  • A fun CLI game for engineers.
  • A learning tool for practicing structured incident response thinking.

💡 Why This Project Exists

Most beginner Python projects focus on calculators, to-do apps, or simple utilities.
This simulator is different — it introduces:

  • Realistic SRE/DevOps scenarios (server down, bad deploy, DB issues).
  • Decision trees with consequences, not just right/wrong answers.
  • Narrative-driven learning, which is more memorable than static tutorials.

It helps:

  • New developers understand why “just restart the server” can be dangerous.
  • Students practice input handling, branching logic, and loops.
  • Engineers reflect on better incident response strategies in a low-risk environment.

🛠 Technical Stack

Technology | Usage
--- | ---
🐍 Python 3 | Core language for the simulator logic
💻 CLI / Terminal | User interaction and scenario choices
📦 os | Clears the screen between scenes (os.system("clear"))
time | Displays the time when the incident summary is generated
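
As a rough illustration of how the time module could produce the summary timestamp, here is a minimal sketch. The function name summary_timestamp and the date format string are assumptions for illustration, not taken from the simulator's source.

```python
import time


def summary_timestamp(now=None):
    """Return a 'Generated at' line for the incident summary.

    `now` is seconds since the epoch; None means the current time.
    The function name and the format string are hypothetical.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"Generated at: {stamp}"


print(summary_timestamp())
```

Accepting an optional `now` value keeps the function easy to test with a fixed time.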

✨ Key Features

  • 🖼 ASCII Art Intro
    Stylish server illustration to set the mood before the incident begins.

  • 🚨 Realistic Incident Scenario
    “Server down” situation with user impact and monitoring alerts.

  • 🧠 Branching Decision Tree
    Multiple choices at each step (check logs, ping, restart, investigate, etc.).

  • 🎯 Multiple Endings

    • ✅ SUCCESS ENDING – root cause identified, minimal downtime.
    • 😐 NEUTRAL ENDING – service restored, but RCA incomplete.
    • ❌ BAD ENDING – inaction makes things worse.
    • 💥 CATASTROPHIC ENDING – blind restart causes major outage.
  • 🧪 Input Validation
    Ensures numeric choices where required and handles invalid input with “GAME OVER” style messaging.

  • 🔁 Replay Loop
    After each run, the user can retry the incident simulation without restarting the program.

  • 🕒 Timestamped Summary
    Displays a “Generated at” time to simulate real incident timeline context.
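
The input validation feature listed above could look roughly like the sketch below. The helper names parse_choice and ask, and the exact "GAME OVER" wording, are assumptions. The actual script may structure this differently.

```python
def parse_choice(raw, valid_options):
    """Return the chosen option as an int, or None if the input is invalid."""
    try:
        choice = int(raw.strip())
    except ValueError:
        return None  # not a number at all
    return choice if choice in valid_options else None


def ask(prompt, valid_options, read=input):
    """Prompt once; print a GAME OVER style message on invalid input.

    `read` defaults to input() and can be swapped for a stub in tests.
    """
    choice = parse_choice(read(prompt), valid_options)
    if choice is None:
        print("GAME OVER: that was not a valid choice.")
    return choice
```

For example, parse_choice("2", {1, 2, 3}) returns 2, while parse_choice("abc", {1, 2, 3}) returns None.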


🚀 Getting Started

Follow the steps below to run the ZygmaCore Incident Simulator locally.

✅ Prerequisites

You’ll need:

  • Python 3.8+
  • A terminal:
    • macOS / Linux: Terminal
    • Windows: PowerShell or Command Prompt

💡 Note: The script uses os.system("clear"), which works on macOS/Linux.
On Windows, replace it with os.system("cls") or remove the clear calls.
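
One way to avoid editing the script per platform is to pick the command from os.name. This is a minimal sketch; the helper names clear_command and clear_screen are hypothetical and do not come from the project.

```python
import os


def clear_command(os_name=os.name):
    """Return the shell command that clears the terminal on this platform."""
    # os.name is "nt" on Windows and "posix" on macOS/Linux.
    return "cls" if os_name == "nt" else "clear"


def clear_screen():
    """Clear the terminal using the platform-appropriate command."""
    os.system(clear_command())
```

Calls to os.system("clear") in the script could then be replaced with clear_screen().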


📦 Installation

  1. Clone the repository

    git clone https://github.com/ZygmaCore/incident_simulator.git
    cd incident_simulator
  2. (Optional) Create and activate a virtual environment

    python -m venv .venv
    # On macOS/Linux
    source .venv/bin/activate
    # On Windows
    .venv\Scripts\activate
  3. Prepare the script

    Make sure the main script (for example, main.py) contains the simulator code and sits in the project root.


▶️ Usage

Run the simulator from your terminal:

python main.py

You’ll see something like:

==============================
ZYGMACORE INCIDENT SIMULATOR
==============================

INCIDENT: SERVER DOWN
The main API server is currently UNRESPONSIVE.
Users are reporting errors and the monitoring system shows elevated failure rates.
You are the on-call engineer. What will you do?

1) Check server logs.
2) Ping the server
3) Restart the server immediately

Example flow:

  • Choose 1 → Check logs → see HTTP 500 errors after a recent deployment.

  • Then choose:

    • 1 → Inspect DB connection → find exhausted connection pool → ✅ SUCCESS ENDING
    • 2 → Rollback deployment → service recovers but RCA incomplete → 😐 NEUTRAL ENDING
    • 3 → Ignore and wait → ❌ BAD ENDING
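
The example flow above can be modeled as a small decision tree. The sketch below is one possible data-driven layout, not the project's actual implementation. The scene keys, the SCENES dictionary, and the play function are all assumptions, and only branches described in this README are included.

```python
# Each scene is either a question with numbered choices or an ending.
# Scene keys and this layout are hypothetical.
SCENES = {
    "start": {
        "text": "INCIDENT: SERVER DOWN",
        # Choice 2 (ping) is omitted because its outcome is not described here.
        "choices": {1: "logs", 3: "blind_restart"},
    },
    "logs": {
        "text": "Logs show HTTP 500 errors after a recent deployment.",
        "choices": {1: "inspect_db", 2: "rollback", 3: "wait"},
    },
    "inspect_db": {"ending": "SUCCESS", "text": "Exhausted connection pool found."},
    "rollback": {"ending": "NEUTRAL", "text": "Service recovers, RCA incomplete."},
    "wait": {"ending": "BAD", "text": "Inaction makes things worse."},
    "blind_restart": {"ending": "CATASTROPHIC", "text": "Blind restart causes a major outage."},
}


def play(choices, scenes=SCENES, start="start"):
    """Walk the tree with a scripted list of choices.

    Returns the ending label, or None if a choice is invalid or the
    choices run out before an ending is reached.
    """
    scene = scenes[start]
    for choice in choices:
        if "ending" in scene:
            break
        next_key = scene["choices"].get(choice)
        if next_key is None:
            return None
        scene = scenes[next_key]
    return scene.get("ending")


print(play([1, 1]))  # → SUCCESS
```

Keeping scenes in a dictionary means a new incident can be added as data, without more nested if/else branches.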

At the end of a run, you’ll see:

==============================
INCIDENT SIMULATOR COMPLETED
==============================
Retry Again? (Y/N)

Enter y to replay, or n to exit. On exit, the program prints a final message:

Have a Good Day~
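
The retry prompt above suggests a simple replay loop. Here is a hedged sketch of how such a loop might be written. The names wants_retry and run are hypothetical, and treating any answer other than y as "exit" is an assumption about the script's behavior.

```python
def wants_retry(answer):
    """Return True only when the user answers y (case-insensitive)."""
    return answer.strip().lower() == "y"


def run(play_once, read=input, write=print):
    """Run one incident, then keep replaying while the user answers y.

    `play_once` is a callable that plays a single incident.
    `read` and `write` default to input() and print().
    """
    while True:
        play_once()
        write("==============================")
        write("INCIDENT SIMULATOR COMPLETED")
        write("==============================")
        if not wants_retry(read("Retry Again? (Y/N) ")):
            write("Have a Good Day~")
            break
```

Passing the I/O functions as parameters lets the loop be exercised with scripted answers instead of a live terminal.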

🤝 Contributing

Contributions, suggestions, and improvements are very welcome! ✨

If you’d like to contribute:

  1. Fork the repository
  2. Create a new branch: git checkout -b feature/your-feature-name
  3. Commit your changes: git commit -m "Add feature: your-feature-name"
  4. Push to your branch: git push origin feature/your-feature-name
  5. Open a Pull Request with a clear description of your changes

Ideas for contributions:

  • Add more incident scenarios (e.g., latency spikes, database outage, queue backlog)
  • Add difficulty levels
  • Add colorized output using libraries like rich
  • Improve input validation and error messaging

📄 License & Contact

This project is licensed under the MIT License. See the LICENSE file for full details.

Author Contact:
🌐 https://alhikam.me 🐙 https://github.com/ZygmaCore
