
PoC pentest tool that analyzes data captured from a target network to build an understanding of relationships between users, groups, processes, etc.


pfussell/envmapper


EnvMapper (formerly Netographer)

Introduction

Once we have access to a network, we can execute arbitrary commands on discovered machines. In large organizations with many computers, it can be difficult to decide where the most fruitful targets would be. A list of processes a computer is running can tell us some important information about how the computer is used. If we see Photoshop in a list of processes, we might infer that the user is a graphic designer or photographer. If we see cmd.exe, it might be a developer or administrator. A user running excel.exe might be a manager or analyst. Any one of these might be a source of valuable business data for white-hats and black-hats alike.

Overview

EnvMapper is a command-line intelligence tool that uses LLMs and system data to understand how users in a network operate, group similar activities, and identify patterns or anomalies. By analyzing process lists and Active Directory (AD) group memberships, EnvMapper builds behavioral clusters that can help defenders and red teams alike uncover high-value systems, common workflows, and unusual activity.

EnvMapper supports multiple LLM providers, including:

  • OpenAI (default)
  • AWS Bedrock (Claude, Llama/Mistral chat; Titan embeddings)
  • Any OpenAI-compatible API (self-hosted via Ollama, vLLM, Together, Fireworks, etc.)
  • Local embeddings via sentence-transformers (for fully on-prem use)

Installation

Requirements

  • Python 3.10 or later
  • Dependencies:
    pip install -U pandas numpy scikit-learn hdbscan tenacity python-dateutil tqdm tabulate openai boto3 sentence-transformers

Clone and Setup

git clone https://github.com/<your-org>/envmapper.git
cd envmapper

You can run EnvMapper directly as a script:

python netographer_llm_anyprovider.py --help

Provider Configuration

EnvMapper works with multiple backends. Choose your preferred setup below.

1. OpenAI (Default)

export OPENAI_API_KEY=sk-...
python netographer_llm_anyprovider.py cluster \
  --tasklist-glob "tasklists/*.txt" \
  --enum4linux enum4linux.txt \
  --json-out clusters.json \
  --md-out clusters.md

2. Self-Hosted (Ollama, vLLM, OpenRouter)

export OPENAI_API_KEY=dummy
python netographer_llm_anyprovider.py cluster \
  --provider openai_compat \
  --openai-base-url http://localhost:11434/v1 \
  --chat-model llama3.1:8b-instruct \
  --embed-model sentence_transformers \
  --st-model all-MiniLM-L6-v2

3. AWS Bedrock (Claude 3.5 + Titan)

export AWS_REGION=us-east-1
python netographer_llm_anyprovider.py cluster \
  --provider bedrock \
  --chat-model anthropic.claude-3-5-sonnet-20240620-v1:0 \
  --embed-model amazon.titan-embed-text-v2:0 \
  --bedrock-model-family anthropic \
  --tasklist-glob "tasklists/*.txt" \
  --enum4linux enum4linux.txt \
  --json-out clusters.json \
  --md-out clusters.md

Approach

1. Data Ingestion

EnvMapper reads:

  • Windows tasklist.exe outputs to identify running processes per user.
  • Enum4Linux outputs to understand domain users and groups.

This raw data is transformed into per-user activity vectors.
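The exact ingestion code isn't shown here, but a minimal sketch of this step, assuming CSV output from `tasklist /v /fo csv` (the `parse_tasklist_csv` helper is hypothetical, not EnvMapper's actual API), might look like:

```python
import csv
import io
from collections import defaultdict

def parse_tasklist_csv(text):
    """Parse `tasklist /v /fo csv` output into {user: [process, ...]}."""
    users = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        user = row.get("User Name", "N/A")
        if user and user != "N/A":  # skip system rows with no owner
            users[user].append(row["Image Name"].lower())
    return dict(users)

# Toy two-row capture for illustration.
sample = (
    '"Image Name","PID","Session Name","Session#","Mem Usage",'
    '"Status","User Name","CPU Time","Window Title"\n'
    '"devenv.exe","123","Console","1","500,000 K",'
    '"Running","CORP\\alice","0:01:02","Visual Studio"\n'
    '"excel.exe","456","Console","1","80,000 K",'
    '"Running","CORP\\bob","0:00:10","Book1"\n'
)
per_user = parse_tasklist_csv(sample)
```

The per-user process lists produced this way are the raw material for the activity vectors.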

2. Embedding Generation

Each user’s behavioral summary is embedded using:

  • Provider embeddings (OpenAI, Bedrock Titan)
  • Or local embeddings via sentence-transformers.

These embeddings capture similarity in user activity.
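Whichever provider produces the embeddings, "similarity in user activity" reduces to vector similarity. A minimal NumPy sketch of the usual comparison, cosine similarity (the function name is illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Identical vectors score 1.0; orthogonal vectors (no shared activity signal) score 0.0.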

3. Clustering and Labeling

  • HDBSCAN or KMeans is used to cluster users by behavior.
  • The LLM generates descriptive cluster labels (role, workflow, risk).
  • The output includes:
    • clusters.json — structured machine-readable output
    • clusters.md — human-readable summary with risk labels
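A minimal sketch of the clustering step using scikit-learn's KMeans (HDBSCAN is used analogously); the helper name and toy embeddings are illustrative, not EnvMapper's actual API:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_users(embeddings, n_clusters=2):
    """Assign each user embedding to a behavioral cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(np.asarray(embeddings))

# Two obviously separated groups of toy embeddings.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = cluster_users(emb, n_clusters=2)
```

The resulting integer labels are what the LLM then annotates with role, workflow, and risk descriptions.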

Example Output

{
  "algo": "hdbscan",
  "clusters": {
    "0": { "label": "Developers", "risk": "medium", "size": 12 },
    "1": { "label": "Finance Staff", "risk": "high", "size": 6 },
    "2": { "label": "Admins", "risk": "high", "size": 3 }
  }
}

Use Cases

  • Identify clusters of similar users based on process data.
  • Detect outliers or anomalous system usage.
  • Generate human-readable activity summaries.
  • Enrich blue/red team reconnaissance or threat modeling.

Roadmap

  • Integrate live data ingestion via SMB/WinRM.
  • Add visualization (e.g., network maps via Graphviz).
  • Extend cluster labeling with risk scoring models.
  • Add cloud-native ingest for large enterprise analysis.

License

MIT License © 2025 EnvMapper Contributors


Summary

First Steps

After running tasklist.exe on every workstation and saving the results to a local directory, it's easy enough to grep through every file for specific processes, but what if we don't even know what we're looking for? Every organization is different, and our goals may differ on each penetration test, so we need a way to get high-level data about all of the computers. We wrote a command-line tool, originally called Netographer (since renamed EnvMapper), which provides this information and more.

users/groups

The utility enum4linux enumerates all of the users in a domain and the groups to which they are assigned. EnvMapper can parse this file and return a JSON object mapping either users to an array of groups they are in, or groups to an array of users in that group. These are the users and groups commands, respectively. Because we don't always have time or permission to run enum4linux, we can also use a list of files showing logged-in users as the source data for this command (though we will then be missing group membership).
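A minimal sketch of this parsing, assuming enum4linux's usual `Group '<name>' (RID: <n>) has member: <user>` membership lines (the helper names are illustrative):

```python
import re
from collections import defaultdict

GROUP_RE = re.compile(
    r"Group '(?P<group>[^']+)' \(RID: \d+\) has member: (?P<user>\S+)"
)

def parse_enum4linux(text):
    """Map group -> [users] from enum4linux membership lines."""
    groups = defaultdict(list)
    for m in GROUP_RE.finditer(text):
        groups[m.group("group")].append(m.group("user"))
    return dict(groups)

def invert(groups):
    """Map user -> [groups]: the `users` command's view of the same data."""
    users = defaultdict(list)
    for group, members in groups.items():
        for user in members:
            users[user].append(group)
    return dict(users)

sample = (
    "Group 'Domain Admins' (RID: 512) has member: CORP\\alice\n"
    "Group 'Developers' (RID: 1105) has member: CORP\\alice\n"
    "Group 'Developers' (RID: 1105) has member: CORP\\bob\n"
)
groups = parse_enum4linux(sample)
```

Inverting the mapping is what turns the groups view into the users view without re-parsing the file.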

processes

Once we've looked at the users and groups for anything interesting, we can get a high-level view of all running processes on the network. The processes command will parse input tasklists and return a JSON object containing an array of all processes and how many times they occurred (called "process-counts"), an array of all processes and their frequency from 0 to 1 (called "process-frequencies"), and a sub-object representing all users and their running tasks (called "users"). If the output from the users command is piped into this command, the set of users will be restricted to those from the piped users object. Otherwise, the users will be the set of users from every tasklist.
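A sketch of how "process-counts" and "process-frequencies" might be computed, assuming frequency means the fraction of users running a given process (the helper name is illustrative):

```python
from collections import Counter

def process_stats(users):
    """users: {user: [process, ...]} -> (counts, frequencies).

    counts: how many users run each process (each counted once per user).
    frequencies: the same, normalized to [0, 1] by the number of users.
    """
    counts = Counter()
    for procs in users.values():
        counts.update(set(procs))  # dedupe within a single user's tasklist
    n_users = len(users)
    freqs = {proc: c / n_users for proc, c in counts.items()}
    return dict(counts), freqs
```

A process with frequency near 1.0 (like dwm.exe) runs nearly everywhere and carries little signal; a low-frequency process marks an unusual workstation.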

cluster

This is the fun one. The output from the previous commands is useful, but in very large organizations even reading through it can be unmanageable. What if we could group users by the processes they are running and then investigate each group? Two users running devenv.exe (Visual Studio) are probably doing similar kinds of work and have access to similar kinds of business data. Two users running dwm.exe (Desktop Window Manager) are probably not very similar, because almost every Windows computer runs that process.

With this in mind, we designed a distance metric that compares the processes in common between two users, weighted by overall process frequency, and places similar users closer together. Using hierarchical clustering, we initially assign every user to their own cluster; repeatedly merging the most similar clusters reduces the number of clusters until we have perhaps 3 to 5 groups of hopefully different types of users, based solely on the processes they were running. The input to this command is the output from the processes command. The output is a JSON object of cluster compositions from 2 to 10 clusters, showing the most common processes for each calculated cluster of users. The optimal number of clusters depends on the data, so several different values are provided for convenience.
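An illustrative version of such a frequency-weighted distance (not the tool's actual metric): shared rare processes pull two users together, while near-universal processes like dwm.exe contribute almost nothing:

```python
def user_distance(procs_a, procs_b, freqs):
    """Distance between two users' process sets.

    Each shared process contributes weight (1 - frequency), so rare shared
    processes count heavily and ubiquitous ones barely count at all.
    Distance shrinks toward 0 as the shared-rarity weight grows.
    """
    shared = set(procs_a) & set(procs_b)
    weight = sum(1.0 - freqs[p] for p in shared)
    return 1.0 / (1.0 + weight)

freqs = {"dwm.exe": 1.0, "devenv.exe": 0.2}
# Two developers sharing a rare process vs. two users sharing only dwm.exe.
d_rare = user_distance({"dwm.exe", "devenv.exe"}, {"dwm.exe", "devenv.exe"}, freqs)
d_common = user_distance({"dwm.exe"}, {"dwm.exe"}, freqs)
```

Hierarchical clustering then repeatedly merges the pair of clusters with the smallest pairwise distance under this metric.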
