SAST-AI-Workflow is an LLM-based tool designed to detect and flag suspected vulnerabilities reported by SAST (Static Application Security Testing). It inspects suspicious lines of code in a given repository and deeply reviews the legitimacy of the reported errors. The workflow draws on existing SAST reports, source code analysis, CWE data, and other known examples.
The SAST-AI-Workflow can be integrated into the vulnerability detection process as an AI-assisted tool. It offers enhanced insights that may be overlooked during manual verification, while also reducing the time required by engineers.
As an initial step, we applied the workflow to the SAST scanning of the RHEL systemd project (source: systemd GitHub). We intend to extend this approach to support additional C-based projects in the future.
The workflow uses a LangGraph-based agent architecture (NAT framework). A linear pipeline of nodes processes each SAST report, with the investigate node running an autonomous multi-stage research loop per issue.
Input → pre_process → filter → investigate → summarize_justifications → calculate_metrics → write_results → Output
See Architecture Details and the diagrams (Mermaid + Excalidraw) for the full picture.
| Node | Role |
|---|---|
| pre_process | Initialize workflow, load config, set up vector store |
| filter | Filter known false positives using embedding similarity |
| investigate | Autonomously investigate each SAST issue (see below) |
| summarize_justifications | Generate human-readable analysis summaries |
| calculate_metrics | Compute precision/recall and performance metrics |
| write_results | Export final verdicts to configured outputs |
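Conceptually, the pipeline is a chain of nodes that each take the shared workflow state and return an updated copy. The sketch below is a dependency-free illustration in plain Python, not the actual LangGraph/NAT wiring; the node names match the table, but the state fields are hypothetical.

```python
# Illustrative sketch of the linear pipeline: each node transforms the
# shared state dict. Node names match the table; state fields are made up.

def pre_process(state):
    return {**state, "config_loaded": True, "vector_store": "faiss"}

def filter_known_fps(state):  # the "filter" node (renamed to avoid the builtin)
    # Drop issues that closely match known false positives.
    kept = [i for i in state["issues"] if not i.get("known_fp")]
    return {**state, "issues": kept}

def investigate(state):
    verdicts = [{"id": i["id"], "verdict": "TRUE_POSITIVE"} for i in state["issues"]]
    return {**state, "verdicts": verdicts}

def summarize_justifications(state):
    summaries = [f"Issue {v['id']}: {v['verdict']}" for v in state["verdicts"]]
    return {**state, "summaries": summaries}

def calculate_metrics(state):
    return {**state, "metrics": {"analyzed": len(state["verdicts"])}}

def write_results(state):
    return state  # would export to the configured outputs

PIPELINE = [pre_process, filter_known_fps, investigate,
            summarize_justifications, calculate_metrics, write_results]

def run(issues):
    state = {"issues": issues}
    for node in PIPELINE:
        state = node(state)
    return state
```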
Each SAST issue is handled by a subgraph that loops until the evaluator approves a verdict or a safety limit is reached:
```
┌─ RESEARCH (ReAct agent) ─────────────────────────────────┐
│ Calls tools to gather code evidence:                     │
│   fetch_code · read_file · search_codebase               │
│   list_directory · file_search                           │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─ ANALYSIS (LLM) ─────────────────────────────────────────┐
│ Produces structured verdict + confidence score           │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─ EVALUATION (reflection) ────────────────────────────────┐
│ Critiques the analysis:                                  │
│   • approved → END                                       │
│   • needs more code → loop back to RESEARCH              │
│   • reanalyze → loop back to ANALYSIS                    │
│   • safety limit → CIRCUIT BREAKER → END                 │
└──────────────────────────────────────────────────────────┘
```
| Tool | Purpose |
|---|---|
| fetch_code | Retrieve a specific file/function from the target repository |
| read_file | Read a file by path from the source tree |
| search_codebase | Semantic search across the codebase for relevant code patterns |
| list_directory | List directory contents to explore unfamiliar code structure |
| file_search | Search for files by name pattern (conditionally included) |
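The filesystem-oriented tools boil down to simple repository accessors. The functions below are minimal, dependency-free stand-ins for two of them, not the project's actual tool implementations; the `repo_root` parameter is an assumption for illustration.

```python
# Hypothetical stand-ins for two investigation tools. The real workflow
# exposes equivalents to the ReAct agent; these only show the contract.
import os

def read_file(repo_root, rel_path):
    """Read a file by path from the source tree."""
    with open(os.path.join(repo_root, rel_path), encoding="utf-8") as fh:
        return fh.read()

def list_directory(repo_root, rel_path="."):
    """List directory contents to explore unfamiliar code structure."""
    return sorted(os.listdir(os.path.join(repo_root, rel_path)))
```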
The investigation subgraph has multiple safety limits to prevent runaway LLM usage:
| Limit | Value | Behavior |
|---|---|---|
| Max LLM calls per research iteration | 15 | Graceful exit; accumulated code is preserved |
| Max consecutive evaluation rejections | 3 | Forces NEEDS_REVIEW verdict |
| Max iterations without new code | 2 | Forces NEEDS_REVIEW verdict |
| Evaluation error (after retries) | — | Forces NEEDS_REVIEW + stop_reason = "evaluation_error" |
For full details see the Investigation Package README.
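The research → analysis → evaluation loop and its circuit breaker can be sketched as follows. `research`, `analyze`, and `evaluate` are hypothetical stand-ins for the LLM-backed stages; the decision strings and the rejection limit mirror the diagram and table above, while everything else is illustrative.

```python
# Control-flow sketch of the per-issue investigation loop. Only the
# consecutive-rejection circuit breaker is modeled here; the real workflow
# also caps LLM calls per iteration and iterations without new code.

MAX_REJECTIONS = 3  # consecutive evaluation rejections before NEEDS_REVIEW

def investigate_issue(research, analyze, evaluate):
    evidence, rejections = [], 0
    evidence.extend(research(evidence))          # RESEARCH: gather code
    while True:
        verdict = analyze(evidence)              # ANALYSIS: structured verdict
        decision = evaluate(verdict, evidence)   # EVALUATION: reflection
        if decision == "approved":
            return verdict
        rejections += 1
        if rejections >= MAX_REJECTIONS:         # CIRCUIT BREAKER
            return {"verdict": "NEEDS_REVIEW", "stop_reason": "max_rejections"}
        if decision == "needs_more_code":
            evidence.extend(research(evidence))  # loop back to RESEARCH
        # "reanalyze" falls through and loops back to ANALYSIS
```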
- SAST HTML Reports: Processes scan results from SAST HTML reports.
- Source Code: Requires access to the exact source code used to generate the SAST HTML report.
- Verified Data: Incorporates known error cases for continuous learning and better results.
- CWE Information: Embeds CWE (Common Weakness Enumeration) data to enrich vulnerability analysis context.
- Converts input data (verified data, source code) into embeddings using a specialized sentence transformer HuggingFace model (all-mpnet-base-v2) and stores them in an in-memory vector store (FAISS).
- Supports OpenAI-compatible AI models.
- Supports NVIDIA NIMs via the `ChatNVIDIA` LangChain integration.
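The known-false-positive filtering above reduces to nearest-neighbour search over embeddings. The real workflow embeds issues with all-mpnet-base-v2 and queries a FAISS index; the dependency-free sketch below substitutes toy vectors and plain cosine similarity to show the idea, and the 0.9 threshold is a hypothetical value, not the project's setting.

```python
# Illustrative similarity filter: an issue matching a stored known-FP
# embedding above a threshold is treated as a known false positive.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_known_false_positive(issue_vec, known_fp_vecs, threshold=0.9):
    """Return (is_match, best_similarity) against the known-FP store."""
    best = max((cosine(issue_vec, v) for v in known_fp_vecs), default=0.0)
    return best >= threshold, best
```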
Every analyzed issue receives a confidence score (0–100%) that quantifies how much trust to place in the verdict. The score is a weighted combination of:
- Agent Confidence (~37.5%) – The LLM's self-assessed certainty in its verdict
- Investigation Depth (~37.5%) – How thoroughly the issue was investigated (symbols explored, tool calls, clean completion vs. safety limits)
- Evidence Strength (~25%) – Quality of supporting evidence (FAISS similarity, files fetched, concrete code/CVE references)
Issues matching known false positives in the vector store skip the full investigation and use the filter's match confidence directly.
| Score Range | Level | Recommended Action |
|---|---|---|
| 80β100% | High | Verdict can generally be trusted |
| 50β79% | Medium | Consider spot-checking |
| 0β49% | Low | Manual review recommended |
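The weighted blend and the bands above can be sketched as follows. The weights are the approximate defaults quoted earlier (the real values are configurable), and the component inputs on a 0–1 scale are illustrative.

```python
# Sketch of the confidence score: a weighted blend of three components,
# mapped to the action bands from the table. Weights are the approximate
# defaults quoted above, not necessarily the configured values.

WEIGHTS = {"agent": 0.375, "depth": 0.375, "evidence": 0.25}

def confidence_score(agent, depth, evidence):
    """Each component is normalized to 0..1; returns a 0..100 percentage."""
    score = (WEIGHTS["agent"] * agent
             + WEIGHTS["depth"] * depth
             + WEIGHTS["evidence"] * evidence)
    return round(100 * score)

def confidence_level(score):
    if score >= 80:
        return "High"      # verdict can generally be trusted
    if score >= 50:
        return "Medium"    # consider spot-checking
    return "Low"           # manual review recommended
```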
All weights and normalization caps are configurable in config/default_config.yaml. See Confidence Scoring for the full formula, interpretation guidance, and configuration reference.
Note: Confidence scoring is under active improvement – weights and calibration are being refined (14/4/26).
- Applies metrics (from Ragas library) to assess the quality of model outputs.
- Note: SAST-AI-Workflow is primarily focused on identifying false alarms (False Positives).
Please refer to the how to run guideline.
- Langfuse Guide – reading traces, analyzing costs, debugging from trace data
- Operations Runbook – monitoring, alerts, performance tuning, credential rotation
We provide a Tekton pipeline and helper scripts for deploying the SAST-AI-Workflow on an OpenShift cluster. Please refer to how to execute the pipeline.
For triggering workflows via the orchestrator API, see the Triggering Workflows Guide.