SAST-AI-Workflow is an LLM-based tool designed to detect and flag suspected vulnerabilities reported by SAST (Static Application Security Testing). It inspects suspicious lines of code in a given repository and deeply reviews the legitimacy of the reported errors. The workflow draws on existing SAST reports, source code analysis, CWE data, and other known examples.
The SAST-AI-Workflow can be integrated into the vulnerability detection process as an AI-assisted tool. It offers enhanced insights that may be overlooked during manual verification, while also reducing the time required by engineers.
As an initial step, we applied the workflow to the SAST scanning of the RHEL systemd project (source: systemd GitHub). We intend to extend this approach to support additional C-based projects in the future.
The workflow uses a LangGraph-based agent architecture (NAT framework). A linear pipeline of nodes processes each SAST report, with the investigate node running an autonomous multi-stage research loop per issue.
Input → pre_process → filter → investigate → summarize_justifications → calculate_metrics → write_results → Output
See Architecture Details and the diagrams (Mermaid + Excalidraw) for the full picture.
| Node | Role |
|---|---|
| pre_process | Initialize workflow, load config, set up vector store |
| filter | Filter known false positives using embedding similarity |
| investigate | Autonomously investigate each SAST issue (see below) |
| summarize_justifications | Generate human-readable analysis summaries |
| calculate_metrics | Compute precision/recall and performance metrics |
| write_results | Export final verdicts to configured outputs |
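Conceptually, the pipeline is a chain of nodes that each take the shared workflow state and return an updated copy. The sketch below is a dependency-free illustration in plain Python, not the actual LangGraph/NAT wiring; the node names match the table, but the state fields are hypothetical.

```python
# Illustrative sketch of the linear pipeline: each node transforms the
# shared state dict. Node names match the table; state fields are made up.

def pre_process(state):
    return {**state, "config_loaded": True, "vector_store": "faiss"}

def filter_known_fps(state):  # the "filter" node (renamed to avoid the builtin)
    # Drop issues that closely match known false positives.
    kept = [i for i in state["issues"] if not i.get("known_fp")]
    return {**state, "issues": kept}

def investigate(state):
    verdicts = [{"id": i["id"], "verdict": "TRUE_POSITIVE"} for i in state["issues"]]
    return {**state, "verdicts": verdicts}

def summarize_justifications(state):
    summaries = [f"Issue {v['id']}: {v['verdict']}" for v in state["verdicts"]]
    return {**state, "summaries": summaries}

def calculate_metrics(state):
    return {**state, "metrics": {"analyzed": len(state["verdicts"])}}

def write_results(state):
    return state  # would export to the configured outputs

PIPELINE = [pre_process, filter_known_fps, investigate,
            summarize_justifications, calculate_metrics, write_results]

def run(issues):
    state = {"issues": issues}
    for node in PIPELINE:
        state = node(state)
    return state
```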
Each SAST issue is handled by a subgraph that loops until the evaluator approves a verdict or a safety limit is reached:
```
┌─ RESEARCH (ReAct agent) ─────────────────────────────────┐
│ Calls tools to gather code evidence:                     │
│   fetch_code · read_file · search_codebase               │
│   list_directory · file_search                           │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─ ANALYSIS (LLM) ─────────────────────────────────────────┐
│ Produces structured verdict + confidence score           │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─ EVALUATION (reflection) ────────────────────────────────┐
│ Critiques the analysis:                                  │
│   • approved → END                                       │
│   • needs more code → loop back to RESEARCH              │
│   • reanalyze → loop back to ANALYSIS                    │
│   • safety limit → CIRCUIT BREAKER → END                 │
└──────────────────────────────────────────────────────────┘
```
| Tool | Purpose |
|---|---|
| fetch_code | Retrieve a specific file/function from the target repository |
| read_file | Read a file by path from the source tree |
| search_codebase | Semantic search across the codebase for relevant code patterns |
| list_directory | List directory contents to explore unfamiliar code structure |
| file_search | Search for files by name pattern (conditionally included) |
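The filesystem-oriented tools boil down to simple repository accessors. The functions below are minimal, dependency-free stand-ins for two of them, not the project's actual tool implementations; the `repo_root` parameter is an assumption for illustration.

```python
# Hypothetical stand-ins for two investigation tools. The real workflow
# exposes equivalents to the ReAct agent; these only show the contract.
import os

def read_file(repo_root, rel_path):
    """Read a file by path from the source tree."""
    with open(os.path.join(repo_root, rel_path), encoding="utf-8") as fh:
        return fh.read()

def list_directory(repo_root, rel_path="."):
    """List directory contents to explore unfamiliar code structure."""
    return sorted(os.listdir(os.path.join(repo_root, rel_path)))
```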
The investigation subgraph has multiple safety limits to prevent runaway LLM usage:
| Limit | Value | Behavior |
|---|---|---|
| Max LLM calls per research iteration | 15 | Graceful exit; accumulated code is preserved |
| Max consecutive evaluation rejections | 3 | Forces NEEDS_REVIEW verdict |
| Max iterations without new code | 2 | Forces NEEDS_REVIEW verdict |
| Evaluation error (after retries) | — | Forces NEEDS_REVIEW + stop_reason = "evaluation_error" |
For full details see the Investigation Package README.
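The research → analysis → evaluation loop and its circuit breaker can be sketched as follows. `research`, `analyze`, and `evaluate` are hypothetical stand-ins for the LLM-backed stages; the decision strings and the rejection limit mirror the diagram and table above, while everything else is illustrative.

```python
# Control-flow sketch of the per-issue investigation loop. Only the
# consecutive-rejection circuit breaker is modeled here; the real workflow
# also caps LLM calls per iteration and iterations without new code.

MAX_REJECTIONS = 3  # consecutive evaluation rejections before NEEDS_REVIEW

def investigate_issue(research, analyze, evaluate):
    evidence, rejections = [], 0
    evidence.extend(research(evidence))          # RESEARCH: gather code
    while True:
        verdict = analyze(evidence)              # ANALYSIS: structured verdict
        decision = evaluate(verdict, evidence)   # EVALUATION: reflection
        if decision == "approved":
            return verdict
        rejections += 1
        if rejections >= MAX_REJECTIONS:         # CIRCUIT BREAKER
            return {"verdict": "NEEDS_REVIEW", "stop_reason": "max_rejections"}
        if decision == "needs_more_code":
            evidence.extend(research(evidence))  # loop back to RESEARCH
        # "reanalyze" falls through and loops back to ANALYSIS
```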
- SAST HTML Reports: Processes scan results from SAST HTML reports.
- Source Code: Requires access to the exact source code used to generate the SAST HTML report.
- Verified Data: Incorporates known error cases for continuous learning and better results.
- CWE Information: Embeds CWE (Common Weakness Enumeration) data to enrich vulnerability analysis context.
- Converts input data (verified data, source code) into embeddings using a specialized sentence transformer HuggingFace model (all-mpnet-base-v2) and stores them in an in-memory vector store (FAISS).
- Supports OpenAI-compatible AI models.
- Supports NVIDIA NIMs via the `ChatNVIDIA` LangChain integration.
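The known-false-positive filtering above reduces to nearest-neighbour search over embeddings. The real workflow embeds issues with all-mpnet-base-v2 and queries a FAISS index; the dependency-free sketch below substitutes toy vectors and plain cosine similarity to show the idea, and the 0.9 threshold is a hypothetical value, not the project's setting.

```python
# Illustrative similarity filter: an issue matching a stored known-FP
# embedding above a threshold is treated as a known false positive.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_known_false_positive(issue_vec, known_fp_vecs, threshold=0.9):
    """Return (is_match, best_similarity) against the known-FP store."""
    best = max((cosine(issue_vec, v) for v in known_fp_vecs), default=0.0)
    return best >= threshold, best
```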
Every analyzed issue receives a confidence score (0–100%) that quantifies how much trust to place in the verdict. The score is a weighted combination of:
- Agent Confidence (~37.5%) – The LLM's self-assessed certainty in its verdict
- Investigation Depth (~37.5%) – How thoroughly the issue was investigated (symbols explored, tool calls, clean completion vs. safety limits)
- Evidence Strength (~25%) – Quality of supporting evidence (FAISS similarity, files fetched, concrete code/CVE references)
Issues matching known false positives in the vector store skip the full investigation and use the filter's match confidence directly.
| Score Range | Level | Recommended Action |
|---|---|---|
| 80β100% | High | Verdict can generally be trusted |
| 50β79% | Medium | Consider spot-checking |
| 0β49% | Low | Manual review recommended |
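The weighted blend and the bands above can be sketched as follows. The weights are the approximate defaults quoted earlier (the real values are configurable), and the component inputs on a 0–1 scale are illustrative.

```python
# Sketch of the confidence score: a weighted blend of three components,
# mapped to the action bands from the table. Weights are the approximate
# defaults quoted above, not necessarily the configured values.

WEIGHTS = {"agent": 0.375, "depth": 0.375, "evidence": 0.25}

def confidence_score(agent, depth, evidence):
    """Each component is normalized to 0..1; returns a 0..100 percentage."""
    score = (WEIGHTS["agent"] * agent
             + WEIGHTS["depth"] * depth
             + WEIGHTS["evidence"] * evidence)
    return round(100 * score)

def confidence_level(score):
    if score >= 80:
        return "High"      # verdict can generally be trusted
    if score >= 50:
        return "Medium"    # consider spot-checking
    return "Low"           # manual review recommended
```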
All weights and normalization caps are configurable in config/default_config.yaml. See Confidence Scoring for the full formula, interpretation guidance, and configuration reference.
Note: Confidence scoring is under active improvement – weights and calibration are being refined (14/4/26).
- Applies metrics (from Ragas library) to assess the quality of model outputs.
- Note: SAST-AI-Workflow is primarily focused on identifying false alarms (False Positives).
Please refer to the how to run guideline.
- Langfuse Guide – reading traces, analyzing costs, debugging from trace data
- Operations Runbook – monitoring, alerts, performance tuning, credential rotation
We provide a Tekton pipeline and helper scripts for deploying the SAST-AI-Workflow on an OpenShift cluster. Please refer to how to execute the pipeline.
For triggering workflows via the orchestrator API, see the Triggering Workflows Guide.