SAST-AI-Workflow

SAST-AI-Workflow is an LLM-based tool designed to detect and flag suspected vulnerabilities found through SAST (Static Application Security Testing). It inspects suspicious lines of code in a given repository and deeply reviews the legitimacy of the reported errors. The workflow draws on existing SAST reports, source code analysis, CWE data, and other known examples.

Purpose

SAST-AI-Workflow can be integrated into the vulnerability detection process as an AI-assisted tool. It surfaces insights that may be overlooked during manual verification, while reducing the time engineers spend on review.

As an initial step, we applied the workflow to the SAST scanning of the RHEL systemd project (source: systemd GitHub). We intend to extend this approach to support additional C-based projects in the future.

Architecture

The workflow uses a LangGraph-based agent architecture (NAT framework). A linear pipeline of nodes processes each SAST report, with the investigate node running an autonomous multi-stage research loop per issue.

Input → pre_process → filter → investigate → summarize_justifications → calculate_metrics → write_results → Output
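As a rough illustration, the linear pipeline can be sketched as plain functions threaded over a shared state dict. Node names mirror the list above, but the bodies are hypothetical placeholders, not the project's actual implementation:

```python
from functools import reduce

def pre_process(state):
    # Placeholder for: initialize workflow, load config, set up vector store
    return {**state, "config_loaded": True}

def filter_node(state):  # the README calls this node "filter"
    # Drop issues already matched against known false positives
    return {**state, "issues": [i for i in state["issues"] if not i.get("known_fp")]}

def investigate(state):
    # Stand-in for the autonomous investigation subgraph
    return {**state, "issues": [{**i, "verdict": "NOT_A_BUG"} for i in state["issues"]]}

def summarize_justifications(state):
    return {**state, "summaries": [f"Issue {i['id']}: {i['verdict']}" for i in state["issues"]]}

def calculate_metrics(state):
    return {**state, "metrics": {"issues_analyzed": len(state["issues"])}}

def write_results(state):
    return state  # would export final verdicts to configured outputs

PIPELINE = [pre_process, filter_node, investigate,
            summarize_justifications, calculate_metrics, write_results]

def run(state):
    # Thread the state through each node in order, like the linear graph above
    return reduce(lambda s, node: node(s), PIPELINE, state)
```

In the real workflow each node is a LangGraph node over a typed state rather than a bare function, but the data flow is the same.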

See Architecture Details and the diagrams (Mermaid + Excalidraw) for the full picture.

Agent Pipeline Nodes

| Node | Role |
| --- | --- |
| pre_process | Initialize workflow, load config, set up vector store |
| filter | Filter known false positives using embedding similarity |
| investigate | Autonomously investigate each SAST issue (see below) |
| summarize_justifications | Generate human-readable analysis summaries |
| calculate_metrics | Compute precision/recall and performance metrics |
| write_results | Export final verdicts to configured outputs |
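The filter node's core idea, matching a new issue against known false positives by embedding similarity, can be sketched as below. The function name and the 0.9 threshold are illustrative, not the project's actual names or defaults:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def is_known_false_positive(issue_vec, known_fp_vecs, threshold=0.9):
    """Compare an issue's embedding to the known-FP store.

    Returns (matched, best_score); a match lets the issue skip
    the full investigation, as described later in this README.
    """
    best = max((cosine(issue_vec, v) for v in known_fp_vecs), default=0.0)
    return best >= threshold, best
```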

The investigate Node: Research → Analysis → Evaluation

Each SAST issue is handled by a subgraph that loops until the evaluator approves a verdict or a safety limit is reached:

┌─ RESEARCH (ReAct agent) ─────────────────────────────────┐
│  Calls tools to gather code evidence:                    │
│    fetch_code · read_file · search_codebase              │
│    list_directory · file_search                          │
└──────────────────────────────────────────────────────────┘
         │
         ▼
┌─ ANALYSIS (LLM) ─────────────────────────────────────────┐
│  Produces structured verdict + confidence score          │
└──────────────────────────────────────────────────────────┘
         │
         ▼
┌─ EVALUATION (reflection) ────────────────────────────────┐
│  Critiques the analysis:                                 │
│    • approved        → END                               │
│    • needs more code → loop back to RESEARCH             │
│    • reanalyze       → loop back to ANALYSIS             │
│    • safety limit    → CIRCUIT BREAKER → END             │
└──────────────────────────────────────────────────────────┘
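A minimal sketch of this control loop, with the three stages passed in as callables. The function name, decision strings, and iteration limit here are hypothetical, not the subgraph's real interface:

```python
def investigate_issue(research, analyze, evaluate, max_iterations=5):
    """Drive RESEARCH -> ANALYSIS -> EVALUATION until approved or limit hit."""
    evidence, analysis = [], None
    step = "research"
    for _ in range(max_iterations):
        if step == "research":
            # Gather more code evidence (tool calls) before re-analyzing
            evidence.extend(research(evidence))
        analysis = analyze(evidence)
        decision = evaluate(analysis)
        if decision == "approved":
            return analysis
        # "needs_more_code" loops back to RESEARCH; "reanalyze" skips it
        step = "research" if decision == "needs_more_code" else "analysis"
    # Circuit breaker: safety limit reached without an approved verdict
    return {**(analysis or {}), "verdict": "NEEDS_REVIEW", "stop_reason": "safety_limit"}
```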

Research Tools

| Tool | Purpose |
| --- | --- |
| fetch_code | Retrieve a specific file/function from the target repository |
| read_file | Read a file by path from the source tree |
| search_codebase | Semantic search across the codebase for relevant code patterns |
| list_directory | List directory contents to explore unfamiliar code structure |
| file_search | Search for files by name pattern (conditionally included) |
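For a feel of what the simpler file tools do, here is a hedged sketch of `read_file` and `list_directory`, not the project's actual tool implementations. The path-escape check is an assumption about sensible behavior:

```python
from pathlib import Path

def read_file(source_root, rel_path):
    """Sketch of the read_file tool: read a file by path from the source tree."""
    root = Path(source_root).resolve()
    target = (root / rel_path).resolve()
    # Refuse paths that escape the source tree (illustrative safety check)
    if root != target and root not in target.parents:
        raise ValueError("path escapes the source tree")
    return target.read_text()

def list_directory(source_root, rel_path="."):
    """Sketch of list_directory: list entries to explore unfamiliar structure."""
    return sorted(p.name for p in (Path(source_root) / rel_path).iterdir())
```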

Circuit Breakers

The investigation subgraph has multiple safety limits to prevent runaway LLM usage:

| Limit | Value | Behavior |
| --- | --- | --- |
| Max LLM calls per research iteration | 15 | Graceful exit; accumulated code is preserved |
| Max consecutive evaluation rejections | 3 | Forces NEEDS_REVIEW verdict |
| Max iterations without new code | 2 | Forces NEEDS_REVIEW verdict |
| Evaluation error (after retries) | – | Forces NEEDS_REVIEW + stop_reason = "evaluation_error" |
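These limits could be tracked with a small counter object along the lines of the sketch below. The class and method names are hypothetical; the constants are the documented defaults:

```python
class CircuitBreaker:
    """Tracks the documented safety limits for one investigation."""
    MAX_LLM_CALLS = 15        # per research iteration
    MAX_REJECTIONS = 3        # consecutive evaluation rejections
    MAX_STALE_ITERATIONS = 2  # iterations without new code

    def __init__(self):
        self.llm_calls = 0
        self.rejections = 0
        self.stale_iterations = 0

    def record_llm_call(self):
        self.llm_calls += 1

    def record_evaluation(self, approved, new_code_found):
        # Both counters reset on progress, so only *consecutive* failures trip
        self.rejections = 0 if approved else self.rejections + 1
        self.stale_iterations = 0 if new_code_found else self.stale_iterations + 1

    def tripped(self):
        """Return the trip reason, or None if all limits are still within bounds."""
        if self.llm_calls >= self.MAX_LLM_CALLS:
            return "llm_call_limit"
        if self.rejections >= self.MAX_REJECTIONS:
            return "consecutive_rejections"
        if self.stale_iterations >= self.MAX_STALE_ITERATIONS:
            return "no_new_code"
        return None
```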

For full details see the Investigation Package README.

Input Sources

  • SAST HTML Reports: Processes scan results from SAST HTML reports.
  • Source Code: Requires access to the exact source code used to generate the SAST HTML report.
  • Verified Data: Incorporates known error cases for continuous learning and better results.
  • CWE Information: Embeds CWE (Common Weakness Enumeration) data to enrich vulnerability analysis context.

Embeddings & Vector Store

  • Converts input data (verified data, source code) into embeddings using the all-mpnet-base-v2 sentence-transformer model from HuggingFace and stores them in an in-memory FAISS vector store.
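To make the embed-store-search flow concrete without pulling in sentence-transformers or FAISS, here is a toy stand-in: a bag-of-words "embedding" plus a minimal in-memory store with similarity search. Everything in it is illustrative:

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for all-mpnet-base-v2: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def sparse_cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class InMemoryVectorStore:
    """Minimal FAISS-like store: add texts, query by similarity."""
    def __init__(self):
        self.entries = []  # (embedding, original text) pairs

    def add(self, text):
        self.entries.append((toy_embed(text), text))

    def search(self, query, k=3):
        """Return the k most similar stored texts with their scores."""
        q = toy_embed(query)
        scored = sorted(((sparse_cosine(q, vec), text) for vec, text in self.entries),
                        reverse=True)
        return [(text, score) for score, text in scored[:k]]
```

The real workflow uses dense model embeddings and FAISS indexing instead, but the add/search shape is the same.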

LLM Integration

  • Supports OpenAI-compatible AI models.
  • Supports NVIDIA NIMs via the ChatNVIDIA LangChain integration.

Confidence Scoring

Every analyzed issue receives a confidence score (0–100%) that quantifies how much trust to place in the verdict. The score is a weighted combination of:

  • Agent Confidence (~37.5%) – The LLM's self-assessed certainty in its verdict
  • Investigation Depth (~37.5%) – How thoroughly the issue was investigated (symbols explored, tool calls, clean completion vs. safety limits)
  • Evidence Strength (~25%) – Quality of supporting evidence (FAISS similarity, files fetched, concrete code/CVE references)

Issues matching known false positives in the vector store skip the full investigation and use the filter's match confidence directly.

| Score Range | Level | Recommended Action |
| --- | --- | --- |
| 80–100% | High | Verdict can generally be trusted |
| 50–79% | Medium | Consider spot-checking |
| 0–49% | Low | Manual review recommended |
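Using the approximate weights from the text, the overall score and its band can be computed as in this sketch. The function names are hypothetical, and the real weights and caps live in config/default_config.yaml:

```python
# Approximate weights from the text; illustrative, not the configured values
WEIGHTS = {"agent": 0.375, "depth": 0.375, "evidence": 0.25}

def confidence_score(agent, depth, evidence):
    """Weighted combination of the three 0-100 component scores."""
    return (WEIGHTS["agent"] * agent
            + WEIGHTS["depth"] * depth
            + WEIGHTS["evidence"] * evidence)

def confidence_level(score):
    """Map a 0-100 score to the bands in the table above."""
    if score >= 80:
        return "High"
    if score >= 50:
        return "Medium"
    return "Low"
```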

All weights and normalization caps are configurable in config/default_config.yaml. See Confidence Scoring for the full formula, interpretation guidance, and configuration reference.

Note: Confidence scoring is under active improvement; weights and calibration are being refined (14/4/26)

Evaluation

  • Applies metrics from the Ragas library to assess the quality of model outputs.
  • Note: SAST-AI-Workflow is primarily focused on identifying false alarms (False Positives).

🔌 Installation & Setup

Please refer to the how-to-run guideline.

📊 Observability

  • Langfuse Guide – reading traces, analyzing costs, debugging from trace data

📋 Operations

🚀 CI/CD Pipeline

We provide a Tekton pipeline and helper scripts for deploying SAST-AI-Workflow on an OpenShift cluster. Please refer to how to execute the pipeline.

For triggering workflows via the orchestrator API, see the Triggering Workflows Guide.
