Updated README

Andrei-Constantin-Programmer · Andrei-Constantin-Programmer · commit 1069c9ce613b · 2025-08-31T12:32:20.000+01:00
diff --git a/README.md b/README.md
@@ -5,288 +5,121 @@
 ![Python](https://img.shields.io/badge/python-3.8+-blue.svg)
 ![Status](https://img.shields.io/badge/status-research-orange.svg)
 
-A cutting-edge research project developed in collaboration with **IBM Research** and **University College London (UCL)** to automatically migrate legacy Java codebases to modern, maintainable solutions using AI-powered antipattern detection and intelligent refactoring suggestions.
+Local-first, agentic refactoring pipeline for Java that pairs SonarQube findings with LLM-based reasoning to detect code smells and anti-patterns and to propose file-scoped, behaviour-preserving edits. An edit is accepted only if the project still compiles and its tests pass; public interfaces are preserved, and static analysis is re-run for reporting only. Run artefacts (plans, diffs, logs) are persisted for auditability.
 
-## 🚀 Overview
+Developed in collaboration with **UCL** and **IBM**.
 
-This tool leverages the power of Large Language Models (LLMs) and vector-based knowledge retrieval to provide comprehensive analysis of Java code, automatically detecting common antipatterns and suggesting concrete refactoring strategies. It represents a significant advancement in automated code modernization and technical debt reduction.
-
-## ✨ Key Features
-
-- 🔍 **Intelligent Antipattern Detection**: Automatically identifies 20+ common Java antipatterns including God Object, Long Method, Feature Envy, and more
-- 🤖 **AI-Powered Analysis**: Utilizes state-of-the-art LLMs (Granite, Llama, etc.) for deep semantic code understanding
-- 📊 **Context-Aware Analysis**: Vector database enables intelligent knowledge retrieval for more accurate assessments
-- 🛠️ **Actionable Refactoring Recommendations**: Provides step-by-step refactoring guidance with effort estimates
-- 🏗️ **Modular Agent Architecture**: Extensible design with specialized agents for different analysis tasks
-- 📈 **Comprehensive Reporting**: Detailed analysis reports with confidence scores and impact assessments
-
-## 🏛️ Architecture
-
-The tool follows a modular, agent-based architecture:
-
-```
-AntiPattern_Remediator/
-├── 📁 src/                           # Core source code
-│   ├── 📁 core/                      # Analysis engine
-│   │   ├── 📁 agents/               # Specialized analysis agents
-│   │   │   ├── 🔧 base_agent.py     # Agent interface foundation
-│   │   │   ├── 🔍 antipattern_scanner.py  # Pattern detection agent
-│   │   │   ├── 🔄 code_transformer.py     # Code transformation agent
-│   │   │   └── 🛠️ refactoring_agent.py    # Refactoring strategy agent
-│   │   ├── 📁 graph/                # Workflow orchestration
-│   │   │   ├── 🌐 create_graph.py   # Main workflow builder
-│   │   │   └── ⚡ enhanced_workflow.py     # Advanced pipeline
-│   │   ├── 📋 state.py              # Shared state management
-│   │   └── 🔄 workflow.py           # Basic workflow definitions
-│   └── 📁 data/                     # Data management layer
-│       └── 📁 database/             # Vector database components
-│           └── 💾 vector_db.py      # Vector DB operations
-├── ⚙️ config/                       # Configuration management
-│   └── 📝 settings.py              # Application settings
-├── 🔧 scripts/                     # Utility scripts
-│   ├── 🚀 setup_db.py             # Database initialization
-│   └── ▶️ run_analysis.py          # Standalone analysis runner
-├── 📊 static/                      # Static resources
-│   ├── 📖 ap.txt                  # Antipattern knowledge base
-│   └── 💾 vector_db/              # Vector database storage
-├── 🎯 main.py                      # Main application entry point
-└── 📦 requirements.txt             # Python dependencies
-```
-
-## 🛠️ Installation & Setup
-
-### Prerequisites
+## What it does
+- Treats rule-based static analysis (SonarQube) as signals, not ground truth.
+- Coordinates single-responsibility agents (Scanner -> Strategist -> Transformer -> Reviewer -> Explainer) that share a common context.
+- Accepts an edit only if the project compiles and its tests pass; tests are never modified.
+- Targets Java at file scope only (no cross-file or architectural refactors).
+- Uses a provider-agnostic LLM layer (e.g., local Ollama; hosted providers are also supported) with externalised prompts.
+- Looks up definitions, symptoms, and safe remedies by key in a document store (the Trove), so retrieval is deterministic.
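
The plain-Python sketch below shows how the agents hand off a shared context and how remedies are looked up by key. It is illustrative only. The `Context` fields, the two `TROVE` entries, and the stubbed transformer and reviewer are hypothetical. The project's real agents call an LLM rather than these placeholder functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical Trove entries, keyed by smell id (contents are illustrative only).
TROVE: Dict[str, Dict[str, str]] = {
    "collapsible_if": {
        "definition": "Nested if statements that can be merged into one.",
        "remedy": "Merge the conditions with && without changing behaviour.",
    },
    "unused_import": {
        "definition": "An import that is never referenced.",
        "remedy": "Delete the import line.",
    },
}


@dataclass
class Context:
    """Shared state that every agent reads from and writes to."""
    source: str                # contents of the single Java file being refactored
    signals: List[str]         # rule keys reported by static analysis
    findings: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)
    edited: str = ""
    approved: bool = False
    summary: str = ""


def scanner(ctx: Context) -> None:
    # Signals are hints, not ground truth: keep only keys the Trove can explain.
    ctx.findings = [key for key in ctx.signals if key in TROVE]


def strategist(ctx: Context) -> None:
    # Deterministic retrieval: each remedy is looked up by its exact key.
    ctx.plan = [TROVE[key]["remedy"] for key in ctx.findings]


def transformer(ctx: Context) -> None:
    # Placeholder: an LLM would rewrite ctx.source according to ctx.plan.
    ctx.edited = ctx.source


def reviewer(ctx: Context) -> None:
    # Placeholder check: approve only when there is a plan and an edited file.
    ctx.approved = bool(ctx.plan) and bool(ctx.edited)


def explainer(ctx: Context) -> None:
    ctx.summary = f"{len(ctx.findings)} finding(s); approved={ctx.approved}"


PIPELINE: List[Callable[[Context], None]] = [
    scanner, strategist, transformer, reviewer, explainer,
]


def run(ctx: Context) -> Context:
    """Run each agent in order over the same shared context."""
    for agent in PIPELINE:
        agent(ctx)
    return ctx
```

For example, `run(Context(source="class A {}", signals=["unused_import", "unknown_rule"]))` keeps only `unused_import`, because the other key has no Trove entry.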
 
+## Requirements
 - **Python 3.8+** 
-- **Ollama** (for LLM support)
+- An **LLM backend** (e.g., [Ollama](https://ollama.ai) running locally)
+- Access to a **SonarQube** server for static analysis (e.g., a [local instance](https://docs.sonarsource.com/sonarqube-server/10.6/try-out-sonarqube/))
 - **Git**
+- **Java JDK** (11 recommended)
+- **Maven** (3.9.11 recommended)
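
A quick preflight check for the tools listed above can be written as a short sketch. It is not part of the project. It checks the running Python version and whether `git`, `java`, `javac`, and `mvn` resolve on `PATH`. It does not verify the recommended JDK 11 or Maven 3.9.11 versions.

```python
import shutil
import sys
from typing import List, Sequence, Tuple

# Executables implied by the requirements list (java and javac ship with the JDK).
REQUIRED_TOOLS: Tuple[str, ...] = ("git", "java", "javac", "mvn")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def preflight() -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if not python_ok():
        problems.append("Python 3.8+ is required")
    problems.extend(f"'{tool}' not found on PATH" for tool in missing_tools())
    return problems
```

Calling `preflight()` returns an empty list when everything is found. Otherwise it returns one message per problem.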
 
-### Step-by-Step Installation
-
-1. **Clone the Repository**
-   ```bash
-   git clone https://github.com/your-repo/Legacy-Code-Migration.git
-   cd Legacy-Code-Migration
-   ```
-
-2. **Create Virtual Environment**
-   ```bash
-   python -m venv venv
-   source venv/bin/activate  # On Windows: venv\Scripts\activate
-   ```
-
-3. **Install Dependencies**
-   ```bash
-   pip install -r requirements.txt
-   ```
-
-4. **Install & Configure Ollama**
-   ```bash
-   # Install Ollama (visit https://ollama.ai for platform-specific instructions)
-   
-   # Pull required models
-   ollama pull granite3.3:8b          # Main analysis model
-   ollama pull nomic-embed-text       # Embedding model for vector search
-   ```
-
-5. **Initialize Vector Database**
-   ```bash
-   python scripts/setup_db.py
-   ```
+## Installation & Configuration
 
-6. **Verify Installation**
-   ```bash
-   python main.py
-   ```
-
-## 📖 Usage Guide
+```bash
+# 1. Clone
+git clone https://github.com/Andrei-Constantin-Programmer/Anti-Pattern-Resolutor.git
+cd Anti-Pattern-Resolutor
 
-### 🎯 Quick Start
+# 2. Create and activate virtual environment
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
 
-**Analyze Sample Code:**
-```bash
-python main.py
+# 3. Install dependencies
+python install_requirements.py  # On Unix, pip install -r requirements.txt also works;
+                                # on Windows it fails because of incompatible libraries.
 ```
 
-**Custom Analysis:**
+Optional: if you run Ollama locally, pull the models:
 ```bash
-python scripts/run_analysis.py
+ollama pull granite3.3:8b
+ollama pull nomic-embed-text
 ```
 
-### 💻 Programmatic Usage
-
-```python
-from src.core.graph import CreateGraph
-from src.data.database import VectorDBManager
+Further LangChain configuration can be adjusted in `AntiPattern_Remediator/config/settings.py`.
 
-# Initialize components
-vector_db = VectorDBManager()
-workflow = CreateGraph(db_manager=vector_db.get_db()).workflow
+## Usage
 
-# Analyze your Java code
-java_code = """
-public class UserManager {
-    private List<User> users = new ArrayList<>();
-    private List<String> logs = new ArrayList<>();
-    
-    public void addUser(User user) {
-        users.add(user);
-        logs.add("User added: " + user.getName());
-        // Send email notification
-        EmailService.sendWelcomeEmail(user);
-        // Update analytics
-        AnalyticsService.trackUserRegistration(user);
-    }
-    
-    public void generateReport() {
-        // Complex report generation logic...
-    }
-}
-"""
+### Prepare coverage candidates
+This stage clones the target repositories, runs their tests with JaCoCo, and writes a list of files with 100% line coverage, which are treated as safe refactoring targets.
 
-# Run analysis
-result = workflow.invoke({
-    "code": java_code,
-    "context": None,
-    "answer": None
-})
-
-print("Analysis Results:", result["answer"])
+Create `repos.txt` in the repository root, listing one repository URL per line:
 ```
-
-
-## ⚙️ Configuration
-
-Edit `config/settings.py` to customize behavior:
-
-```python
-# Model Configuration
-LLM_MODEL = "granite3.3:8b"           # Primary analysis model
-EMBEDDING_MODEL = "nomic-embed-text"   # Vector embedding model
-
-# Analysis Parameters
-CHUNK_SIZE = 1000                      # Text chunking for vector DB
-CHUNK_OVERLAP = 200                    # Overlap between chunks
-CONFIDENCE_THRESHOLD = 0.7             # Minimum confidence for pattern detection
-
-# Database Settings
-VECTOR_DB_DIR = "static/vector_db"     # Vector database location
+https://github.com/org/repo-one
+https://github.com/org/repo-two
 ```
 
-## 🧠 Supported Antipatterns
-
-The tool currently detects and provides refactoring guidance for:
-
-| Category | Antipatterns |
-|----------|-------------|
-| **Structural** | God Object, Long Method, Large Class, Data Class |
-| **Behavioral** | Feature Envy, Message Chains, Inappropriate Intimacy |
-| **Creational** | Singleton Abuse, Factory Abuse |
-| **Architectural** | Circular Dependencies, Tight Coupling |
-| **Performance** | N+1 Queries, Premature Optimization |
-
-## 🔧 Core Components
-
-### 🤖 Analysis Agents
-
-- **`AntipatternScanner`**: Identifies code smells and antipatterns using pattern matching and ML techniques
-- **`CodeTransformer`**: Applies automated code transformations and suggests improvements
-- **`RefactoringAgent`**: Generates comprehensive refactoring strategies with effort estimates
-
-### 🌐 Workflow Engine
-
-- **`CreateGraph`**: Orchestrates the complete analysis pipeline using LangGraph
-- **`EnhancedWorkflow`**: Advanced multi-step analysis with context-aware processing
-
-### 💾 Data Management
-
-- **`VectorDBManager`**: Manages vector database operations for knowledge retrieval
-- **Settings System**: Centralized configuration with environment-specific overrides
-
-## 📊 Sample Output
-
+Run:
+```bash
+# From repository root
+python jacoco_tool/jacoco_analysis.py --repos repos.txt
+
+# Useful flags:
+#   --single-repo https://github.com/user/repo
+#   --clone-dir clones
+#   --output-dir jacoco_results
+#   --force-jacoco
+#   --timeout 600
+#   --verbose
 ```
-🚀 Legacy Code Migration Tool - Analysis Results
-================================================================
-
-📋 ANTIPATTERN ANALYSIS RESULTS
-================================================================
 
-1. **God Object Detected**
-   - Location: UserManager class
-   - Issue: Class handles user management, logging, email notifications, and analytics
-   - Impact: High coupling, difficult to test and maintain
-   - Refactoring: Split into UserService, LoggingService, NotificationService
-   - Effort Estimate: 4-6 hours
+Outputs:
+- Cloned sources under `clones/` (default)
+- Coverage artefacts and a combined file list under `jacoco_results/` (the path is printed at the end of the run)
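
The README does not say exactly how a file is judged fully covered. The sketch below shows one way to do it. It assumes a standard JaCoCo XML report (`jacoco.xml`) and keeps source files whose `LINE` counter reports no missed lines. The function name is hypothetical, and `jacoco_tool/jacoco_analysis.py` may apply different rules.

```python
import xml.etree.ElementTree as ET
from typing import List


def fully_covered_files(report_xml: str) -> List[str]:
    """Return source paths whose LINE counter reports no missed lines.

    Assumes the standard JaCoCo XML layout, e.g.
    <report><package name="com/example"><sourcefile name="Foo.java">
      <counter type="LINE" missed="0" covered="12"/></sourcefile></package></report>
    """
    root = ET.fromstring(report_xml)
    selected: List[str] = []
    for package in root.iter("package"):  # iter() also finds packages nested in <group>
        prefix = package.get("name", "")
        for source in package.findall("sourcefile"):
            line = source.find("counter[@type='LINE']")
            if line is None:
                continue  # no executable lines recorded for this file
            missed = int(line.get("missed", "0"))
            covered = int(line.get("covered", "0"))
            if missed == 0 and covered > 0:
                name = source.get("name", "")
                selected.append(f"{prefix}/{name}" if prefix else name)
    return sorted(selected)
```

A typical call is `fully_covered_files(Path("target/site/jacoco/jacoco.xml").read_text(encoding="utf-8"))`. `target/site/jacoco/` is the default output directory of the `jacoco-maven-plugin` `report` goal. This sketch ignores branch coverage.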
 
-2. **Feature Envy Detected**
-   - Location: addUser() method
-   - Issue: Method heavily uses EmailService and AnalyticsService
-   - Impact: Poor cohesion, violation of Single Responsibility Principle
-   - Refactoring: Move email/analytics logic to respective services
-   - Effort Estimate: 2-3 hours
+### Provide a SonarQube token
+Generate a **user token** in SonarQube (My Account -> Security), then store it in the `SONARQUBE_TOKEN` environment variable.
+- Docs: https://docs.sonarsource.com/sonarqube-server/latest/user-guide/managing-tokens/#generating-a-token
 
-3. **Long Method**
-   - Location: generateReport() method
-   - Issue: Method contains 45 lines of complex logic
-   - Impact: Difficult to understand and modify
-   - Refactoring: Extract smaller, focused methods
-   - Effort Estimate: 3-4 hours
-
-================================================================
-📊 Analysis Summary: 3 antipatterns detected
-🎯 Estimated Total Refactoring Effort: 9-13 hours
-📈 Code Quality Impact: High improvement expected
-================================================================
+Unix:
+```bash
+export SONARQUBE_TOKEN="paste-your-token"
 ```
 
-## 🤝 Contributing
-
-This is an active research project. We welcome contributions in several areas:
-
-- 🐛 **Bug Reports**: Submit issues via GitHub
-- 🔧 **Feature Requests**: Suggest new antipatterns or analysis capabilities
-- 📖 **Documentation**: Improve setup guides and usage examples
-- 🧪 **Testing**: Add test cases for edge scenarios
+Windows PowerShell (current session only):
+```powershell
+$env:SONARQUBE_TOKEN = "paste-your-token"
+```
 
-### Development Setup
+Windows (persistent):
+```powershell
+setx SONARQUBE_TOKEN "paste-your-token"
+# setx does not update the current session; open a new terminal afterwards
+```
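
You can confirm the token works before starting a long run. The standard-library sketch below calls SonarQube's `GET /api/authentication/validate` Web API endpoint, passing the token as the Basic-auth login with an empty password. The `SONARQUBE_URL` variable and the `http://localhost:9000` default (SonarQube's default port) are assumptions. The project may read its server URL from elsewhere.

```python
import base64
import json
import os
import urllib.error
import urllib.request

# Assumption: SonarQube's default local address; override via a hypothetical SONARQUBE_URL.
DEFAULT_URL = "http://localhost:9000"


def read_token() -> str:
    """Read SONARQUBE_TOKEN from the environment, failing loudly if it is missing."""
    token = os.environ.get("SONARQUBE_TOKEN", "").strip()
    if not token:
        raise RuntimeError("SONARQUBE_TOKEN is not set in this shell")
    return token


def basic_auth_header(token: str) -> str:
    """SonarQube accepts a user token as the Basic-auth login with an empty password."""
    raw = f"{token}:".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def token_is_valid(base_url: str, token: str, timeout: float = 10.0) -> bool:
    """Ask GET /api/authentication/validate whether the credentials are accepted."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/authentication/validate",
        headers={"Authorization": basic_auth_header(token)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("valid", False))
    except urllib.error.HTTPError as err:
        if err.code == 401:  # some server configurations reject bad tokens outright
            return False
        raise
```

A typical check is `token_is_valid(os.environ.get("SONARQUBE_URL", DEFAULT_URL), read_token())`.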
 
+### Run the Remediator
 ```bash
-# Clone with development dependencies
-pip install -r requirements.txt
-
-# Run tests
-python -m pytest tests/
-
-# Code formatting
-black src/
-isort src/
+python AntiPattern_Remediator/main.py
 ```
 
-## 📜 License
-
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+The pipeline selects files with 100% line coverage, proposes minimal, behaviour-preserving edits, and keeps an edit only if the project still compiles and its tests pass.
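
The compile + test gate can be sketched as three steps: write the edit, check the build, and roll back on failure. This is a hypothetical illustration, not the Remediator's code. The helper names, the 600-second timeout, and the exact Maven goals are assumptions. The `run` parameter defaults to `subprocess.run` and can be swapped for a stub in tests.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Resolve mvn once; on Windows this finds mvn.cmd, which subprocess cannot locate by bare name.
MVN = shutil.which("mvn") or "mvn"

# Assumed gate: compile first (fast failure), then run the unchanged test suite.
GATE_STEPS: Sequence[List[str]] = ([MVN, "-q", "compile"], [MVN, "-q", "test"])


def passes_gate(repo: Path, run: Callable = subprocess.run, timeout: int = 600) -> bool:
    """Return True only if every gate step exits with status 0."""
    for step in GATE_STEPS:
        try:
            result = run(step, cwd=repo, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # treat a hung build as a failure
        if result.returncode != 0:
            return False
    return True


def apply_if_green(path: Path, new_source: str, repo: Path, run: Callable = subprocess.run) -> bool:
    """Write the proposed edit, keep it if the gate passes, otherwise restore the original."""
    original = path.read_text(encoding="utf-8")
    path.write_text(new_source, encoding="utf-8")
    accepted = False
    try:
        accepted = passes_gate(repo, run)
    finally:
        if not accepted:
            path.write_text(original, encoding="utf-8")  # roll back, even on errors
    return accepted
```

Restoring the file inside `finally` ensures that a crashed build, such as a missing `mvn`, also leaves the original source in place.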
 
-## 🏆 Acknowledgments
+SonarQube is then re-run for reporting only; its results do not decide whether an edit is kept.
 
-- **IBM Research** - For providing technical expertise and computational resources
-- **University College London (UCL)** - For research guidance and academic support
-- **LangChain Community** - For the foundational LLM orchestration framework
-- **Ollama Project** - For making LLM deployment accessible and efficient
+Plans, diffs, logs, and summaries are written to the run output directory (the path is shown in the console).
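
Persisting a reviewable diff for each edited file needs little more than the standard library's `difflib`. The sketch below is hypothetical. The file naming (`<path>.diff`, `<path>.plan.json`) and the JSON fields are assumptions, not the Remediator's actual output layout.

```python
import difflib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def save_artefacts(out_dir: Path, rel_path: str, before: str, after: str, plan: List[str]) -> Path:
    """Write a unified diff and a JSON plan for one edited file; return the diff path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{rel_path}",
        tofile=f"b/{rel_path}",
    )
    stem = rel_path.replace("/", "_")  # flatten the path into a single file name
    diff_path = out_dir / f"{stem}.diff"
    diff_path.write_text("".join(diff_lines), encoding="utf-8")
    record = {
        "file": rel_path,
        "plan": plan,
        "written_at": datetime.now(timezone.utc).isoformat(),
    }
    (out_dir / f"{stem}.plan.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    return diff_path
```

The `a/` and `b/` prefixes mirror git's diff headers, so the saved files read like ordinary patches.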
 
-<!-- ## 📞 Support & Contact
+### Troubleshooting
+- **"No coverage results found"**  
+Ensure `mvn -q -DskipTests=false test` succeeds in each cloned repo; when running the JaCoCo tool, try raising `--timeout` or adding `--force-jacoco`.
+- **Auth errors with SonarQube**  
+Confirm `SONARQUBE_TOKEN` is set in your current shell and your SonarQube URL is reachable.
+- **Java/Maven not found**  
+Verify JDK 11 and Maven are on `PATH`.
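
For the first troubleshooting item, a short sketch can run the same Maven command in every cloned project and report the ones that fail. It assumes the default `clones/` layout, with one Maven project (a `pom.xml` at its root) per subdirectory. The helper name and timeout are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List

MVN = shutil.which("mvn") or "mvn"  # resolves mvn.cmd on Windows


def failing_repos(clone_dir: Path, run: Callable = subprocess.run, timeout: int = 600) -> List[str]:
    """Run the README's Maven test command in each cloned project and list the failures."""
    failures: List[str] = []
    projects = sorted(p for p in clone_dir.iterdir() if (p / "pom.xml").is_file())
    for repo in projects:
        try:
            result = run([MVN, "-q", "-DskipTests=false", "test"], cwd=repo,
                         capture_output=True, text=True, timeout=timeout)
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False  # a build that hangs also yields no coverage
        if not ok:
            failures.append(repo.name)
    return failures
```

Running `failing_repos(Path("clones"))` from the repository root lists the projects that need attention before the JaCoCo tool can report coverage for them.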
 
-- 📧 **Email**: [project-email@domain.com]
-- 🐛 **Issues**: [GitHub Issues](https://github.com/your-repo/Legacy-Code-Migration/issues)
-- 📖 **Documentation**: [Wiki](https://github.com/your-repo/Legacy-Code-Migration/wiki)
-- 💬 **Discussions**: [GitHub Discussions](https://github.com/your-repo/Legacy-Code-Migration/discussions) -->
-
----
-
-<div align="center">
-
-**Built with ❤️ for the developer community**
-
-[⭐ Star this repo](https://github.com/your-repo/Legacy-Code-Migration) | [🔧 Report Bug](https://github.com/your-repo/Legacy-Code-Migration/issues) | [💡 Request Feature](https://github.com/your-repo/Legacy-Code-Migration/issues)
-
-</div>
+## Acknowledgments
+- **IBM** - For technical expertise, computational resources, and mentorship from Dr Amrin Maria Khan and Prof. John McNamara
+- **University College London (UCL)** - For research guidance and academic support, under the supervision of Dr Jens Krinke
+- **LangChain Community** - For the foundational LLM orchestration framework
+- **Ollama Project** - For making LLM deployment accessible and efficient