|
5 | 5 |  |
6 | 6 |  |
7 | 7 |
|
8 | | -A cutting-edge research project developed in collaboration with **IBM Research** and **University College London (UCL)** to automatically migrate legacy Java codebases to modern, maintainable solutions using AI-powered antipattern detection and intelligent refactoring suggestions. |
| 8 | +Local-first, agentic refactoring pipeline for Java that pairs SonarQube findings with LLM-based reasoning to detect smells/anti-patterns and propose file-scoped, behaviour-preserving edits. Changes are gated by compile + tests; public interfaces are preserved; static analysis is re-run for reporting only. Run artefacts (plans, diffs, logs) are persisted for auditability. |
9 | 9 |
|
10 | | -## 🚀 Overview |
| 10 | +Developed in collaboration with **UCL** and **IBM**. |
11 | 11 |
|
12 | | -This tool leverages the power of Large Language Models (LLMs) and vector-based knowledge retrieval to provide comprehensive analysis of Java code, automatically detecting common antipatterns and suggesting concrete refactoring strategies. It represents a significant advancement in automated code modernization and technical debt reduction. |
13 | | - |
14 | | -## ✨ Key Features |
15 | | - |
16 | | -- 🔍 **Intelligent Antipattern Detection**: Automatically identifies 20+ common Java antipatterns including God Object, Long Method, Feature Envy, and more |
17 | | -- 🤖 **AI-Powered Analysis**: Utilizes state-of-the-art LLMs (Granite, Llama, etc.) for deep semantic code understanding |
18 | | -- 📊 **Context-Aware Analysis**: Vector database enables intelligent knowledge retrieval for more accurate assessments |
19 | | -- 🛠️ **Actionable Refactoring Recommendations**: Provides step-by-step refactoring guidance with effort estimates |
20 | | -- 🏗️ **Modular Agent Architecture**: Extensible design with specialized agents for different analysis tasks |
21 | | -- 📈 **Comprehensive Reporting**: Detailed analysis reports with confidence scores and impact assessments |
22 | | - |
23 | | -## 🏛️ Architecture |
24 | | - |
25 | | -The tool follows a modular, agent-based architecture: |
26 | | - |
27 | | -``` |
28 | | -AntiPattern_Remediator/ |
29 | | -├── 📁 src/ # Core source code |
30 | | -│ ├── 📁 core/ # Analysis engine |
31 | | -│ │ ├── 📁 agents/ # Specialized analysis agents |
32 | | -│ │ │ ├── 🔧 base_agent.py # Agent interface foundation |
33 | | -│ │ │ ├── 🔍 antipattern_scanner.py # Pattern detection agent |
34 | | -│ │ │ ├── 🔄 code_transformer.py # Code transformation agent |
35 | | -│ │ │ └── 🛠️ refactoring_agent.py # Refactoring strategy agent |
36 | | -│ │ ├── 📁 graph/ # Workflow orchestration |
37 | | -│ │ │ ├── 🌐 create_graph.py # Main workflow builder |
38 | | -│ │ │ └── ⚡ enhanced_workflow.py # Advanced pipeline |
39 | | -│ │ ├── 📋 state.py # Shared state management |
40 | | -│ │ └── 🔄 workflow.py # Basic workflow definitions |
41 | | -│ └── 📁 data/ # Data management layer |
42 | | -│ └── 📁 database/ # Vector database components |
43 | | -│ └── 💾 vector_db.py # Vector DB operations |
44 | | -├── ⚙️ config/ # Configuration management |
45 | | -│ └── 📝 settings.py # Application settings |
46 | | -├── 🔧 scripts/ # Utility scripts |
47 | | -│ ├── 🚀 setup_db.py # Database initialization |
48 | | -│ └── ▶️ run_analysis.py # Standalone analysis runner |
49 | | -├── 📊 static/ # Static resources |
50 | | -│ ├── 📖 ap.txt # Antipattern knowledge base |
51 | | -│ └── 💾 vector_db/ # Vector database storage |
52 | | -├── 🎯 main.py # Main application entry point |
53 | | -└── 📦 requirements.txt # Python dependencies |
54 | | -``` |
55 | | - |
56 | | -## 🛠️ Installation & Setup |
57 | | - |
58 | | -### Prerequisites |
| 12 | +## What it does |
| 13 | +- Interprets rule-based static analysis (SonarQube) as signals, not ground truth. |
| 14 | +- Coordinates single-responsibility agents (Scanner -> Strategist -> Transformer -> Reviewer -> Explainer) with a shared context. |
| 15 | +- Enforces compile+test acceptance; tests are never modified. |
| 16 | +- Targets Java at file scope only (no cross-file or architectural refactors). |
| 17 | +- Provider-agnostic LLM layer (e.g., local Ollama; hosted options supported) with externalised prompts. |
| 18 | +- Uses a keyed document Trove (definitions, symptoms, safe remedies) for deterministic retrieval. |
59 | 19 |
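The agent hand-off described above can be pictured as a chain of functions that each read and extend one shared context. The following is a minimal sketch rather than the project's code: the `Context` fields, the two stub agents, and `run_pipeline` are hypothetical names, and the stubs stand in for the LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Shared state passed from one agent to the next (hypothetical fields)."""
    source: str
    findings: List[str] = field(default_factory=list)
    plan: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)


Agent = Callable[[Context], Context]


def scanner(ctx: Context) -> Context:
    # Stub: a real scanner would weigh SonarQube findings with LLM reasoning.
    if "System.out.println" in ctx.source:
        ctx.findings.append("console-logging")
    ctx.trace.append("scanner")
    return ctx


def strategist(ctx: Context) -> Context:
    # Stub: turn each finding into one planned, file-scoped step.
    ctx.plan = [f"fix:{finding}" for finding in ctx.findings]
    ctx.trace.append("strategist")
    return ctx


def run_pipeline(ctx: Context, agents: List[Agent]) -> Context:
    """Apply single-responsibility agents in order over the same context."""
    for agent in agents:
        ctx = agent(ctx)
    return ctx


result = run_pipeline(Context(source='System.out.println("hi");'), [scanner, strategist])
print(result.plan)
```

Because every agent has the same signature, further stages (Transformer, Reviewer, Explainer) would slot into the list without changing `run_pipeline`.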
|
| 20 | +## Requirements |
60 | 21 | - **Python 3.8+** |
61 | | -- **Ollama** (for LLM support) |
| 22 | +- An LLM backend (e.g., [Ollama](https://ollama.ai) locally) |
| 23 | +- SonarQube access ([local](https://docs.sonarsource.com/sonarqube-server/10.6/try-out-sonarqube/)) for static analysis |
62 | 24 | - **Git** |
| 25 | +- **Java JDK** (11 recommended) |
| 26 | +- **Maven** (3.9.11 recommended) |
63 | 27 |
|
64 | | -### Step-by-Step Installation |
65 | | - |
66 | | -1. **Clone the Repository** |
67 | | - ```bash |
68 | | - git clone https://github.com/your-repo/Legacy-Code-Migration.git |
69 | | - cd Legacy-Code-Migration |
70 | | - ``` |
71 | | - |
72 | | -2. **Create Virtual Environment** |
73 | | - ```bash |
74 | | - python -m venv venv |
75 | | - source venv/bin/activate # On Windows: venv\Scripts\activate |
76 | | - ``` |
77 | | - |
78 | | -3. **Install Dependencies** |
79 | | - ```bash |
80 | | - pip install -r requirements.txt |
81 | | - ``` |
82 | | - |
83 | | -4. **Install & Configure Ollama** |
84 | | - ```bash |
85 | | - # Install Ollama (visit https://ollama.ai for platform-specific instructions) |
86 | | - |
87 | | - # Pull required models |
88 | | - ollama pull granite3.3:8b # Main analysis model |
89 | | - ollama pull nomic-embed-text # Embedding model for vector search |
90 | | - ``` |
91 | | - |
92 | | -5. **Initialize Vector Database** |
93 | | - ```bash |
94 | | - python scripts/setup_db.py |
95 | | - ``` |
| 28 | +## Installation & Configuration |
96 | 29 |
|
97 | | -6. **Verify Installation** |
98 | | - ```bash |
99 | | - python main.py |
100 | | - ``` |
101 | | - |
102 | | -## 📖 Usage Guide |
| 30 | +```bash |
| 31 | +# 1. Clone |
| 32 | +git clone https://github.com/Andrei-Constantin-Programmer/Anti-Pattern-Resolutor.git |
| 33 | +cd Anti-Pattern-Resolutor |
103 | 34 |
|
104 | | -### 🎯 Quick Start |
| 35 | +# 2. Create and activate virtual environment |
| 36 | +python -m venv venv |
| 37 | +source venv/bin/activate # On Windows: venv\Scripts\activate |
105 | 38 |
|
106 | | -**Analyze Sample Code:** |
107 | | -```bash |
108 | | -python main.py |
| 39 | +# 3. Install dependencies |
| 40 | +python install_requirements.py # pip install -r requirements.txt works on Unix, |
| 41 | + # but not Windows due to incompatible libraries. |
109 | 42 | ``` |
110 | 43 |
|
111 | | -**Custom Analysis:** |
| 44 | +Optional (Ollama locally): |
112 | 45 | ```bash |
113 | | -python scripts/run_analysis.py |
| 46 | +ollama pull granite3.3:8b |
| 47 | +ollama pull nomic-embed-text |
114 | 48 | ``` |
115 | 49 |
|
116 | | -### 💻 Programmatic Usage |
117 | | - |
118 | | -```python |
119 | | -from src.core.graph import CreateGraph |
120 | | -from src.data.database import VectorDBManager |
| 50 | +Further LangChain settings can be changed in `AntiPattern_Remediator/config/settings.py`. |
121 | 51 |
|
122 | | -# Initialize components |
123 | | -vector_db = VectorDBManager() |
124 | | -workflow = CreateGraph(db_manager=vector_db.get_db()).workflow |
| 52 | +## Usage |
125 | 53 |
|
126 | | -# Analyze your Java code |
127 | | -java_code = """ |
128 | | -public class UserManager { |
129 | | - private List<User> users = new ArrayList<>(); |
130 | | - private List<String> logs = new ArrayList<>(); |
131 | | - |
132 | | - public void addUser(User user) { |
133 | | - users.add(user); |
134 | | - logs.add("User added: " + user.getName()); |
135 | | - // Send email notification |
136 | | - EmailService.sendWelcomeEmail(user); |
137 | | - // Update analytics |
138 | | - AnalyticsService.trackUserRegistration(user); |
139 | | - } |
140 | | - |
141 | | - public void generateReport() { |
142 | | - // Complex report generation logic... |
143 | | - } |
144 | | -} |
145 | | -""" |
| 54 | +### Prepare coverage candidates |
| 55 | +This stage clones the listed repos, runs their tests with JaCoCo, and writes a list of files with 100% line coverage that the Remediator can safely target. |
146 | 56 |
|
147 | | -# Run analysis |
148 | | -result = workflow.invoke({ |
149 | | - "code": java_code, |
150 | | - "context": None, |
151 | | - "answer": None |
152 | | -}) |
153 | | - |
154 | | -print("Analysis Results:", result["answer"]) |
| 57 | +Create `repos.txt` in the repository root: |
155 | 58 | ``` |
156 | | - |
157 | | - |
158 | | -## ⚙️ Configuration |
159 | | - |
160 | | -Edit `config/settings.py` to customize behavior: |
161 | | - |
162 | | -```python |
163 | | -# Model Configuration |
164 | | -LLM_MODEL = "granite3.3:8b" # Primary analysis model |
165 | | -EMBEDDING_MODEL = "nomic-embed-text" # Vector embedding model |
166 | | - |
167 | | -# Analysis Parameters |
168 | | -CHUNK_SIZE = 1000 # Text chunking for vector DB |
169 | | -CHUNK_OVERLAP = 200 # Overlap between chunks |
170 | | -CONFIDENCE_THRESHOLD = 0.7 # Minimum confidence for pattern detection |
171 | | - |
172 | | -# Database Settings |
173 | | -VECTOR_DB_DIR = "static/vector_db" # Vector database location |
| 59 | +https://github.com/org/repo-one |
| 60 | +https://github.com/org/repo-two |
174 | 61 | ``` |
175 | 62 |
|
176 | | -## 🧠 Supported Antipatterns |
177 | | - |
178 | | -The tool currently detects and provides refactoring guidance for: |
179 | | - |
180 | | -| Category | Antipatterns | |
181 | | -|----------|-------------| |
182 | | -| **Structural** | God Object, Long Method, Large Class, Data Class | |
183 | | -| **Behavioral** | Feature Envy, Message Chains, Inappropriate Intimacy | |
184 | | -| **Creational** | Singleton Abuse, Factory Abuse | |
185 | | -| **Architectural** | Circular Dependencies, Tight Coupling | |
186 | | -| **Performance** | N+1 Queries, Premature Optimization | |
187 | | - |
188 | | -## 🔧 Core Components |
189 | | - |
190 | | -### 🤖 Analysis Agents |
191 | | - |
192 | | -- **`AntipatternScanner`**: Identifies code smells and antipatterns using pattern matching and ML techniques |
193 | | -- **`CodeTransformer`**: Applies automated code transformations and suggests improvements |
194 | | -- **`RefactoringAgent`**: Generates comprehensive refactoring strategies with effort estimates |
195 | | - |
196 | | -### 🌐 Workflow Engine |
197 | | - |
198 | | -- **`CreateGraph`**: Orchestrates the complete analysis pipeline using LangGraph |
199 | | -- **`EnhancedWorkflow`**: Advanced multi-step analysis with context-aware processing |
200 | | - |
201 | | -### 💾 Data Management |
202 | | - |
203 | | -- **`VectorDBManager`**: Manages vector database operations for knowledge retrieval |
204 | | -- **Settings System**: Centralized configuration with environment-specific overrides |
205 | | - |
206 | | -## 📊 Sample Output |
207 | | - |
| 63 | +Run: |
| 64 | +```bash |
| 65 | +# From repository root |
| 66 | +python jacoco_tool/jacoco_analysis.py --repos repos.txt |
| 67 | + |
| 68 | +# Useful flags: |
| 69 | +# --single-repo https://github.com/user/repo |
| 70 | +# --clone-dir clones |
| 71 | +# --output-dir jacoco_results |
| 72 | +# --force-jacoco |
| 73 | +# --timeout 600 |
| 74 | +# --verbose |
208 | 75 | ``` |
209 | | -🚀 Legacy Code Migration Tool - Analysis Results |
210 | | -================================================================ |
211 | | -
|
212 | | -📋 ANTIPATTERN ANALYSIS RESULTS |
213 | | -================================================================ |
214 | 76 |
|
215 | | -1. **God Object Detected** |
216 | | - - Location: UserManager class |
217 | | - - Issue: Class handles user management, logging, email notifications, and analytics |
218 | | - - Impact: High coupling, difficult to test and maintain |
219 | | - - Refactoring: Split into UserService, LoggingService, NotificationService |
220 | | - - Effort Estimate: 4-6 hours |
| 77 | +Outputs: |
| 78 | +- Cloned sources under `clones/` (default) |
| 79 | +- Coverage artefacts and a combined file list under `jacoco_results/` |
| 80 | +(path is printed at the end of the run) |
221 | 81 |
|
222 | | -2. **Feature Envy Detected** |
223 | | - - Location: addUser() method |
224 | | - - Issue: Method heavily uses EmailService and AnalyticsService |
225 | | - - Impact: Poor cohesion, violation of Single Responsibility Principle |
226 | | - - Refactoring: Move email/analytics logic to respective services |
227 | | - - Effort Estimate: 2-3 hours |
| 82 | +### Provide a SonarQube token |
| 83 | +Generate a **user token** in SonarQube (My Account -> Security), then set it as `SONARQUBE_TOKEN`. |
| 84 | +- Docs: https://docs.sonarsource.com/sonarqube-server/latest/user-guide/managing-tokens/#generating-a-token |
228 | 85 |
|
229 | | -3. **Long Method** |
230 | | - - Location: generateReport() method |
231 | | - - Issue: Method contains 45 lines of complex logic |
232 | | - - Impact: Difficult to understand and modify |
233 | | - - Refactoring: Extract smaller, focused methods |
234 | | - - Effort Estimate: 3-4 hours |
235 | | -
|
236 | | -================================================================ |
237 | | -📊 Analysis Summary: 3 antipatterns detected |
238 | | -🎯 Estimated Total Refactoring Effort: 9-13 hours |
239 | | -📈 Code Quality Impact: High improvement expected |
240 | | -================================================================ |
| 86 | +Unix: |
| 87 | +```bash |
| 88 | +export SONARQUBE_TOKEN="paste-your-token" |
241 | 89 | ``` |
242 | 90 |
|
243 | | -## 🤝 Contributing |
244 | | - |
245 | | -This is an active research project. We welcome contributions in several areas: |
246 | | - |
247 | | -- 🐛 **Bug Reports**: Submit issues via GitHub |
248 | | -- 🔧 **Feature Requests**: Suggest new antipatterns or analysis capabilities |
249 | | -- 📖 **Documentation**: Improve setup guides and usage examples |
250 | | -- 🧪 **Testing**: Add test cases for edge scenarios |
| 91 | +Windows PowerShell (temporary): |
| 92 | +```powershell |
| 93 | +$env:SONARQUBE_TOKEN = "paste-your-token" |
| 94 | +``` |
251 | 95 |
|
252 | | -### Development Setup |
| 96 | +Windows (persist): |
| 97 | +```powershell |
| 98 | +setx SONARQUBE_TOKEN "paste-your-token" |
| 99 | +# Restart the terminal afterwards |
| 100 | +``` |
253 | 101 |
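A script that calls SonarQube can fail early when the variable is missing. The sketch below is one way to do that, assuming the token is sent to the SonarQube Web API as the Basic-auth username with an empty password. `sonar_auth_header` is a hypothetical helper, and how the Remediator itself passes the token is not shown in this README.

```python
import base64
import os
from typing import Dict, Mapping


def sonar_auth_header(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build an Authorization header from SONARQUBE_TOKEN, or fail clearly."""
    token = env.get("SONARQUBE_TOKEN", "").strip()
    if not token:
        raise RuntimeError("SONARQUBE_TOKEN is not set in this shell")
    # A user token is passed as the username; the password part stays empty.
    encoded = base64.b64encode(f"{token}:".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Demonstration with an explicit mapping instead of the real environment.
print(sonar_auth_header({"SONARQUBE_TOKEN": "example-token"}))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real shell.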
|
| 102 | +### Run the Remediator |
254 | 103 | ```bash |
255 | | -# Clone with development dependencies |
256 | | -pip install -r requirements.txt |
257 | | - |
258 | | -# Run tests |
259 | | -python -m pytest tests/ |
260 | | - |
261 | | -# Code formatting |
262 | | -black src/ |
263 | | -isort src/ |
| 104 | +python AntiPattern_Remediator/main.py |
264 | 105 | ``` |
265 | 106 |
|
266 | | -## 📜 License |
267 | | - |
268 | | -This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |
| 107 | +The pipeline selects 100%-covered files, proposes minimal, behaviour-preserving edits, and gates them behind compile + test. |
269 | 108 |
|
270 | | -## 🏆 Acknowledgments |
| 109 | +SonarQube is re-run for reporting. |
271 | 110 |
|
272 | | -- **IBM Research** - For providing technical expertise and computational resources |
273 | | -- **University College London (UCL)** - For research guidance and academic support |
274 | | -- **LangChain Community** - For the foundational LLM orchestration framework |
275 | | -- **Ollama Project** - For making LLM deployment accessible and efficient |
| 111 | +Plans, diffs, logs, and summaries are written to the run output directory (path shown in the console). |
276 | 112 |
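The compile + test gate amounts to "apply the edit, run the checks, roll back on failure". The sketch below shows that idea for a single file. It is an assumption-laden illustration, not the Remediator's implementation: `run_ok` and `gated_write` are hypothetical names.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def run_ok(cmd: Sequence[str], cwd: Path) -> bool:
    """Run a command in cwd and report whether it exited with status 0."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True).returncode == 0


def gated_write(target: Path, new_text: str, check: Callable[[], bool]) -> bool:
    """Write new_text to target; keep it only if check() passes, else restore."""
    original = target.read_text(encoding="utf-8")
    target.write_text(new_text, encoding="utf-8")
    try:
        accepted = check()
    except Exception:
        # A check that cannot run at all is treated as a rejection.
        accepted = False
    if not accepted:
        target.write_text(original, encoding="utf-8")
    return accepted


with tempfile.TemporaryDirectory() as tmp:
    java_file = Path(tmp) / "Demo.java"
    java_file.write_text("class Demo {}", encoding="utf-8")
    # A failing check stands in for a broken build, so the edit is rolled back.
    rejected = gated_write(java_file, "class Demo { int x }", check=lambda: False)
    restored = java_file.read_text(encoding="utf-8")

print(rejected, restored)
```

Against a real Maven project, the check could be `lambda: run_ok(["mvn", "-q", "test"], repo_dir)`, where `repo_dir` is the project root. Maven's `test` phase compiles the sources before running the tests.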
|
277 | | -<!-- ## 📞 Support & Contact |
| 113 | +### Troubleshooting |
| 114 | +- **"No coverage results found"** |
| 115 | +Ensure `mvn -q -DskipTests=false test` succeeds in each repo; consider `--timeout` and `--force-jacoco` when running the JaCoCo tool. |
| 116 | +- **Auth errors with SonarQube** |
| 117 | +Confirm `SONARQUBE_TOKEN` is set in your current shell and your SonarQube URL is reachable. |
| 118 | +- **Java/Maven not found** |
| 119 | +Verify JDK 11 and Maven are on `PATH`. |
278 | 120 |
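For the last item, a quick way to see which tools are missing from `PATH` is `shutil.which`. The helper below is a hypothetical sketch: it only checks that the executables can be found and does not verify their versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


# Prints an empty list when java, mvn, and git are all discoverable.
print(missing_tools(["java", "mvn", "git"]))
```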
|
279 | | -- 📧 **Email**: [[email protected]] |
280 | | -- 🐛 **Issues**: [GitHub Issues](https://github.com/your-repo/Legacy-Code-Migration/issues) |
281 | | -- 📖 **Documentation**: [Wiki](https://github.com/your-repo/Legacy-Code-Migration/wiki) |
282 | | -- 💬 **Discussions**: [GitHub Discussions](https://github.com/your-repo/Legacy-Code-Migration/discussions) --> |
283 | | - |
284 | | ---- |
285 | | - |
286 | | -<div align="center"> |
287 | | - |
288 | | -**Built with ❤️ for the developer community** |
289 | | - |
290 | | -[⭐ Star this repo](https://github.com/your-repo/Legacy-Code-Migration) | [🔧 Report Bug](https://github.com/your-repo/Legacy-Code-Migration/issues) | [💡 Request Feature](https://github.com/your-repo/Legacy-Code-Migration/issues) |
291 | | - |
292 | | -</div> |
| 121 | +## Acknowledgments |
| 122 | +- **IBM** - For providing technical expertise and computational resources, and for mentorship from Dr Amrin Maria Khan and Prof. John McNamara |
| 123 | +- **University College London (UCL)** - For research guidance and academic support, under the supervision of Dr Jens Krinke |
| 124 | +- **LangChain Community** - For the foundational LLM orchestration framework |
| 125 | +- **Ollama Project** - For making LLM deployment accessible and efficient |