Commit 1069c9c

Updated README
1 parent 7476f7e commit 1069c9c

1 file changed

+83
-250
lines changed

README.md

Lines changed: 83 additions & 250 deletions
Original file line number | Diff line number | Diff line change
@@ -5,288 +5,121 @@
55
![Python](https://img.shields.io/badge/python-3.8+-blue.svg)
66
![Status](https://img.shields.io/badge/status-research-orange.svg)
77

8-
A cutting-edge research project developed in collaboration with **IBM Research** and **University College London (UCL)** to automatically migrate legacy Java codebases to modern, maintainable solutions using AI-powered antipattern detection and intelligent refactoring suggestions.
8+
Local-first, agentic refactoring pipeline for Java that pairs SonarQube findings with LLM-based reasoning to detect smells/anti-patterns and propose file-scoped, behaviour-preserving edits. Changes are gated by compile + tests; public interfaces are preserved; static analysis is re-run for reporting only. Run artefacts (plans, diffs, logs) are persisted for auditability.
99

10-
## 🚀 Overview
10+
Developed in collaboration with **UCL** and **IBM**.
1111

12-
This tool leverages the power of Large Language Models (LLMs) and vector-based knowledge retrieval to provide comprehensive analysis of Java code, automatically detecting common antipatterns and suggesting concrete refactoring strategies. It represents a significant advancement in automated code modernization and technical debt reduction.
13-
14-
## ✨ Key Features
15-
16-
- 🔍 **Intelligent Antipattern Detection**: Automatically identifies 20+ common Java antipatterns including God Object, Long Method, Feature Envy, and more
17-
- 🤖 **AI-Powered Analysis**: Utilizes state-of-the-art LLMs (Granite, Llama, etc.) for deep semantic code understanding
18-
- 📊 **Context-Aware Analysis**: Vector database enables intelligent knowledge retrieval for more accurate assessments
19-
- 🛠️ **Actionable Refactoring Recommendations**: Provides step-by-step refactoring guidance with effort estimates
20-
- 🏗️ **Modular Agent Architecture**: Extensible design with specialized agents for different analysis tasks
21-
- 📈 **Comprehensive Reporting**: Detailed analysis reports with confidence scores and impact assessments
22-
23-
## 🏛️ Architecture
24-
25-
The tool follows a modular, agent-based architecture:
26-
27-
```
28-
AntiPattern_Remediator/
29-
├── 📁 src/ # Core source code
30-
│ ├── 📁 core/ # Analysis engine
31-
│ │ ├── 📁 agents/ # Specialized analysis agents
32-
│ │ │ ├── 🔧 base_agent.py # Agent interface foundation
33-
│ │ │ ├── 🔍 antipattern_scanner.py # Pattern detection agent
34-
│ │ │ ├── 🔄 code_transformer.py # Code transformation agent
35-
│ │ │ └── 🛠️ refactoring_agent.py # Refactoring strategy agent
36-
│ │ ├── 📁 graph/ # Workflow orchestration
37-
│ │ │ ├── 🌐 create_graph.py # Main workflow builder
38-
│ │ │ └── ⚡ enhanced_workflow.py # Advanced pipeline
39-
│ │ ├── 📋 state.py # Shared state management
40-
│ │ └── 🔄 workflow.py # Basic workflow definitions
41-
│ └── 📁 data/ # Data management layer
42-
│ └── 📁 database/ # Vector database components
43-
│ └── 💾 vector_db.py # Vector DB operations
44-
├── ⚙️ config/ # Configuration management
45-
│ └── 📝 settings.py # Application settings
46-
├── 🔧 scripts/ # Utility scripts
47-
│ ├── 🚀 setup_db.py # Database initialization
48-
│ └── ▶️ run_analysis.py # Standalone analysis runner
49-
├── 📊 static/ # Static resources
50-
│ ├── 📖 ap.txt # Antipattern knowledge base
51-
│ └── 💾 vector_db/ # Vector database storage
52-
├── 🎯 main.py # Main application entry point
53-
└── 📦 requirements.txt # Python dependencies
54-
```
55-
56-
## 🛠️ Installation & Setup
57-
58-
### Prerequisites
12+
## What it does
13+
- Interprets rule-based static analysis (SonarQube) as signals, not ground truth.
14+
- Coordinates single-responsibility agents (Scanner -> Strategist -> Transformer -> Reviewer -> Explainer) with a shared context.
15+
- Enforces compile+test acceptance; tests are never modified.
16+
- Targets Java at file scope only (no cross-file or architectural refactors).
17+
- Provider-agnostic LLM layer (e.g., local Ollama; hosted options supported) with externalised prompts.
18+
- Uses a keyed document Trove (definitions, symptoms, safe remedies) for deterministic retrieval.
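
As a rough illustration of the agent hand-off listed above, the sketch below chains single-responsibility steps over one shared context dictionary. The agent functions, the context keys, and the rule names are hypothetical placeholders for illustration, not the project's actual classes or prompts, and only two of the five stages are shown.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Agent = Callable[[Context], Context]


def scanner(ctx: Context) -> Context:
    # Hypothetical: static-analysis findings are signals; keep only confirmed ones.
    ctx["smells"] = [f for f in ctx["findings"] if f["confirmed"]]
    return ctx


def strategist(ctx: Context) -> Context:
    # Hypothetical: plan one remedy per confirmed smell.
    ctx["plan"] = [f"address {s['rule']}" for s in ctx["smells"]]
    return ctx


def run_pipeline(agents: List[Agent], ctx: Context) -> Context:
    # Each agent reads from and writes to the same shared context, in order.
    for agent in agents:
        ctx = agent(ctx)
    return ctx


result = run_pipeline(
    [scanner, strategist],
    {
        "findings": [
            {"rule": "long-method", "confirmed": True},
            {"rule": "unused-import", "confirmed": False},
        ]
    },
)
print(result["plan"])  # prints ['address long-method']
```

Keeping every stage behind the same `Context -> Context` signature is one way to let stages be added, removed, or reordered without touching the others.
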
5919

20+
## Requirements
6021
- **Python 3.8+**
61-
- **Ollama** (for LLM support)
22+
- An LLM backend (e.g., [Ollama](https://ollama.ai) locally)
23+
- SonarQube access ([local](https://docs.sonarsource.com/sonarqube-server/10.6/try-out-sonarqube/)) for static analysis
6224
- **Git**
25+
- **Java JDK** (11 recommended)
26+
- **Maven** (3.9.11 recommended)
6327

64-
### Step-by-Step Installation
65-
66-
1. **Clone the Repository**
67-
```bash
68-
git clone https://github.com/your-repo/Legacy-Code-Migration.git
69-
cd Legacy-Code-Migration
70-
```
71-
72-
2. **Create Virtual Environment**
73-
```bash
74-
python -m venv venv
75-
source venv/bin/activate # On Windows: venv\Scripts\activate
76-
```
77-
78-
3. **Install Dependencies**
79-
```bash
80-
pip install -r requirements.txt
81-
```
82-
83-
4. **Install & Configure Ollama**
84-
```bash
85-
# Install Ollama (visit https://ollama.ai for platform-specific instructions)
86-
87-
# Pull required models
88-
ollama pull granite3.3:8b # Main analysis model
89-
ollama pull nomic-embed-text # Embedding model for vector search
90-
```
91-
92-
5. **Initialize Vector Database**
93-
```bash
94-
python scripts/setup_db.py
95-
```
28+
## Installation & Configuration
9629

97-
6. **Verify Installation**
98-
```bash
99-
python main.py
100-
```
101-
102-
## 📖 Usage Guide
30+
```bash
31+
# 1. Clone
32+
git clone https://github.com/Andrei-Constantin-Programmer/Anti-Pattern-Resolutor.git
33+
cd Anti-Pattern-Resolutor
10334

104-
### 🎯 Quick Start
35+
# 2. Create and activate virtual environment
36+
python -m venv venv
37+
source venv/bin/activate # On Windows: venv\Scripts\activate
10538

106-
**Analyze Sample Code:**
107-
```bash
108-
python main.py
39+
# 3. Install dependencies
40+
python install_requirements.py # pip install -r requirements.txt works on Unix,
41+
# but not Windows due to incompatible libraries.
10942
```
11043

111-
**Custom Analysis:**
44+
Optional (Ollama locally):
11245
```bash
113-
python scripts/run_analysis.py
46+
ollama pull granite3.3:8b
47+
ollama pull nomic-embed-text
11448
```
11549

116-
### 💻 Programmatic Usage
117-
118-
```python
119-
from src.core.graph import CreateGraph
120-
from src.data.database import VectorDBManager
50+
Further LangChain settings can be changed in `AntiPattern_Remediator/config/settings.py`.
12151

122-
# Initialize components
123-
vector_db = VectorDBManager()
124-
workflow = CreateGraph(db_manager=vector_db.get_db()).workflow
52+
## Usage
12553

126-
# Analyze your Java code
127-
java_code = """
128-
public class UserManager {
129-
private List<User> users = new ArrayList<>();
130-
private List<String> logs = new ArrayList<>();
131-
132-
public void addUser(User user) {
133-
users.add(user);
134-
logs.add("User added: " + user.getName());
135-
// Send email notification
136-
EmailService.sendWelcomeEmail(user);
137-
// Update analytics
138-
AnalyticsService.trackUserRegistration(user);
139-
}
140-
141-
public void generateReport() {
142-
// Complex report generation logic...
143-
}
144-
}
145-
"""
54+
### Prepare coverage candidates
55+
This stage clones the listed repos, runs their tests with JaCoCo, and writes a list of files with 100% line coverage, which are the safe targets for refactoring.
14656

147-
# Run analysis
148-
result = workflow.invoke({
149-
"code": java_code,
150-
"context": None,
151-
"answer": None
152-
})
153-
154-
print("Analysis Results:", result["answer"])
57+
Create `repos.txt` in the repository root:
15558
```
156-
157-
158-
## ⚙️ Configuration
159-
160-
Edit `config/settings.py` to customize behavior:
161-
162-
```python
163-
# Model Configuration
164-
LLM_MODEL = "granite3.3:8b" # Primary analysis model
165-
EMBEDDING_MODEL = "nomic-embed-text" # Vector embedding model
166-
167-
# Analysis Parameters
168-
CHUNK_SIZE = 1000 # Text chunking for vector DB
169-
CHUNK_OVERLAP = 200 # Overlap between chunks
170-
CONFIDENCE_THRESHOLD = 0.7 # Minimum confidence for pattern detection
171-
172-
# Database Settings
173-
VECTOR_DB_DIR = "static/vector_db" # Vector database location
59+
https://github.com/org/repo-one
60+
https://github.com/org/repo-two
17461
```
17562

176-
## 🧠 Supported Antipatterns
177-
178-
The tool currently detects and provides refactoring guidance for:
179-
180-
| Category | Antipatterns |
181-
|----------|-------------|
182-
| **Structural** | God Object, Long Method, Large Class, Data Class |
183-
| **Behavioral** | Feature Envy, Message Chains, Inappropriate Intimacy |
184-
| **Creational** | Singleton Abuse, Factory Abuse |
185-
| **Architectural** | Circular Dependencies, Tight Coupling |
186-
| **Performance** | N+1 Queries, Premature Optimization |
187-
188-
## 🔧 Core Components
189-
190-
### 🤖 Analysis Agents
191-
192-
- **`AntipatternScanner`**: Identifies code smells and antipatterns using pattern matching and ML techniques
193-
- **`CodeTransformer`**: Applies automated code transformations and suggests improvements
194-
- **`RefactoringAgent`**: Generates comprehensive refactoring strategies with effort estimates
195-
196-
### 🌐 Workflow Engine
197-
198-
- **`CreateGraph`**: Orchestrates the complete analysis pipeline using LangGraph
199-
- **`EnhancedWorkflow`**: Advanced multi-step analysis with context-aware processing
200-
201-
### 💾 Data Management
202-
203-
- **`VectorDBManager`**: Manages vector database operations for knowledge retrieval
204-
- **Settings System**: Centralized configuration with environment-specific overrides
205-
206-
## 📊 Sample Output
207-
63+
Run:
64+
```bash
65+
# From repository root
66+
python jacoco_tool/jacoco_analysis.py --repos repos.txt
67+
68+
# Useful flags:
69+
# --single-repo https://github.com/user/repo
70+
# --clone-dir clones
71+
# --output-dir jacoco_results
72+
# --force-jacoco
73+
# --timeout 600
74+
# --verbose
20875
```
209-
🚀 Legacy Code Migration Tool - Analysis Results
210-
================================================================
211-
212-
📋 ANTIPATTERN ANALYSIS RESULTS
213-
================================================================
21476

215-
1. **God Object Detected**
216-
- Location: UserManager class
217-
- Issue: Class handles user management, logging, email notifications, and analytics
218-
- Impact: High coupling, difficult to test and maintain
219-
- Refactoring: Split into UserService, LoggingService, NotificationService
220-
- Effort Estimate: 4-6 hours
77+
Outputs:
78+
- Cloned sources under `clones/` (default)
79+
- Coverage artefacts and a combined file list under `jacoco_results/`
80+
(path is printed at the end of the run)
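
For readers who want to inspect coverage by hand, the following is a minimal sketch of how fully covered files could be picked out of a standard JaCoCo XML report. It is not taken from `jacoco_tool/jacoco_analysis.py`, whose internals are not shown here; the function name and the sample report are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def fully_covered_files(jacoco_xml: str) -> List[str]:
    """Return 'package/File.java' for source files with no missed lines."""
    root = ET.fromstring(jacoco_xml)
    covered = []
    for package in root.iter("package"):
        for source in package.iter("sourcefile"):
            # Only direct <counter> children summarise the whole source file.
            for counter in source.findall("counter"):
                if counter.get("type") != "LINE":
                    continue
                missed = int(counter.get("missed", "0"))
                hit = int(counter.get("covered", "0"))
                if missed == 0 and hit > 0:
                    covered.append(f"{package.get('name')}/{source.get('name')}")
    return covered


# Hand-written sample in the shape of a JaCoCo XML report.
SAMPLE = """<report name="demo">
  <package name="com/example">
    <sourcefile name="Full.java">
      <counter type="LINE" missed="0" covered="12"/>
    </sourcefile>
    <sourcefile name="Partial.java">
      <counter type="LINE" missed="3" covered="9"/>
    </sourcefile>
  </package>
</report>"""

print(fully_covered_files(SAMPLE))  # prints ['com/example/Full.java']
```
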
22181

222-
2. **Feature Envy Detected**
223-
- Location: addUser() method
224-
- Issue: Method heavily uses EmailService and AnalyticsService
225-
- Impact: Poor cohesion, violation of Single Responsibility Principle
226-
- Refactoring: Move email/analytics logic to respective services
227-
- Effort Estimate: 2-3 hours
82+
### Provide a SonarQube token
83+
Generate a **user token** in SonarQube (My Account -> Security), then set it as `SONARQUBE_TOKEN`.
84+
- Docs: https://docs.sonarsource.com/sonarqube-server/latest/user-guide/managing-tokens/#generating-a-token
22885

229-
3. **Long Method**
230-
- Location: generateReport() method
231-
- Issue: Method contains 45 lines of complex logic
232-
- Impact: Difficult to understand and modify
233-
- Refactoring: Extract smaller, focused methods
234-
- Effort Estimate: 3-4 hours
235-
236-
================================================================
237-
📊 Analysis Summary: 3 antipatterns detected
238-
🎯 Estimated Total Refactoring Effort: 9-13 hours
239-
📈 Code Quality Impact: High improvement expected
240-
================================================================
86+
Unix:
87+
```bash
88+
export SONARQUBE_TOKEN="paste-your-token"
24189
```
24290

243-
## 🤝 Contributing
244-
245-
This is an active research project. We welcome contributions in several areas:
246-
247-
- 🐛 **Bug Reports**: Submit issues via GitHub
248-
- 🔧 **Feature Requests**: Suggest new antipatterns or analysis capabilities
249-
- 📖 **Documentation**: Improve setup guides and usage examples
250-
- 🧪 **Testing**: Add test cases for edge scenarios
91+
Windows PowerShell (temporary):
92+
```powershell
93+
$env:SONARQUBE_TOKEN = "paste-your-token"
94+
```
25195

252-
### Development Setup
96+
Windows (persist):
97+
```powershell
98+
setx SONARQUBE_TOKEN "paste-your-token"
99+
# Restart the terminal afterwards
100+
```
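
To check that the variable is visible to Python before starting a run, a small sketch such as the one below can help. The function name is hypothetical, and how the Remediator itself reads the token is not shown in this README.

```python
import os
from typing import Mapping


def load_sonarqube_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the token from SONARQUBE_TOKEN, or fail with a clear message."""
    token = env.get("SONARQUBE_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "SONARQUBE_TOKEN is not set in this shell; export it and retry."
        )
    return token


if __name__ == "__main__":
    try:
        load_sonarqube_token()
        print("SONARQUBE_TOKEN is set")
    except RuntimeError as err:
        print(err)
```

Passing the environment as a parameter keeps the check easy to exercise without changing the real shell environment.
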
253101

102+
### Run the Remediator
254103
```bash
255-
# Clone with development dependencies
256-
pip install -r requirements.txt
257-
258-
# Run tests
259-
python -m pytest tests/
260-
261-
# Code formatting
262-
black src/
263-
isort src/
104+
python AntiPattern_Remediator/main.py
264105
```
265106

266-
## 📜 License
267-
268-
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
107+
The pipeline selects 100%-covered files, proposes minimal, behaviour-preserving edits, and gates them behind compile + test.
269108

270-
## 🏆 Acknowledgments
109+
SonarQube is re-run for reporting only; it does not gate acceptance.
271110

272-
- **IBM Research** - For providing technical expertise and computational resources
273-
- **University College London (UCL)** - For research guidance and academic support
274-
- **LangChain Community** - For the foundational LLM orchestration framework
275-
- **Ollama Project** - For making LLM deployment accessible and efficient
111+
Plans, diffs, logs, and summaries are written to the run output directory (path shown in the console).
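
The compile + test gate described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the Remediator's implementation: the function names and the rollback strategy are hypothetical, and the build command is assumed to be `mvn -q test`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str], Path], int]


def run_build(cmd: Sequence[str], cwd: Path) -> int:
    # Runs the build in the project directory and returns its exit code.
    # On Windows the Maven executable may need to be given as `mvn.cmd`.
    return subprocess.run(cmd, cwd=cwd).returncode


def apply_if_green(target: Path, new_source: str, project_dir: Path,
                   runner: Runner = run_build) -> bool:
    """Write new_source to target; keep it only if compile and tests pass."""
    original = target.read_text(encoding="utf-8")
    target.write_text(new_source, encoding="utf-8")
    # `mvn -q test` compiles the project and then runs the test phase.
    if runner(["mvn", "-q", "test"], project_dir) == 0:
        return True
    target.write_text(original, encoding="utf-8")  # roll back the rejected edit
    return False
```

The `runner` parameter is injectable so the gate can be exercised with a stub instead of a real Maven build.
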
276112

277-
<!-- ## 📞 Support & Contact
113+
### Troubleshooting
114+
- **"No coverage results found"**
115+
Ensure `mvn -q -DskipTests=false test` succeeds in each repo; consider `--timeout` and `--force-jacoco` when running the JaCoCo tool.
116+
- **Auth errors with SonarQube**
117+
Confirm `SONARQUBE_TOKEN` is set in your current shell and your SonarQube URL is reachable.
118+
- **Java/Maven not found**
119+
Verify JDK 11 and Maven are on `PATH`.
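
A quick way to confirm that the required tools are on `PATH` is a preflight check like the sketch below. The function name and the default tool list are assumptions for illustration; the check only tests presence, not versions.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str] = ("java", "mvn", "git"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("java, mvn and git were all found on PATH")
```
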
278120

279-
- 📧 **Email**: [[email protected]]
280-
- 🐛 **Issues**: [GitHub Issues](https://github.com/your-repo/Legacy-Code-Migration/issues)
281-
- 📖 **Documentation**: [Wiki](https://github.com/your-repo/Legacy-Code-Migration/wiki)
282-
- 💬 **Discussions**: [GitHub Discussions](https://github.com/your-repo/Legacy-Code-Migration/discussions) -->
283-
284-
---
285-
286-
<div align="center">
287-
288-
**Built with ❤️ for the developer community**
289-
290-
[⭐ Star this repo](https://github.com/your-repo/Legacy-Code-Migration) | [🔧 Report Bug](https://github.com/your-repo/Legacy-Code-Migration/issues) | [💡 Request Feature](https://github.com/your-repo/Legacy-Code-Migration/issues)
291-
292-
</div>
121+
## Acknowledgments
122+
- **IBM** - For technical expertise, computational resources, and mentorship from Dr Amrin Maria Khan and Prof. John McNamara
123+
- **University College London (UCL)** - For research guidance and academic support, under the supervision of Dr Jens Krinke
124+
- **LangChain Community** - For the foundational LLM orchestration framework
125+
- **Ollama Project** - For making LLM deployment accessible and efficient
