GitMate

Your AI-Powered Guide to Understanding Any Codebase

Python 3.13+ . License: MIT . Made with LangChain . Tree-sitter Powered

Onboarding to a new codebase shouldn't feel like deciphering ancient hieroglyphics.

Getting Started . Features . Usage . Contributing


Description

GitMate turns codebase onboarding into an interactive experience. By combining deterministic static analysis (ASTs, LSP) with probabilistic AI reasoning (LLMs, RAG), it builds a semantic model of any repository. Developers can "chat" with their code and visualize complex dependencies, with the goal of cutting time-to-understanding by up to 70%.

The Core Problem: Cognitive Overload

Modern software repositories are complex, interconnected ecosystems. For new developers, the learning curve is steep and costly:

  1. High Code Volume, Low Documentation: READMEs rarely capture the intricate runtime behaviors or architectural decisions.
  2. Invisible Dependencies: Modifying a single function can have cascading effects that static linters miss.
  3. Inefficient Onboarding: Developers spend nearly 75% of their time reading code versus writing it. The "Time-to-First-Commit" often spans weeks.
  4. Legacy Black Boxes: Inheriting undocumented legacy code is risky and error-prone without deep contextual understanding.

Who struggles most?

  • New Team Members needing to become productive immediately.
  • Open Source Contributors navigating massive, unfamiliar projects.
  • Maintainers auditing legacy systems or refactoring complex modules.

The Solution: Neuro-Symbolic Code Analysis

GitMate bridges the gap between raw code and human understanding:

  1. Precision Parsing (The Logic)

    • Using Tree-sitter, GitMate builds a syntax tree for each source file, so every function, class, and variable is indexed together with its exact location.
  2. Semantic Enrichment (The Knowledge)

    • An LSP (Language Server Protocol) client resolves symbol references and call hierarchies, mapping the "connectome" of the software.
  3. AI Synthesis (The Insight)

    • Large Language Models (Llama 3.3 via Groq) generate human-readable explanations for every entity, stored in a FAISS vector database for semantic retrieval.
  4. Interactive Exploration

    • A modern Next.js 16 web dashboard allows users to query the codebase using natural language, visualize data flows, and navigate complex architectures effortlessly.
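
The indexing step in point 1 can be illustrated with a minimal sketch. It deliberately swaps Tree-sitter for Python's standard-library `ast` module so that it runs without extra dependencies; the `Entity` record, `index_source` helper, and `SAMPLE` source are hypothetical names chosen for illustration and are not taken from GitMate's code.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One indexed code entity and where it lives (hypothetical record shape)."""
    kind: str   # "function" or "class"
    name: str
    line: int   # 1-based line of the definition


def index_source(source: str) -> list[Entity]:
    """Walk the syntax tree and record every function and class definition."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entities.append(Entity("function", node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            entities.append(Entity("class", node.name, node.lineno))
    # Sort by position so the index reads in file order.
    return sorted(entities, key=lambda e: e.line)


SAMPLE = '''class Repo:
    def clone(self):
        pass

def parse(path):
    return path
'''

index = index_source(SAMPLE)
for entity in index:
    print(entity.kind, entity.name, entity.line)
```

A Tree-sitter based indexer would follow the same shape (parse, walk, record name and location) while covering the non-Python languages listed under Features.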

Features

Intelligent Code Parsing

  • Tree-sitter AST analysis for accurate, error-tolerant parsing
  • Extracts functions, variables, structs, enums across files
  • Polyglot Support: C/C++, TypeScript/TSX, JSON, and Python
  • Captures exact file locations and structural context

Architecture Awareness

  • Reference Tracking: Instantly find symbol usage across the project
  • Call Hierarchy: Visualize upstream callers and downstream dependencies
  • LSP Integration: Leverages clangd and typescript-language-server
  • Smart Degradation: Falls back to heuristic analysis if LSP is absent
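
As a rough sketch of the "Smart Degradation" idea, the snippet below checks whether a language server binary is on `PATH` and otherwise falls back to a whole-word text search. The `LSP_SERVERS` mapping, the language labels, and both function names are assumptions made for this example; GitMate's actual fallback heuristics may differ.

```python
import re
import shutil

# Language servers named in this README, keyed by a language label (labels are assumptions).
LSP_SERVERS = {"cpp": "clangd", "typescript": "typescript-language-server"}


def pick_backend(language: str, which=shutil.which) -> str:
    """Return "lsp" when the server binary is on PATH, else "heuristic"."""
    binary = LSP_SERVERS.get(language)
    return "lsp" if binary and which(binary) else "heuristic"


def heuristic_references(symbol: str, source: str) -> list[int]:
    """Fallback: 1-based line numbers where the symbol appears as a whole word."""
    pattern = re.compile(rf"\b{re.escape(symbol)}\b")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


print(pick_backend("cpp"))
print(heuristic_references("foo", "int foo();\nint foobar();\nfoo();"))
```

The text search cannot tell apart two symbols that share a name, which is why the LSP path is preferred whenever a server is installed.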

Neuro-Symbolic AI Core

  • Context-Aware Explanations: Auto-generates docs for undocumented code
  • Streaming RAG: Retrieval Augmented Generation with sub-second latency
  • High-Performance Inference: Powered by Groq's Llama 3.3 70B
  • Vector Memory: Persistent semantic index using FAISS & Ollama

Modern Web Interface

  • Next.js 16 Dashboard: Built with React 19 and TailwindCSS 4
  • Real-time Chat: Streaming responses via WebSockets
  • Visualizations: Interactive dependency graphs and file trees
  • Multi-Tenant: Project isolation with Postgres & Prisma

Impact & Use Cases

  • "Explain This Function": Instant clarity on complex logic
  • "What breaks if I change this?": Impact analysis via dependency graphs
  • "How does Auth work?": Semantic search across entire modules
  • Secure by Design: Local embedding generation and isolated project environments
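
The "What breaks if I change this?" question comes down to walking a call graph upstream from the changed function. The sketch below shows one way to do that with a breadth-first search; the `CALLS` graph and the `impacted_by` helper are hypothetical and only illustrate the idea.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALLS = {
    "login": ["check_password", "create_session"],
    "signup": ["hash_password", "create_session"],
    "check_password": ["hash_password"],
    "hash_password": [],
    "create_session": [],
}


def impacted_by(target: str, calls: dict[str, list[str]]) -> set[str]:
    """Return every function that directly or transitively calls `target`."""
    # Invert the graph so edges point from callee to caller.
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    # Breadth-first search upstream from the changed function.
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impacted_by("hash_password", CALLS)))
```

In this toy graph, changing `hash_password` reaches `login` only indirectly through `check_password`, which is the kind of cascading effect a per-file view tends to hide.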

Project Gallery

Experience the power of GitMate through our modern web interface.

Comprehensive Dashboard Overview

INSTALLATION

Prerequisites

  • Python 3.13+ (Backend)
  • Node.js 20.9+ & pnpm (Frontend; Next.js 16 does not support Node.js 18)
  • PostgreSQL (Database)
  • Ollama (Embeddings) - Install Ollama
  • UV (Python Package Manager) - Install UV

STEP 1: CLONE THE REPOSITORY

git clone https://github.com/bigsparsh/gitmate.git
cd gitmate

STEP 2: BACKEND SETUP

cd backend

# Install dependencies using UV
uv sync

# Configure environment
# Create .env file with GROQ_API_KEY and DATABASE_URL
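
A quick way to confirm the configuration is complete is a small check like the one below. It is only a sketch: it assumes the two variables named above have already been loaded into the process environment (how the backend reads `.env` is not stated here), and `missing_settings` is a hypothetical helper.

```python
import os

# The two settings this README asks for in backend/.env.
REQUIRED_VARS = ("GROQ_API_KEY", "DATABASE_URL")


def missing_settings(env=os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("Environment looks complete.")
```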

STEP 3: FRONTEND SETUP

# Run from the repository root (use ../frontend if you are still inside backend/)
cd frontend

# Install dependencies
pnpm install

# Initialize Database
pnpm prisma generate
pnpm prisma db push

STEP 4: AI & SERVICES SETUP

# Pull the embedding model locally
ollama pull nomic-embed-text

STEP 5: LSP SETUP (OPTIONAL)

For enhanced tracking and call hierarchy features:

# For C/C++ support
sudo apt install clangd    # Ubuntu/Debian
brew install llvm          # macOS

USAGE

1. Start the Backend Server

cd backend
source .venv/bin/activate
uv run server.py
# Server runs at http://localhost:8000

2. Start the Frontend Dashboard

cd frontend
pnpm dev
# Dashboard available at http://localhost:3000

3. Explore Your Codebase

  1. Open http://localhost:3000 in your browser.
  2. Enter a GitHub Repository URL to start a new project.
  3. The system will clone, parse, and analyze the repo in the background.
  4. Interact with the Chat, File Explorer, or Dependency Graph to understand the code.

ARCHITECTURE

Technical Stack & Data Flow

GitMate employs a Hybrid Neuro-Symbolic Architecture that combines the deterministic precision of static analysis with the probabilistic reasoning of Large Language Models.

1. The Persistence Layer (Backend)

  • FastAPI (Python 3.13): High-performance async API server handling WebSocket streams for real-time chat.
  • Tree-sitter: Incremental parsing library extracting precise ASTs for C/C++, Python, TypeScript/TSX, and JSON.
  • LSP Client: A custom Python wrapper that talks to clangd and typescript-language-server over stdio pipes to extract Call Hierarchies and References.
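
Talking to a language server over stdio means framing each JSON-RPC message with a `Content-Length` header, as the LSP base protocol requires. The sketch below shows that framing for a `textDocument/prepareCallHierarchy` request; the helper names, the file URI, and the cursor position are placeholders, and this is not GitMate's `lsp_client.py`.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the Content-Length header that LSP requires."""
    body = json.dumps(payload).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def decode_message(frame: bytes) -> dict:
    """Parse one complete frame back into a payload."""
    header, _, rest = frame.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# A call-hierarchy request; the file URI and position are placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/prepareCallHierarchy",
    "params": {
        "textDocument": {"uri": "file:///tmp/example.cpp"},
        "position": {"line": 0, "character": 4},
    },
}
frame = encode_message(request)
print(frame.split(b"\r\n")[0].decode("ascii"))
```

A real client would write `frame` to the server's stdin and read frames from its stdout, after first completing the `initialize` handshake.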

2. The Cognitive Layer (AI Engine)

  • Vector Store: FAISS (Facebook AI Similarity Search) indexes code chunks using Nomic Embed Text (via Ollama) for local, privacy-focused semantic retrieval.
  • Inference: Groq API running Llama 3.3 70B provides near-instantaneous reasoning and code explanation.
  • RAG Pipeline: LangChain orchestrates the retrieval of semantic context + AST structure + Call Graph data to ground the LLM's responses in reality.
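
The retrieval-and-grounding flow can be sketched in a few lines. This example replaces FAISS and the Nomic embeddings with plain cosine similarity over toy three-dimensional vectors so that it runs with the standard library alone; `top_k`, `build_prompt`, `CHUNKS`, and the prompt layout are all assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """chunks: list of (text, vector). Return the k texts most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, retrieved, callers):
    """Combine retrieved code, call-graph facts, and the question into one prompt."""
    context = "\n".join(f"- {text}" for text in retrieved)
    graph = ", ".join(callers) or "none found"
    return f"Code context:\n{context}\nCallers: {graph}\nQuestion: {question}"


# Toy 3-dimensional vectors standing in for real embeddings.
CHUNKS = [
    ("def login(user): ...", [0.9, 0.1, 0.0]),
    ("def render_chart(data): ...", [0.0, 0.2, 0.9]),
    ("def check_password(pw): ...", [0.8, 0.3, 0.1]),
]
hits = top_k([1.0, 0.0, 0.0], CHUNKS, k=2)
prompt = build_prompt("How does auth work?", hits, ["signup"])
print(prompt)
```

Placing the call-graph facts next to the retrieved code is what grounds the answer: the model is asked to explain relationships that static analysis has already established.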

3. The Presentation Layer (Frontend)

  • Framework: Next.js 16 (App Router) & React 19 for server-side rendering and static generation.
  • State & UI: TailwindCSS 4 for styling, Mermaid.js for rendering live dependency graphs, and Prisma ORM for managing user sessions and history.
  • Streaming: Server-Sent Events (SSE) and WebSockets ensure a fluid, "typing-like" experience during AI generation.
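
For the SSE side of streaming, each generated token is wrapped in a small text frame that the browser's `EventSource` can parse. The generator below is a minimal sketch of that wire format; the `sse_events` name and the closing `done` event are conventions chosen for this example, not details confirmed by GitMate's code.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each generated token in a Server-Sent Events frame."""
    for token in tokens:
        # Each line of a payload needs its own "data:" field; a blank line ends the event.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # A final named event tells the client the stream is complete (a convention, not part of SSE).
    yield "event: done\ndata: end\n\n"


frames = list(sse_events(["Hel", "lo\nworld"]))
for frame in frames:
    print(repr(frame))
```

A generator like this could be returned from a FastAPI endpoint through `StreamingResponse` with the `text/event-stream` media type.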

4. Data Model

  • PostgreSQL: Stores relational data (Users, Projects, Chat History).
  • Relational Integrity: Tracks the lineage of every analysis session and user interaction.

Project Structure


gitmate/
├── frontend/
│   ├── app/
│   ├── components/
│   ├── hooks/
│   ├── lib/
│   ├── prisma/
│   ├── public/
│   ├── types/
│   ├── .gitignore
│   ├── README.md
│   ├── components.json
│   ├── eslint.config.mjs
│   ├── instructions.md
│   ├── middleware.ts
│   ├── next.config.ts
│   ├── package.json
│   ├── pnpm-lock.yaml
│   ├── pnpm-workspace.yaml
│   ├── postcss.config.mjs
│   ├── prisma.config.ts
│   └── tsconfig.json
│
├── backend/
│   ├── assets/
│   ├── instructions.md
│   ├── lsp_client.py
│   ├── main.py
│   ├── pyproject.toml
│   ├── tree-sitter-docs.md
│   └── uv.lock
│
├── README.md
└── .gitignore


FUTURE VISION

IDE Integration

  • VS Code and JetBrains plugins
  • Real-time "Copilot" style explanations in-editor
  • One-click navigation from IDE to GitMate Graph View

Collaborative Onboarding

  • Multiplayer sessions for team code reviews
  • Shared annotations and "Knowledge Trails"
  • Interactive "Walkthrough" recording for new hires

CI/CD Autonomous Agents

  • GitHub Action to auto-analyze PRs
  • "Risk Report" generation for every commit
  • Automated architecture drift detection

Self-Healing Repositories

  • Auto-generation of test cases for legacy code
  • Automated refactoring suggestions based on AST patterns
  • Vulnerability detection and patch suggestion

ROADMAP

  • v0.1: Core Tree-sitter + LSP + LLM integration (CLI)
  • v0.2: Vector Database Memory & Context Awareness
  • v1.0: Full Web Dashboard (Next.js 16) & Streaming Chat
  • v1.1: Multi-repo support & Organization workspaces
  • v1.2: IDE Extensions for VS Code & JetBrains
  • v2.0: Autonomous Refactoring Agents

CONTRIBUTION

INSPIRATION

  • Every developer who struggled with a new codebase
  • The open-source community's commitment to accessibility
  • The vision of AI-augmented development

Made with ❤️ for Developers, by Developers
