RajdeepGupta07/PDF-Annotation-Extractor

PDF Annotation Extractor

A Flask web app that extracts annotations (highlights, underlines, squiggly marks, strikeouts, sticky-note comments, and free-text notes) from uploaded PDF files and exports them as CSV and JSON.


Folder Structure

project/
├── app.py
├── templates/
│   └── index.html
├── uploads/          ← auto-created
├── outputs/          ← auto-created
└── README.md
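The uploads/ and outputs/ folders are marked as auto-created. The sketch below shows one common way to do that at startup. It is not taken from app.py; the function name ensure_work_dirs is hypothetical, and it assumes both folders sit directly under a given base directory.

```python
import os


def ensure_work_dirs(base_dir):
    """Create uploads/ and outputs/ under base_dir if missing; return their paths.

    Hypothetical helper, not the repository's own code.
    """
    paths = {name: os.path.join(base_dir, name) for name in ("uploads", "outputs")}
    for path in paths.values():
        # exist_ok=True makes this safe to call on every startup
        os.makedirs(path, exist_ok=True)
    return paths


# In a Flask app this would typically run once, relative to the script itself:
# ensure_work_dirs(os.path.dirname(os.path.abspath(__file__)))
```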

Requirements

  • Python 3.9+
  • pip

Setup & Run

1. Create and activate a virtual environment (recommended)

python -m venv venv

# macOS / Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

2. Install dependencies

pip install flask pymupdf werkzeug

3. Run the app

python app.py

4. Open in your browser

http://127.0.0.1:5000

Usage

  1. Upload any annotated PDF via the drag-and-drop zone or file browser.
  2. Click Extract Annotations.
  3. View results in the table on the page.
  4. Download as CSV or JSON using the buttons.

Output Format

CSV (outputs/annotations.csv):

page,label,text
1,Highlight,"example highlighted text"
2,Comment,"This is a sticky note"
3,Underline,"underlined sentence"

JSON (outputs/annotations.json):

[
  { "page": 1, "label": "Highlight", "text": "example highlighted text" },
  { "page": 2, "label": "Comment",   "text": "This is a sticky note" },
  { "page": 3, "label": "Underline", "text": "underlined sentence" }
]
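Both files hold the same list of page/label/text records. A minimal sketch of writing them with only the standard library, assuming the rows are already plain dicts with those three keys (export_annotations is a hypothetical name, not the app's actual function):

```python
import csv
import json
import os

FIELDS = ["page", "label", "text"]


def export_annotations(rows, out_dir):
    """Write rows to annotations.csv and annotations.json in out_dir; return both paths."""
    csv_path = os.path.join(out_dir, "annotations.csv")
    json_path = os.path.join(out_dir, "annotations.json")

    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # ensure_ascii=False keeps non-English annotation text readable in the file
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, indent=2, ensure_ascii=False)

    return csv_path, json_path
```

With the csv module's default quoting, a text field is quoted only when it contains a comma, a quote, or a line break, so quoting may differ slightly from the sample rows above while parsing identically.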

Supported Annotation Types

Type                   PyMuPDF Constant
Highlight              PDF_ANNOT_HIGHLIGHT
Underline              PDF_ANNOT_UNDERLINE
Squiggly               PDF_ANNOT_SQUIGGLY
Strikeout              PDF_ANNOT_STRIKE_OUT
Comment (sticky note)  PDF_ANNOT_TEXT
Free Text              PDF_ANNOT_FREE_TEXT
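The two groups in this table store their text differently: sticky notes and free-text boxes keep it in the annotation's own content, while highlight, underline, squiggly, and strikeout marks only point at words already on the page. The sketch below shows one way to handle both with PyMuPDF. It is an illustration under assumptions, not the app's code: the names LABELS, MARKUP, to_row, and extract_annotations are hypothetical, `import pymupdf` assumes a recent PyMuPDF release (older ones use `import fitz`), and reading text from the annotation rectangle is an approximation.

```python
# Labels follow the table above; keys are the subtype names PyMuPDF
# reports as annot.type[1]. (Hypothetical mapping, not from app.py.)
LABELS = {
    "Highlight": "Highlight",
    "Underline": "Underline",
    "Squiggly": "Squiggly",
    "StrikeOut": "Strikeout",
    "Text": "Comment",  # sticky note
    "FreeText": "Free Text",
}
# Text-markup types carry no text of their own; the marked words live on the page.
MARKUP = {"Highlight", "Underline", "Squiggly", "StrikeOut"}


def to_row(page_index, subtype, content, marked_text):
    """Build one output row, or return None for unsupported annotation types."""
    label = LABELS.get(subtype)
    if label is None:
        return None
    text = marked_text if subtype in MARKUP else content
    # Collapse line breaks and repeated spaces left over from PDF text layout
    return {"page": page_index + 1, "label": label, "text": " ".join(text.split())}


def extract_annotations(pdf_path):
    """Return {"page", "label", "text"} dicts for every supported annotation."""
    # Deferred import so to_row() can be reused without PyMuPDF installed
    import pymupdf

    rows = []
    with pymupdf.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                subtype = annot.type[1]
                # Approximation: take whatever text falls inside the annotation rectangle
                marked = page.get_textbox(annot.rect) if subtype in MARKUP else ""
                content = annot.info.get("content", "")
                row = to_row(page.number, subtype, content, marked)
                if row is not None:
                    rows.append(row)
    return rows
```

For a highlight that spans several lines, its bounding rectangle can also cover neighbouring words; reading text per quad from `annot.vertices` is a more precise alternative.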

API Endpoints

Method  Route           Description
GET     /               Render upload page
POST    /               Upload PDF and process annotations
GET     /download/csv   Download annotations.csv
GET     /download/json  Download annotations.json
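These routes can also be driven from a script instead of the browser. A minimal standard-library client sketch, assuming the app is running at the default address and that the upload form's file input is named "file" (an assumption; check templates/index.html). The helper names build_multipart and extract_via_http are hypothetical.

```python
import os
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:5000"
# ASSUMPTION: the upload form's file input is named "file"; check templates/index.html.
FIELD_NAME = "file"


def build_multipart(field, filename, data, boundary=None):
    """Encode one PDF as a multipart/form-data body; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_via_http(pdf_path, fmt="json"):
    """POST a PDF to /, then return the bytes served by /download/<fmt>."""
    with open(pdf_path, "rb") as f:
        body, content_type = build_multipart(FIELD_NAME, os.path.basename(pdf_path), f.read())
    request = urllib.request.Request(
        BASE_URL + "/", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        response.read()  # HTML results page; not needed here
    with urllib.request.urlopen(f"{BASE_URL}/download/{fmt}") as response:
        return response.read()
```

With the app running, `print(extract_via_http("annotated.pdf").decode("utf-8"))` would print the JSON export for a local file named annotated.pdf.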
