A Flask web app that extracts annotations (highlights, underlines, squiggly marks, strikeouts, sticky-note comments, and free-text notes) from uploaded PDF files and exports them as CSV and JSON.
Project structure:

```
project/
├── app.py
├── templates/
│   └── index.html
├── uploads/      ← auto-created
├── outputs/      ← auto-created
└── README.md
```
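The `uploads/` and `outputs/` folders are created automatically. A minimal sketch of how that can be done at startup follows; the helper name `ensure_dirs` is hypothetical and not taken from `app.py`, only the folder names come from the tree above.

```python
import os


# Hypothetical helper: create uploads/ and outputs/ under base_dir if they
# are missing. Folder names match the project tree; the function name is
# illustrative only.
def ensure_dirs(base_dir: str) -> dict[str, str]:
    paths = {}
    for name in ("uploads", "outputs"):
        path = os.path.join(base_dir, name)
        # exist_ok=True makes the call safe to repeat on every startup
        os.makedirs(path, exist_ok=True)
        paths[name] = path
    return paths
```

Calling it as `ensure_dirs(os.path.dirname(os.path.abspath(__file__)))` from `app.py` would place both folders next to the script regardless of the current working directory.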
Requirements:

- Python 3.9+
- pip
Setup:

```
python -m venv venv
# macOS / Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
```

Install the dependencies and start the server:

```
pip install flask pymupdf werkzeug
python app.py
```

Then open http://127.0.0.1:5000 in a browser.
Usage:

- Upload an annotated PDF using the drag-and-drop zone or the file browser.
- Click the Extract Annotations button.
- Review the results in the table on the page.
- Download them as CSV or JSON using the download buttons.
CSV (outputs/annotations.csv):
```csv
page,label,text
1,Highlight,"example highlighted text"
2,Comment,"This is a sticky note"
3,Underline,"underlined sentence"
```
JSON (outputs/annotations.json):
```json
[
  { "page": 1, "label": "Highlight", "text": "example highlighted text" },
  { "page": 2, "label": "Comment", "text": "This is a sticky note" },
  { "page": 3, "label": "Underline", "text": "underlined sentence" }
]
```

Supported annotation types:

| Type | PyMuPDF Constant |
|---|---|
| Highlight | PDF_ANNOT_HIGHLIGHT |
| Underline | PDF_ANNOT_UNDERLINE |
| Squiggly | PDF_ANNOT_SQUIGGLY |
| Strikeout | PDF_ANNOT_STRIKEOUT |
| Comment | PDF_ANNOT_TEXT (sticky) |
| Free Text | PDF_ANNOT_FREE_TEXT |
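One way to collect the annotation types listed above with PyMuPDF is sketched below. The label mapping, the helper name `extract_annotations`, and the use of `page.get_textbox(annot.rect)` to recover the marked text are assumptions for illustration, not taken from `app.py`. The bounding-box approach is an approximation: on partially highlighted lines it can pick up neighbouring words. The function only touches `page.number`, `page.annots()`, `page.get_textbox()`, `annot.type`, `annot.rect`, and `annot.info`, so it runs without PyMuPDF installed when given stand-in objects.

```python
# Assumed mapping from PyMuPDF type names (annot.type[1]) to output labels.
LABELS = {
    "Highlight": "Highlight",
    "Underline": "Underline",
    "Squiggly": "Squiggly",
    "StrikeOut": "Strikeout",
    "Text": "Comment",
    "FreeText": "Free Text",
}
# Markup annotations mark existing page text; notes carry their own text.
MARKUP = {"Highlight", "Underline", "Squiggly", "StrikeOut"}


def extract_annotations(doc) -> list[dict]:
    """Collect supported annotations from an open PyMuPDF document."""
    rows = []
    for page in doc:
        for annot in page.annots():
            type_name = annot.type[1]  # annot.type is e.g. (8, "Highlight")
            if type_name not in LABELS:
                continue  # skip ink, stamps, shapes, and other types
            if type_name in MARKUP:
                # Approximation: all page text inside the annotation's box.
                text = page.get_textbox(annot.rect)
            else:
                # Sticky notes and free-text boxes store their text here.
                text = annot.info.get("content", "")
            rows.append({
                "page": page.number + 1,  # PyMuPDF page numbers are 0-based
                "label": LABELS[type_name],
                "text": text.strip(),
            })
    return rows
```

With PyMuPDF installed, a document opened via `fitz.open(path)` can be passed in directly, since iterating it yields pages.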

Routes:

| Method | Route | Description |
|---|---|---|
| GET | `/` | Render the upload page |
| POST | `/` | Upload a PDF and extract its annotations |
| GET | `/download/csv` | Download `annotations.csv` |
| GET | `/download/json` | Download `annotations.json` |
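The two download routes serve files written to `outputs/`. A minimal sketch of writing both files from a list of row dicts, using only the standard library, is shown below; the helper name `write_outputs` is hypothetical. Note that `csv.DictWriter` quotes fields only when needed by default, so its CSV may differ cosmetically from the quoted example above while remaining equivalent.

```python
import csv
import json
import os

FIELDS = ["page", "label", "text"]


# Hypothetical helper: write rows to annotations.csv and annotations.json.
def write_outputs(rows: list[dict], out_dir: str) -> tuple[str, str]:
    os.makedirs(out_dir, exist_ok=True)
    csv_path = os.path.join(out_dir, "annotations.csv")
    json_path = os.path.join(out_dir, "annotations.json")

    # newline="" lets the csv module control line endings; UTF-8 keeps
    # non-ASCII annotation text intact.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    # ensure_ascii=False writes accented or CJK text as-is, not as \u escapes.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)

    return csv_path, json_path
```

In Flask, a download route could then return one of these paths with `send_file(path, as_attachment=True)`.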