File Processing

Several custom scripts to aid in file processing.

Install

pip install -r requirements.txt

html2md

Replaces each subfolder of the given folder with a Markdown file.

Input: A folder (non-recursive); only HTML files in its immediate subfolders are processed
Output: Markdown files named after the subfolders

Usage

./html2md.py /path/to/folder
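A minimal, standard-library-only sketch of the traversal described above. It assumes the `.html` files in each immediate subfolder are converted in sorted filename order and concatenated into one `<subfolder>.md` written next to the subfolders. The `MiniMarkdown` converter (headings, paragraphs and list items only) and the names `html_to_markdown` and `convert_folder` are hypothetical stand-ins, not what `html2md.py` actually uses, and unlike the script this sketch does not delete the subfolders afterwards.

```python
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical tag mapping; a real converter handles far more tags.
PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}
SKIP_TAGS = {"script", "style", "title"}


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, list items only."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self.prefix = ""
        self.text: list[str] = []
        self.skip = 0  # nesting depth inside <script>/<style>/<title>

    def flush(self) -> None:
        text = " ".join("".join(self.text).split())  # collapse whitespace
        if text:
            self.blocks.append(self.prefix + text)
        self.prefix, self.text = "", []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip += 1
        elif tag in PREFIXES:
            self.flush()
            self.prefix = PREFIXES[tag]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip = max(0, self.skip - 1)
        elif tag in PREFIXES:
            self.flush()

    def handle_data(self, data):
        if not self.skip:
            self.text.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text
    return "\n\n".join(parser.blocks) + "\n"


def convert_folder(root: Path) -> list[Path]:
    """Write one <subfolder>.md per immediate subfolder that contains .html files."""
    written = []
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        pages = sorted(sub.glob("*.html"))  # top level of the subfolder only
        if not pages:
            continue
        parts = [html_to_markdown(p.read_text(encoding="utf-8")) for p in pages]
        out = root / f"{sub.name}.md"
        out.write_text("\n".join(parts), encoding="utf-8")
        written.append(out)
    return written
```

For example, `convert_folder(Path("/path/to/folder"))` returns the list of Markdown files it wrote. HTML files nested deeper than one level are ignored, matching the non-recursive behaviour described above.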

pdf2md

Replaces the PDF with a Markdown file.

Input: A PDF file
Output: A Markdown file (in the same folder as the input)

Usage

./pdf2md.py /path/to/file.pdf
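The README only says the Markdown file lands in the same folder as the input. A minimal sketch of the file handling, assuming the output keeps the PDF's stem (`file.pdf` becomes `file.md`) and that "replaces" means the PDF is deleted once the Markdown is written; both are assumptions, and the PDF-to-Markdown conversion itself is not shown (the `markdown` argument stands in for its result). The function names are hypothetical.

```python
from pathlib import Path


def markdown_path_for(pdf: Path) -> Path:
    """Assumed output location: same folder and stem as the PDF, `.md` suffix."""
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {pdf}")
    return pdf.with_suffix(".md")


def replace_pdf_with_markdown(pdf: Path, markdown: str) -> Path:
    """Write `markdown` next to `pdf`, then delete the PDF (assumed semantics)."""
    out = markdown_path_for(pdf)
    out.write_text(markdown, encoding="utf-8")  # write the new file first...
    pdf.unlink()                                # ...so a failed write never loses the original
    return out
```

Writing before deleting means an exception during the write leaves the original PDF untouched.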

pdf_cleaner

Replaces PDFs with cleaned versions.

Input: A folder (recursive), cleans every PDF under it.
Output: Replaces every PDF with its cleaned version.

Usage

./pdf_cleaner.py /path/to/folder
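The README does not say what "cleaning" involves, so this sketch takes a hypothetical `clean(src, dst)` callable and shows only the recursive walk and a safe in-place replacement. Each cleaned copy is written to a temporary file in the same directory and then renamed over the original with `os.replace`, so an interrupted run does not leave a half-written PDF behind. This illustrates one possible pattern and is not necessarily how `pdf_cleaner.py` works; `clean_tree` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def clean_tree(root: Path, clean: Callable[[Path, Path], None]) -> list[Path]:
    """Clean every PDF under `root` (recursively), replacing each in place."""
    # Collect first so temporary files created during the loop are never visited.
    pdfs = sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
    for pdf in pdfs:
        # A temp file in the same directory keeps the rename on one filesystem.
        fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=pdf.parent)
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            clean(pdf, tmp)       # hypothetical cleaning step writes to tmp
            os.replace(tmp, pdf)  # swap the cleaned copy in with a single rename
        except BaseException:
            tmp.unlink(missing_ok=True)
            raise
    return pdfs
```

The suffix check is case-insensitive, so `.PDF` files are cleaned too; non-PDF files are left alone.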

text_extractor

Extracts the first page's text via OCR and prints it to the terminal.

Input: A PDF file
Output: Printed to stdout (no files created)

Usage

./text_extractor.py /path/to/file.pdf

Dependencies

Requires Tesseract OCR to be installed on your system.
On macOS (Homebrew): brew install tesseract
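Because `text_extractor.py` needs the Tesseract binary, a pre-flight check can fail early with a clear message instead of an obscure OCR error. A minimal sketch, assuming the binary is named `tesseract` and is found on `PATH` (the usual result of a Homebrew install); it does not check the version number or installed language data, and `tesseract_version` is a hypothetical helper, not part of the repository.

```python
import shutil
import subprocess
from typing import Optional


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    # Some Tesseract versions print the banner to stderr rather than stdout.
    banner = (result.stdout or result.stderr).strip()
    return banner.splitlines()[0] if banner else ""
```

A caller could exit with a hint such as `brew install tesseract` whenever `tesseract_version()` returns `None`.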

merge_pdf

Merges multiple PDFs into one in the order provided.

Input: Output path followed by input PDF paths (2 or more)
Output: A single merged PDF at the output path

Usage

./merge_pdf.py /path/to/output.pdf /path/to/1.pdf /path/to/2.pdf ...
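Since the output path comes first, the argument order is easy to get wrong. A minimal sketch of validating it with `argparse`, assuming the script should reject fewer than two inputs; refusing an output path that matches one of the inputs is an extra safeguard not mentioned above. `parse_merge_args` is a hypothetical name, and the merge itself is not shown (with the third-party `pypdf` package, `PdfWriter.append` is one way to do it).

```python
import argparse
from pathlib import Path


def parse_merge_args(argv: list[str]) -> tuple[Path, list[Path]]:
    """Parse `output input1 input2 ...`; exits with status 2 on bad usage."""
    parser = argparse.ArgumentParser(prog="merge_pdf.py", description="Merge PDFs in the order given.")
    parser.add_argument("output", type=Path, help="path of the merged PDF to write")
    parser.add_argument("inputs", type=Path, nargs="+", help="two or more PDFs, in merge order")
    args = parser.parse_args(argv)
    if len(args.inputs) < 2:
        parser.error("need at least two input PDFs")
    # Extra safeguard (assumption): never let the output overwrite an input.
    if args.output.resolve() in {p.resolve() for p in args.inputs}:
        parser.error("output path must differ from every input path")
    return args.output, args.inputs
```

`parser.error` prints the usage message to stderr and raises `SystemExit(2)`, the same exit status argparse uses for its own parsing errors.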
