guy915/File-Processing-CLI


File Processing

Several custom scripts to aid in file processing.

Install

pip install -r requirements.txt

html2md

Replaces the folder's subfolders with Markdown files.

  • Input: A folder (non-recursive); only HTML files in its immediate subfolders are processed
  • Output: Markdown files named after the subfolders

Usage

./html2md.py /path/to/folder
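
The input and output rules above can be sketched as a dry-run planner. This is a minimal sketch and not the script's actual code: the function name `plan_html2md` is hypothetical, and it assumes that each Markdown file is written next to its subfolder and that only files ending in `.html` count as HTML.

```python
from pathlib import Path


def plan_html2md(root: Path) -> dict[Path, list[Path]]:
    """Map each planned Markdown path to the HTML files that would feed it.

    Hypothetical helper: it only plans the work and converts nothing.
    """
    plan: dict[Path, list[Path]] = {}
    # Only the root's immediate subfolders are considered.
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Non-recursive: look at files directly inside the subfolder only.
        html_files = sorted(
            f for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() == ".html"  # assumption: .html only
        )
        if html_files:
            # Assumption: the output sits beside the subfolder and takes its name.
            plan[root / f"{sub.name}.md"] = html_files
    return plan
```

Printing the result of `plan_html2md(Path("/path/to/folder"))` before running the real script is one way to preview which subfolders would be affected.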

pdf2md

Replaces the PDF with a Markdown file.

  • Input: A PDF file
  • Output: A Markdown file (in the same folder as the input)

Usage

./pdf2md.py /path/to/file.pdf

pdf_cleaner

Replaces PDFs with cleaned versions.

  • Input: A folder (recursive); every PDF under it is cleaned
  • Output: Each PDF is replaced in place by its cleaned version

Usage

./pdf_cleaner.py /path/to/folder
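
Because the script overwrites its inputs, it can help to see which files a recursive walk would touch. The sketch below is an illustration under stated assumptions, not the script's implementation: `find_pdfs` and `replace_with_cleaned` are hypothetical names, and the `clean` callable stands in for whatever cleaning the real script performs.

```python
import os
from pathlib import Path
from typing import Callable


def find_pdfs(folder: Path) -> list[Path]:
    """Return every PDF under folder, at any depth, in sorted order."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def replace_with_cleaned(pdf: Path, clean: Callable[[bytes], bytes]) -> None:
    """Overwrite pdf with clean(original bytes) via a temporary file.

    `clean` is a placeholder for the real cleaning step (an assumption).
    Writing to a temporary file first means a failure in `clean` leaves
    the original file untouched.
    """
    tmp = pdf.with_name(pdf.name + ".tmp")
    tmp.write_bytes(clean(pdf.read_bytes()))
    os.replace(tmp, pdf)  # swap the cleaned file into place
```

Since the originals are replaced, running `find_pdfs` first and backing up the listed files is a cautious way to use a tool like this.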

text_extractor

Extracts the first page's text via OCR and prints it to the terminal.

  • Input: A PDF file
  • Output: Printed to stdout (no files created)

Usage

./text_extractor.py /path/to/file.pdf

Dependencies

  • Requires Tesseract OCR to be installed on your system.
  • On macOS (Homebrew): brew install tesseract
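
A quick way to confirm the dependency is met is to check whether a `tesseract` executable is on the `PATH`. This is a small sketch using only the Python standard library; the function name `tesseract_available` is hypothetical and is not part of the repository.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable can be found on PATH."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH")
    else:
        print("Tesseract not found; install it before running text_extractor.py")
```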

merge_pdf

Merges multiple PDFs into one in the order provided.

  • Input: Output path followed by input PDF paths (2 or more)
  • Output: A single merged PDF at the output path

Usage

./merge_pdf.py /path/to/output.pdf /path/to/1.pdf /path/to/2.pdf ...
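
The argument order (output path first, then two or more inputs) can be sketched with `argparse`. This is an assumption about how such a command line might be parsed, not the script's actual code; `parse_merge_args` is a hypothetical name, and the merging itself is left out.

```python
import argparse


def parse_merge_args(argv: list[str]) -> tuple[str, list[str]]:
    """Split argv into (output_path, input_paths), keeping the input order."""
    parser = argparse.ArgumentParser(prog="merge_pdf.py")
    parser.add_argument("output", help="path of the merged PDF to write")
    parser.add_argument("inputs", nargs="+", help="PDFs to merge, in order")
    args = parser.parse_args(argv)
    # The README asks for 2 or more inputs; argparse's "+" only enforces 1.
    if len(args.inputs) < 2:
        parser.error("at least two input PDFs are required")
    return args.output, args.inputs
```

For example, `parse_merge_args(["out.pdf", "1.pdf", "2.pdf"])` returns the output path and the inputs in the order given, which is the order the merged pages would follow.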
