- ViCalligraphy: A dataset collected from the Internet, consisting of Vietnamese calligraphy images in various writing styles.
- ViSynth1M: A synthetic dataset containing 1,000,000 scene text images.
- ViCalligraphySynth: A synthetic dataset containing 10,000 generated Vietnamese calligraphy images, created using 5 Vietnamese calligraphy fonts. It is designed to improve OCR models' ability to recognize calligraphic text with diverse font styles and layouts.
- SupportSamples: A set of support samples generated from 5 Vietnamese calligraphy fonts, used to compare confused words and select the most similar candidates.
We use PaddleOCR to train and evaluate five models: ABINet, SRN, PARSeq, SVTR, and ViTSTR.
Ensure you have Python 3.7 or later installed. Then install the GPU build of PaddlePaddle, which PaddleOCR requires:
pip install paddlepaddle-gpu==2.6.1
If you run into any errors, please follow the official guide: PaddleOCR Quick Start.
Run the following command to train a model:
python tools/train.py -c path/to/config/file
Configuration files for models can be found in: PaddleOCR/config/ViCalligraphy/
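For example, fine-tuning SVTR on ViCalligraphy would look like the sketch below. The `cd` assumes you start from the repository root, and the config filename is illustrative; use the actual files in PaddleOCR/config/ViCalligraphy/.

```bash
# Illustrative: substitute the real config filename from config/ViCalligraphy/
cd PaddleOCR
python tools/train.py -c config/ViCalligraphy/svtr_vicalligraphy.yml
```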
To evaluate a trained model, use:
python tools/eval.py -c path/to/config/file -o Global.pretrained_model=path/to/pretrained/model
Checkpoints for models can be found in: PaddleOCR/output/rec/
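For instance, evaluating SVTR with a saved checkpoint might look like this sketch (paths are illustrative; PaddleOCR typically takes the checkpoint prefix without the .pdparams extension):

```bash
# Illustrative paths; point -c at your config and the override at a checkpoint under output/rec/
cd PaddleOCR
python tools/eval.py -c config/ViCalligraphy/svtr_vicalligraphy.yml \
    -o Global.pretrained_model=output/rec/svtr_vicalligraphy/best_accuracy
```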
Train and evaluate VietOCR
Install using pip:
pip install vietocr
You can follow the notebook vietocr/ViCalligraphy.ipynb to see how to use the model.
Our model weights are at: vietocr/weights/transformerocr.pth
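Below is a minimal inference sketch using the standard VietOCR API; the notebook above is the authoritative reference. The vgg_transformer base config and the image path are assumptions, so adjust them to your setup.

```python
from PIL import Image
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

# Assumed base config; see vietocr/ViCalligraphy.ipynb for the exact configuration we used
config = Cfg.load_config_from_name('vgg_transformer')
config['weights'] = 'vietocr/weights/transformerocr.pth'  # our fine-tuned weights
config['device'] = 'cuda:0'  # or 'cpu'

detector = Predictor(config)
img = Image.open('path/to/calligraphy_word.jpg')  # illustrative path to a cropped word image
print(detector.predict(img))
```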
We use OpenOCR to train and evaluate SMTR.
- PyTorch version >= 1.13.0
- Python version >= 3.7
conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
After installing the dependencies, either of the two installation methods described in the OpenOCR repository can be used.
Alternatively, use our conda environment:
conda env create -f OpenOCR/environment.yml
Usage:
python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/to/image_folder  # or a single image file
Our config file: OpenOCR/configs/smtr/config.yml
Our checkpoint is available here.
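For example, running SMTR inference with our config could look like the sketch below. We assume the command is run from the OpenOCR directory and that the checkpoint path is set inside config.yml; the image path is illustrative.

```bash
# Illustrative image path; run from the OpenOCR directory
cd OpenOCR
python tools/infer_rec.py --c configs/smtr/config.yml --o Global.infer_img=/path/to/calligraphy_crops
```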
Python == 3.7
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r CCD/CCD_Ha/requirement.txt
Alternatively, use our conda environment:
conda env create -f CCD/environment.yml
The difference between character-based and stroke-based models lies only in the inference step. Therefore, during fine-tuning, we follow the training approach of the character-based model.
cd CCD/CCD_Ha/
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train_finetune.py --config path/to/config/file
Our configuration files: CCD/CCD_Ha/Dino/configs/
- ViCalligraphy: Config file
- ViCalligraphy + ViCalligraphySynth: Config file
- ViSynth1m + ViCalligraphy: Config file
- ViSynth1m + ViCalligraphy + ViCalligraphySynth: Config file
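As an illustration, fine-tuning on ViSynth1m + ViCalligraphy would look like the sketch below; the config filename is hypothetical, so pick the matching file under CCD/CCD_Ha/Dino/configs/.

```bash
# Hypothetical config filename; use the real one from Dino/configs/
cd CCD/CCD_Ha/
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 \
    train_finetune.py --config Dino/configs/visynth1m_vicalligraphy.yaml
```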
# Character-based
cd CCD/CCD_Ha
CUDA_VISIBLE_DEVICES=0 python test.py --config path/to/config/file
# Stroke-based (Stroke-level Decomposition)
cd CCD/CCD_stroke
CUDA_VISIBLE_DEVICES=0 python test.py --config path/to/config/file
Our checkpoint files: CCD/CCD_Ha/saved_models/
- ViCalligraphy: Checkpoint
- ViCalligraphy + ViCalligraphySynth: Checkpoint
- ViSynth1m + ViCalligraphy: Checkpoint
- ViSynth1m + ViCalligraphy + ViCalligraphySynth: Checkpoint
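For example, evaluating the character-based model on ViCalligraphy could look like the sketch below; the config filename is hypothetical, and we assume the checkpoint under saved_models/ is referenced from the config file.

```bash
# Hypothetical config filename; the checkpoint location is assumed to be set in the config
cd CCD/CCD_Ha
CUDA_VISIBLE_DEVICES=0 python test.py --config Dino/configs/vicalligraphy.yaml
```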
Python = 3.7
pip install streamlit==1.23.1 streamlit-drawable-canvas
streamlit run DemoSTR/app.py --server.port 8501 --server.address 0.0.0.0
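Once the server is running, open http://<host>:8501 in a browser to use the demo.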