Not able to Train Assamese language #144

@Alok31555

I am trying to train the Tesseract OCR engine to recognize the Assamese language. I have created the following files:

  • .tif – image of the text
  • .txt – text file with correct text
  • .box – box file with character positions
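Before training, it may be worth confirming that every sample actually has all three files, since a missing or misnamed file can cause samples to be dropped. The sketch below is one way to check this, assuming all files sit in one directory and share a base name. The function name `find_incomplete_samples` and the directory layout are hypothetical, and the suffixes are the ones listed above; if your training setup expects different suffixes (for example `.gt.txt` for the transcription), pass those instead.

```python
from pathlib import Path


def find_incomplete_samples(directory, suffixes=(".tif", ".txt", ".box")):
    """Return {base_name: [missing suffixes]} for samples lacking an expected file.

    Assumption: files for one sample share a base name and differ only in suffix.
    """
    found = {}
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        for suffix in suffixes:
            if path.name.endswith(suffix):
                # Strip the matched suffix to get the sample's base name.
                base = path.name[: -len(suffix)]
                found.setdefault(base, set()).add(suffix)
                break
    return {
        base: [s for s in suffixes if s not in present]
        for base, present in sorted(found.items())
        if len(present) < len(suffixes)
    }
```

Calling `find_incomplete_samples("path/to/training-data")` returns an empty dictionary when every sample is complete; the path here is a placeholder.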

I followed the training steps, but the resulting model performs poorly: accuracy is very low, and it makes many mistakes even on the training image.
I am training with 8000 text samples, and I’m not sure whether that is enough data or whether I need more to improve the model’s accuracy.
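
To put a number on "very low accuracy", one common measure is the character error rate (CER): the edit distance between the correct text and the OCR output, divided by the length of the correct text. The following is a minimal sketch in plain Python, not part of Tesseract; the function names are hypothetical. It normalizes both strings to Unicode NFC first, on the assumption that the ground truth and the model output may encode the same Assamese letter with different code point sequences.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / reference length, after NFC normalization."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        # Empty reference: treat any output as fully wrong, no output as correct.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)
```

Comparing the CER on the training image with the CER on an image the model has never seen can help show whether the model is failing to learn at all or only failing to generalize.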

Can someone help me understand what went wrong?
I want to improve the accuracy and make the model work for Assamese.
