Not able to train Assamese language #144

I am trying to **train** the **Tesseract OCR** engine to recognize the **Assamese language**. I have created the following files:

- .tif – image of the text
- .txt – text file with the correct (ground-truth) text
- .box – box file with character positions
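
One thing that may be worth ruling out is an incomplete training set, where some images lack a matching text or box file. The following is only a minimal sketch: it assumes all three files for a sample share the same base name and live in one directory, and the function name `find_incomplete_samples` and the extension tuple are hypothetical, chosen for illustration (adjust them if your files are named differently).

```python
from collections import defaultdict
from pathlib import Path

# Assumed extensions, taken from the file list above; change to match your naming.
REQUIRED_EXTENSIONS = (".tif", ".txt", ".box")


def find_incomplete_samples(directory, extensions=REQUIRED_EXTENSIONS):
    """Return {sample_name: [missing extensions]} for samples that lack a file."""
    found = defaultdict(set)
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        for ext in extensions:
            if path.name.endswith(ext):
                # Group files by the name that precedes the extension.
                found[path.name[: -len(ext)]].add(ext)
                break
    return {
        name: [ext for ext in extensions if ext not in present]
        for name, present in sorted(found.items())
        if len(present) < len(extensions)
    }
```

Calling `find_incomplete_samples("training_data")` (a placeholder directory name) returns an empty dictionary when every sample has all three files.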

I followed the training steps, but the resulting **model does not work well**: the accuracy is very low and it makes **a lot of mistakes**, even on the training image.
I am training with **8000 text samples**. I’m unsure whether this amount of data is **sufficient** or whether I need to **add more data** to improve the model’s accuracy.
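
To put a number on "very low accuracy", one common measure is the character error rate (CER): the edit distance between the correct text and the OCR output, divided by the length of the correct text. Below is a minimal sketch in plain Python, not part of Tesseract; the function names are hypothetical. It counts Unicode code points, which is a simplification for Assamese, where one visible character can consist of several code points.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Here `reference` would be the contents of a .txt file and `hypothesis` the text the trained model produced for the matching image. A value of 0.0 means a perfect match, and values can exceed 1.0 when the output is much longer than the reference.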

Can someone help me understand what went wrong?
I want to improve the accuracy and make the model work for Assamese.

