Description
I am trying to train the Tesseract OCR engine to recognize the Assamese language. I have created the following files:
- .tif – image of the text
- .txt – text file with correct text
- .box – box file with character positions
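Before retraining, it may be worth confirming that every sample really has all three files. The following is a minimal sketch, assuming the files for one sample share a base name (for example `sample_0001.tif`, `sample_0001.txt`, `sample_0001.box`) and sit in a single directory. The function name and directory layout are hypothetical and not part of Tesseract.

```python
from pathlib import Path


def find_incomplete_samples(data_dir, extensions=(".tif", ".txt", ".box")):
    """Return {base_name: [missing extensions]} for samples lacking a file.

    Assumes one flat directory where the files of a sample share a base name.
    """
    required = set(extensions)
    seen = {}
    for path in Path(data_dir).iterdir():
        # Only consider the three file types that make up a training sample.
        if path.is_file() and path.suffix in required:
            seen.setdefault(path.stem, set()).add(path.suffix)
    # Keep only the samples where at least one required file is absent.
    return {
        stem: sorted(required - found)
        for stem, found in seen.items()
        if required - found
    }
```

Calling `find_incomplete_samples("path/to/training_data")` returns an empty dictionary when every sample is complete; any entry it does return names a sample and the file types missing for it.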
I followed the training steps, but the resulting model performs poorly: accuracy is very low and it makes many mistakes, even on the training images.
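To put a number on "low accuracy", one common measure is the character error rate (CER): the edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The sketch below is a plain-Python illustration, not a Tesseract tool. It assumes both strings are already loaded in memory and compares Unicode code points after NFC normalization, which for Assamese is only an approximation of what a reader perceives as one character.

```python
import unicodedata


def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(truth, predicted):
    """Edit distance divided by the length of the ground-truth text."""
    # Normalize so that equivalent Unicode sequences compare as equal.
    truth = unicodedata.normalize("NFC", truth)
    predicted = unicodedata.normalize("NFC", predicted)
    if not truth:
        return 0.0 if not predicted else 1.0
    return edit_distance(truth, predicted) / len(truth)
```

A CER that stays high on the training images themselves would suggest a problem with the data or the training setup rather than with the amount of data alone.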
I am training with 8000 text samples. I’m unsure if this amount of data is sufficient or if I need to add more data to improve the model’s accuracy.
Can someone help me understand what went wrong?
I want to improve the accuracy and make the model work for Assamese.