This dataset contains 30,000 natural-scene OCR images for minority languages of Southeast Asia: Khmer (Cambodia), Lao, and Burmese. The collection covers a variety of natural scenes and shooting angles. The data can be used for Southeast Asian language OCR tasks.
For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/1758?source=Github
30,000 images, including 10,000 images in Khmer (Cambodia), 10,000 images in Lao, and 10,000 images in Burmese
including slogans, receipts, posters, warning signs, road signs, food packaging, billboards, station signs, signboards, etc.
including a variety of natural scenes and multiple shooting angles
cellphone
looking-up angle, looking-down angle, eye-level angle
the images are in common formats such as .jpg; the annotation files are in .json format
line-level (column-level) quadrilateral bounding box annotation and transcription for the texts; polygon bounding box annotation and transcription for the texts
the error bound of each vertex of quadrilateral or polygon bounding box is within 5 pixels, which is a
Commercial License
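
The annotation format described above (quadrilateral or polygon boxes with transcriptions, stored as .json, with vertices accurate to within 5 pixels) could be consumed roughly as in the sketch below. The JSON layout and the field names `image`, `language`, `regions`, `points`, and `text` are assumptions made for illustration only; the actual schema is not documented here and may differ, so check a real annotation file before relying on this.

```python
import json
import math

# Hypothetical annotation record. The field names are assumed for this sketch
# and are not taken from the dataset's real schema.
SAMPLE = """
{
  "image": "sample_0001.jpg",
  "language": "Khmer",
  "regions": [
    {"points": [[10, 20], [110, 22], [108, 60], [9, 58]], "text": "line one"},
    {"points": [[30, 80], [90, 78], [120, 95], [88, 120], [28, 118]], "text": "line two"}
  ]
}
"""


def load_regions(raw):
    """Parse one annotation document into (points, text) pairs."""
    record = json.loads(raw)
    return [
        ([tuple(point) for point in region["points"]], region["text"])
        for region in record["regions"]
    ]


def within_tolerance(annotated, reference, max_error=5.0):
    """Return True if every vertex lies within max_error pixels
    (Euclidean distance) of the corresponding reference vertex."""
    if len(annotated) != len(reference):
        return False
    return all(math.dist(a, b) <= max_error for a, b in zip(annotated, reference))


regions = load_regions(SAMPLE)
for points, text in regions:
    # 4 vertices means a quadrilateral box; more means a polygon box.
    kind = "quadrilateral" if len(points) == 4 else "polygon"
    print(kind, len(points), text)
```

Here `within_tolerance` interprets the 5-pixel error bound as a per-vertex Euclidean distance, which is one plausible reading; a per-axis bound would be an equally reasonable assumption.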