Nexdata-AI/30000-Images-Natural-Scenes-OCR-Data-in-Southeast-Asian-Languages

30000-Images-Natural-Scenes-OCR-Data-in-Southeast-Asian-Languages

Description

This dataset contains 30,000 natural-scene OCR images in minority languages of Southeast Asia: Khmer (Cambodia), Lao, and Burmese. The collection covers a variety of natural scenes and shooting angles. The data can be used for Southeast Asian language OCR tasks.

For more details, please refer to the link: https://www.nexdata.ai/datasets/ocr/1758?source=Github

Specifications

Data size

30,000 images, including 10,000 images in Khmer (Cambodia), 10,000 images in Lao, and 10,000 images in Burmese

Collecting environment

Slogans, receipts, posters, warning signs, road signs, food packaging, billboards, station signs, signboards, etc.

Data diversity

A variety of natural scenes and multiple shooting angles

Device

cellphone

Photographic angle

Looking-up, looking-down, and eye-level angles

Data format

Images are in common formats such as .jpg; annotation files are in .json format
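
As a minimal sketch of working with this layout, the snippet below pairs each image with a .json annotation file. It assumes that an image and its annotation share the same file stem, which this README does not state; the function name `pair_images_with_annotations` and the extra suffixes besides .jpg are hypothetical.

```python
from pathlib import Path

# Assumption: only .jpg is named in the specification; the other suffixes
# are included here as a guess at "common formats".
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pair_images_with_annotations(root):
    """Return (image_path, json_path) pairs that share a file stem under root.

    Assumption: an image and its annotation have the same stem
    (for example 0001.jpg and 0001.json). Adjust if the delivered
    layout differs.
    """
    root = Path(root)
    # Map each annotation stem to its path; a later duplicate stem overwrites
    # an earlier one, so this is only safe when stems are unique.
    annotations = {p.stem: p for p in root.rglob("*.json")}
    pairs = []
    for image in sorted(root.rglob("*")):
        if image.suffix.lower() in IMAGE_SUFFIXES and image.stem in annotations:
            pairs.append((image, annotations[image.stem]))
    return pairs
```

Images without a matching annotation are skipped silently here; logging them instead may be preferable when validating a delivery.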

Annotation content

Line-level (column-level) quadrilateral bounding box annotation and transcription for the texts; polygon bounding box annotation and transcription for the texts
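
To illustrate how quadrilateral and polygon annotations with transcriptions might be consumed, here is a small sketch. The JSON schema is not documented in this README, so the field names `regions`, `points`, and `transcription`, as well as the sample values, are purely hypothetical.

```python
import json

# Hypothetical record: the keys "regions", "points" and "transcription" are
# assumptions for illustration, not the dataset's documented schema.
SAMPLE = """
{"regions": [
  {"points": [[10, 20], [110, 20], [110, 50], [10, 50]],
   "transcription": "line one"},
  {"points": [[5, 60], [60, 55], [120, 70], [118, 95], [58, 82], [6, 88]],
   "transcription": "line two"}
]}
"""


def region_kind(points):
    """Four vertices are treated as a quadrilateral, anything else as a polygon."""
    return "quadrilateral" if len(points) == 4 else "polygon"


def axis_aligned_box(points):
    """Return (x_min, y_min, x_max, y_max) enclosing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def load_regions(text):
    """Parse the hypothetical record into a list of simplified region dicts."""
    data = json.loads(text)
    return [
        {
            "kind": region_kind(r["points"]),
            "box": axis_aligned_box(r["points"]),
            "text": r["transcription"],
        }
        for r in data["regions"]
    ]


if __name__ == "__main__":
    for region in load_regions(SAMPLE):
        print(region["kind"], region["box"], region["text"])
```

Reducing each shape to an axis-aligned box is a convenience for detectors that expect rectangles; the original vertices should be kept when the model supports rotated or curved text.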

Accuracy rate

the error bound of each vertex of a quadrilateral or polygon bounding box is within 5 pixels, which is a
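
The 5-pixel vertex bound can be sketched as a simple check against a reference annotation. This assumes the error is measured as Euclidean distance between corresponding vertices listed in the same order; the README does not specify the metric, and the function name `vertices_within_tolerance` is hypothetical.

```python
import math

TOLERANCE_PX = 5  # vertex error bound stated in the specification


def vertices_within_tolerance(annotated, reference, tolerance=TOLERANCE_PX):
    """Return True if every annotated vertex lies within `tolerance` pixels
    of the corresponding reference vertex.

    Assumptions: both shapes list their vertices in the same order, and the
    error is the Euclidean distance between corresponding vertices.
    """
    if len(annotated) != len(reference):
        return False
    return all(
        math.dist(a, r) <= tolerance for a, r in zip(annotated, reference)
    )
```

For example, `vertices_within_tolerance([(0, 0), (10, 0)], [(3, 4), (10, 0)])` passes because the first vertex is exactly 5 pixels from its reference.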

Licensing Information

Commercial License
