Vietnamese text recognition in scene images using deep learning
Abstract
This article proposes an effective method for recognizing Vietnamese text in scene images. The method combines three processing tasks in a single recognition stage: (i) recognizing (predicting) character sequences from images; (ii) context processing; and (iii) fusion and iterative correction. The method was evaluated on two Vietnamese scene image datasets collected from real-world settings: VinText and VnSceneText. Experimental results show that the proposed method can recognize text of arbitrary shape and size with high and stable accuracy. Specifically, it achieves (word-level, character-level) accuracy of (81.87%, 93.02%) on VinText and (82.56%, 94.33%) on VnSceneText.
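To make the two reported metrics concrete, the following is a minimal sketch of one common way to compute word-level and character-level accuracy for a recognizer. It is an illustration under stated assumptions, not necessarily the evaluation protocol used in this article: word-level accuracy is taken as the fraction of exact matches, and character-level accuracy as one minus the total edit distance divided by the total number of ground-truth characters. The function names (`edit_distance`, `word_accuracy`, `char_accuracy`) and the NFC normalization step are assumptions introduced here. Normalization is included because Vietnamese diacritics can be encoded either as precomposed or as decomposed Unicode sequences.

```python
import unicodedata


def normalize(text: str) -> str:
    # Assumption: compare strings in NFC form so that precomposed and
    # decomposed encodings of the same Vietnamese character are equal.
    return unicodedata.normalize("NFC", text)


def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance using two rolling rows.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def word_accuracy(preds: list[str], targets: list[str]) -> float:
    # Fraction of predictions that match the ground truth exactly.
    matches = sum(normalize(p) == normalize(t) for p, t in zip(preds, targets))
    return matches / len(targets)


def char_accuracy(preds: list[str], targets: list[str]) -> float:
    # 1 - (total edit distance / total ground-truth characters), floored at 0.
    errors = 0
    chars = 0
    for p, t in zip(preds, targets):
        p, t = normalize(p), normalize(t)
        errors += edit_distance(p, t)
        chars += len(t)
    return max(0.0, 1.0 - errors / chars)


if __name__ == "__main__":
    # Hypothetical predictions; the last one is missing a tone mark.
    preds = ["phở", "bánh", "mi"]
    targets = ["phở", "bánh", "mì"]
    print(f"word-level accuracy: {word_accuracy(preds, targets):.4f}")
    print(f"character-level accuracy: {char_accuracy(preds, targets):.4f}")
```

Under these definitions, a single wrong diacritic makes the whole word count as an error at word level but costs only one character at character level, which is one reason character-level figures are typically higher than word-level figures.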