Authors:
Kha Cong Nguyen and Ryosuke Odate
Affiliation:
Research and Development Group, Hitachi, Ltd., Tokyo, Japan
Keyword(s):
Text Detection, Text Recognition, Anchor Boxes, Clustering, Feature Extraction, Non-Maximum Suppression.
Abstract:
Text recognition systems typically comprise two main stages: text detection and text recognition. Text detection is a prerequisite that strongly affects recognition performance. In this paper, we propose a high-accuracy model for detecting text-lines on a receipt dataset. We focus on the three points that matter most for the model's performance: anchor boxes for locating text regions, backbone networks for extracting features, and a suppression method for selecting the best-fitting bounding box for each text region. Specifically, we propose a clustering method to determine anchor boxes and apply novel convolutional neural networks for feature extraction. These two points are the new strategies used to construct the model. In addition, we propose a training strategy that makes the model output the angles of text-lines, and we revise the bounding boxes with these angles before applying the suppression method. This strategy enables the detection of skewed and downward/upward curved text-lines. Our model outperforms the other best models submitted to the ICDAR 2019 competition, with an F1 score of 98.87%, which makes it reliable enough for automatic text-line detection. These strategies are also flexible enough to apply to other datasets in various domains.
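The suppression step named in the abstract can be illustrated with standard greedy non-maximum suppression. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses axis-aligned boxes given as (x1, y1, x2, y2), whereas the paper revises boxes with predicted angles before suppression, and the names `iou`, `nms`, and `iou_threshold` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates for one text-line, plus a separate line.
candidates = [(0, 0, 100, 20), (2, 1, 102, 21), (0, 40, 100, 60)]
confidences = [0.9, 0.8, 0.7]
kept = nms(candidates, confidences)
```

Here the second candidate overlaps the first with an IoU of about 0.87, so it is suppressed and only the first and third boxes remain. The 0.5 threshold is an illustrative default, not a value taken from the paper.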