Abstract
In this paper, we propose TextCatcher, a hybrid, multi-scale text detection algorithm. First, it relies on a connected component-based approach: after segmentation of the image, a classification step using a new wavelet descriptor spots the letters, and a new graph model with its traversal procedure forms candidate text areas. Second, a texture-based approach discards the false positives. Finally, the detected text areas are precisely cut out and a new binarization step is introduced. The main advantage of our method is that it makes few assumptions. Thus, “challenging texts” such as multi-sized, multi-colored, multi-oriented or curved text can be localized. The efficiency of TextCatcher has been validated on three different datasets: two come from the ICDAR competition, and the third contains photographs we have taken of various daily-life texts. We present both qualitative and quantitative results.
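As an illustration of the grouping idea only, the following minimal Python sketch links letter candidates into a graph and takes each connected set as one candidate text area. The abstract does not specify the actual graph model or traversal procedure, so the linking rule (similar height, nearby centers), the function name `group_letters`, and the thresholds `max_gap` and `max_height_ratio` are all assumptions made for this example. Because the rule ignores direction, a chain of letters along a curved or slanted baseline can still end up in one group.

```python
from collections import deque
from math import hypot


def group_letters(boxes, max_gap=2.0, max_height_ratio=2.0):
    """Group letter boxes (x, y, w, h) into candidate text areas.

    Hypothetical linking rule (not taken from the paper): two letters are
    neighbours if their heights are similar and their centers are close
    relative to the taller letter. No orientation is assumed.
    """

    def linked(a, b):
        ha, hb = a[3], b[3]
        # Reject pairs whose heights differ too much.
        if max(ha, hb) > max_height_ratio * min(ha, hb):
            return False
        ax, ay = a[0] + a[2] / 2, a[1] + a[3] / 2
        bx, by = b[0] + b[2] / 2, b[1] + b[3] / 2
        # Accept pairs whose centers are close enough, in any direction.
        return hypot(ax - bx, ay - by) <= max_gap * max(ha, hb)

    n = len(boxes)
    neighbours = [
        [j for j in range(n) if j != i and linked(boxes[i], boxes[j])]
        for i in range(n)
    ]

    # Breadth-first traversal: each connected component is one group.
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        group = []
        while queue:
            i = queue.popleft()
            group.append(i)
            for j in neighbours[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    letters = [(0, 0, 10, 20), (15, 2, 10, 20), (30, 6, 10, 20), (300, 300, 10, 20)]
    print(group_letters(letters))
```

In this sketch the first three boxes drift slightly downward yet are grouped together, while the distant fourth box forms its own group. A real system would then have to verify each group, which is the role the abstract assigns to the texture-based step.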
Notes
The scores of the participants are freely available [22].
The dataset is available at https://www.lrde.epita.fr/~jonathan/
References
Abrash, M.: Michael Abrash’s Graphics Programming Black Book, 10th edn. Coriolis Group Books, Scottsdale (1997)
Anthimopoulos, M., Gatos, B., Pratikakis, I.: Detection of artificial and scene text in images and video frames. Pattern Anal. Appl. 16(3), 431–446 (2013)
Arth, C., Limberger, F., Bischof, H.: Real-time license plate recognition on an embedded DSP-platform. In: Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
Bai, B., Yin, F., Liu, C.L.: Scene text localization using gradient local correlation. In: International Conference on Document Analysis and Recognition, pp. 1380–1384 (2013)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. Conf. Comput. Vis. Pattern Recognit. 1, 886–893 (2005)
Daubechies, I.: Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics, Philadelphia (1992)
Daubechies, I., Sweldens, W.: Factoring wavelet transforms into lifting steps. J. Fourier Anal. Appl. 4(3), 245–267 (1998)
Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Conference on Computer Vision and Pattern Recognition, pp. 2963–2970 (2010). doi:10.1109/CVPR.2010.5540041
Fabrizio, J., Marcotegui, B., Cord, M.: Text segmentation in natural scenes using toggle-mapping. In: International Conference on Image Processing, pp. 2349–2352 (2009)
Fabrizio, J., Marcotegui, B., Cord, M.: Text detection in street level image. Pattern Anal. Appl. 16(4), 519–533 (2013)
Gao, S., Wang, C., Xiao, B., Shi, C., Zhang, Y., Lv, Z., Shi, Y.: Adaptive scene text detection based on transferring AdaBoost. In: International Conference on Document Analysis and Recognition, pp. 388–392 (2013)
Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR document image binarization contest. In: International Conference on Document Analysis and Recognition (2009)
Gomez, L., Karatzas, D.: Multi-script text extraction from natural scenes. In: International Conference on Document Analysis and Recognition, pp. 467–471 (2013)
Haralick, R., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610–621 (1973)
Huang, W., Qiao, Y., Tang, X.: Robust scene text detection with convolution neural network induced MSER trees. In: European Conference on Computer Vision, pp. 497–511 (2014)
Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: European Conference on Computer Vision (2014)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Proceedings of ECML, pp. 137–142 (1998)
Jung, K., Kim, I.K., Jain, A.K.: Text information extraction in images and video: a survey. Pattern Recogn. 37(5), 977–997 (2004)
Kan, C., Srinath, M.D.: Scene text localization and recognition with oriented stroke detection. In: International Conference on Computer Vision, pp. 97–104 (2013)
Karaoglu, S., Fernando, B., Tremeau, A.: A novel algorithm for text detection and localization in natural scene images. In: Proceedings of DICTA, pp. 635–642 (2010)
Karaoglu, S., Gemert, J., Gevers, T.: Object reading: text recognition for object recognition. In: Proceedings of ECCVW-IFCVCR, pp. 456–465 (2012)
Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazan, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: International Conference on Document Analysis and Recognition, pp. 1484–1493 (2013)
Kasar, T., Agarai, G.: Multi-script and multi-oriented text localization from scene images. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 1–14 (2012)
Li, R., Wang, S., Shi, Z.: A two level algorithm for text detection in natural scene images. In: International Workshop on Document Analysis Systems (2014)
Li, Y., Shen, C., Jia, W., van den Hengel, A.: Leveraging surrounding context for scene text detection. In: International Conference on Image Processing, pp. 2264–2268 (2013)
Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: International Conference on Multimedia, pp. 1007–1016 (2013)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conference, pp. 384–393 (2002)
Meng, Q., Song, Y.: Text detection in natural scenes with salient region. In: International Workshop on Document Analysis Systems, pp. 384–388 (2012)
Merino-Gracia, C., Lenc, K., Mirmehdi, M.: A head-mounted device for recognizing text in natural scenes. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 29–41 (2011)
Minetto, R., Thome, N., Cord, M., Leite, N.J., Stolfi, J.: T-HOG: an effective gradient-based descriptor for single line text regions. Pattern Recogn. 46(3), 1078–1090 (2013)
Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: Asian Conference on Computer Vision, pp. 770–783 (2011)
Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Conference on Computer Vision and Pattern Recognition, pp. 3538–3545 (2012)
Neumann, L., Matas, J.: On combining multiple segmentations in scene text recognition. In: International Conference on Document Analysis and Recognition, pp. 523–527 (2013)
Ojala, T., Pietikäinen, M., Harwood, D.: A comparative study of texture measures with classification based on featured distributions. Pattern Recogn. 29(1), 51–59 (1996)
Olena Team: Milena, generic C++ library for image processing and pattern recognition. https://www.lrde.epita.fr/wiki/Olena/Milena
Opitz, M., Diem, M., Fiel, S., Kleber, F., Sablatnig, R.: End-to-end text recognition with local ternary patterns, MSER and deep convolutional nets. In: International Workshop on Document Analysis Systems (2014)
Phan, T.Q., Shivakumara, P., Tan, C.L.: Detecting text in the real world. In: International Conference on Multimedia, pp. 765–768 (2012)
Prakash, S., Ravishankar, M.: Multi-oriented video text detection and extraction using DCT feature extraction and projection based rotation calculation. In: Proceedings of ICACCI, pp. 714–718 (2013)
Serra, J.: Toggle mappings. In: Simon, J.C. (ed.) From pixels to features, pp. 61–72. Elsevier, North-Holland (1989)
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S.: Scene text detection using graph model built upon maximally stable extremal regions. Pattern Recogn. Lett. 34(2), 107–116 (2013)
Shi, C., Wang, C., Xiao, B., Zhang, Y., Gao, S., Zhang, Z.: Scene text recognition using part-based tree-structured character detection. In: Conference on Computer Vision and Pattern Recognition, pp. 2961–2968 (2013)
Shivakumara, P., Basavaraju, H.T., Guru, D.S., Tan, C.L.: Detection of curved text in video: Quad tree based method. In: International Conference on Document Analysis and Recognition, pp. 594–598 (2013)
Sumathi, C.P., Santhanam, T., Gayathri, G.: A survey on various approaches of text extraction in images. Int. J. Comput. Sci. Eng. Surv. 3(4), 27–42 (2012)
Tomer, P., Goyal, A.: Ant clustering based text detection in natural scene images. In: Proceedings of ICCCNT, pp. 1–7 (2013)
Usevitch, B.E.: A tutorial on modern lossy wavelet image compression: foundations of JPEG 2000. IEEE Signal Process. Mag. 18(5), 22–35 (2001)
Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: International Conference on Computer Vision, pp. 1457–1464 (2011)
Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: International Conference on Pattern Recognition, pp. 3304–3308 (2012)
Wang, X., Song, Y., Zhang, Y.: Natural scene text detection with multi-channel connected component segmentation. In: International Conference on Document Analysis and Recognition, pp. 1375–1379 (2013)
Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. 8(4), 280–296 (2006)
Xu-Cheng, Y., Xuwang, Y., Kaizhu, H., Hong-Wei, H.: Robust text detection in natural scene images. Pattern Anal. Mach. Intell. 36(5), 970–983 (2013)
Yang, H., Quehl, B., Sack, H.: A framework for improved video text detection and recognition. Multimed. Tools Appl. 69(1), 217–245 (2014)
Yao, C., Xiang, B., Wenyu, L., Yi, M., Zhuowan, T.: Detecting texts of arbitrary orientations in natural images. In: International Conference on Computer Vision, pp. 1083–1090 (2012)
Yi, C., Tian, Y.: Assistive text reading from complex background for blind persons. In: International Workshop on Camera-Based Document Analysis and Recognition, pp. 15–28 (2011)
Zagoris, K., Pratikakis, I.: Text detection in natural images using bio-inspired models. In: International Conference on Document Analysis and Recognition, pp. 1370–1374 (2013)
Zhang, J., Chong, Y.: Text localization based on the discrete shearlet transform. In: ICSESS, pp. 262–266 (2013)
Zhang, J., Kasturi, R.: Extraction of text objects in video documents: recent progress. In: International Workshop on Document Analysis Systems, pp. 5–17 (2008)
Zhang, Y., Huang, K., Liu, C.: Fast and robust graph-based transductive learning via minimum tree cut. In: 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11–14, 2011, pp. 952–961 (2011)
About this article
Cite this article
Fabrizio, J., Robert-Seidowsky, M., Dubuisson, S. et al. TextCatcher: a method to detect curved and challenging text in natural scenes. IJDAR 19, 99–117 (2016). https://doi.org/10.1007/s10032-016-0264-4
DOI: https://doi.org/10.1007/s10032-016-0264-4