Abstract
Scene text with occlusions is common in the real world, and recognizing occluded text is important for many machine vision applications. However, the problem remains underexplored: public datasets do not represent occlusions well, and methods designed specifically for occluded text are scarce. In this work, we discuss different kinds of occlusions and propose an occluded scene text enhancing network to improve recognition performance. The network is based on generative adversarial networks, and we design accretion blocks to help the network generate the occluded image regions. The model is independent of the recognition network, so it can be readily used in different frameworks and can be trained without text-content annotations. We also refine the training objective to improve the framework. Experiments on several public benchmarks demonstrate that the proposed method effectively enhances occluded text images, improving recognition accuracy by over 10% on several state-of-the-art frameworks. Meanwhile, the network does not noticeably degrade text images without occlusions.
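The recognizer-independent design described above can be sketched as a small PyTorch pipeline: an encoder-decoder enhancer with residual blocks standing in for the paper's accretion blocks (whose internals are not specified in the abstract), chained in front of an arbitrary recognizer. All class and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AccretionBlockStub(nn.Module):
    """Hypothetical stand-in for the paper's accretion block:
    a residual convolution block refining occluded regions."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class Enhancer(nn.Module):
    """Encoder -> accretion-style blocks -> decoder; maps an
    occluded text image to an enhanced image of the same size."""
    def __init__(self, ch=16, n_blocks=2):
        super().__init__()
        self.encode = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.Sequential(
            *[AccretionBlockStub(ch) for _ in range(n_blocks)]
        )
        self.decode = nn.Conv2d(ch, 3, 3, padding=1)
    def forward(self, x):
        return torch.sigmoid(self.decode(self.blocks(self.encode(x))))

# Because the enhancer is independent of the recognizer, it can be
# chained in front of any recognition model without retraining it:
occluded = torch.rand(1, 3, 32, 100)  # a typical text-line input size
enhanced = Enhancer()(occluded)       # same shape, values in [0, 1]
```

In this sketch, the enhancer would be trained adversarially (generator vs. discriminator on enhanced vs. clean images), so no text-content labels are needed, matching the annotation-free training claim in the abstract.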
Funding
This work was partly supported by the National Key Research and Development Program of China (Grant number 2018AAA0103203).
Ethics declarations
Conflict of interest
The authors have no competing interests relevant to the content of this article to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gong, Y., Zhang, Z., Duan, G. et al. AccNet: occluded scene text enhancing network with accretion blocks. Machine Vision and Applications 34, 1 (2023). https://doi.org/10.1007/s00138-022-01351-5