
AccNet: occluded scene text enhancing network with accretion blocks

  • Original Paper
  • Published in Machine Vision and Applications (2023)

Abstract

Scene text with occlusions is common in the real world, and recognizing occluded text is important for many machine vision applications. However, the problem is not well explored: public datasets do not represent occlusions well, and methods designed specifically for occluded text remain scarce. In this work, we discuss different kinds of occlusions and propose an occluded scene text enhancing network to improve recognition performance. The network is based on generative adversarial networks, and we design accretion blocks to help the network generate the occluded image regions. The model is independent of the recognition network, so it can be readily used in different frameworks and can be trained without annotations of the text content. We also refine the training objective to improve the framework. Experiments on several public benchmarks demonstrate that the proposed method effectively enhances occluded text images, improving recognition accuracy by over 10% for several state-of-the-art frameworks, while having no severe impact on text images without occlusions.
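To make the described pipeline concrete, the following is a minimal PyTorch sketch of how a GAN-style enhancer built from residual accretion-like blocks could sit in front of a frozen recognizer. The block design, layer widths, and all names (AccretionBlock, Enhancer) are illustrative assumptions based only on the abstract, not the paper's actual architecture; the discriminator and the refined training objective are omitted.

```python
# Hypothetical sketch of an enhance-then-recognize pipeline. Everything here
# (block structure, channel counts, names) is an illustrative assumption;
# the abstract only states that accretion blocks help a GAN generator fill
# occluded regions and that the enhancer is independent of the recognizer.
import torch
import torch.nn as nn


class AccretionBlock(nn.Module):
    """Residual-style stand-in for the paper's accretion blocks."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # Accrete generated content onto the input features.
        return x + self.body(x)


class Enhancer(nn.Module):
    """GAN generator: occluded text image in, enhanced image out."""

    def __init__(self, channels: int = 64, num_blocks: int = 4):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.Sequential(
            *[AccretionBlock(channels) for _ in range(num_blocks)]
        )
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        return torch.tanh(self.tail(self.blocks(self.head(x))))


# Because the enhancer is independent of the recognizer, it can be dropped
# in front of any pretrained recognition network without retraining it:
enhancer = Enhancer().eval()
occluded = torch.rand(1, 3, 32, 100)  # dummy 32x100 text crop
with torch.no_grad():
    enhanced = enhancer(occluded)     # feed `enhanced` to the recognizer
```

Since the enhancer and recognizer are decoupled, as the abstract notes, the same enhancer weights could in principle be paired with different recognition frameworks (e.g., CTC- or attention-based) without retraining either side.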



Funding

This work was partly supported by the National Key Research and Development Program of China (Grant number 2018AAA0103203).

Author information

Corresponding author

Correspondence to Mei Xie.

Ethics declarations

Conflict of interest

The authors have no competing interests relevant to the content of this article to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Gong, Y., Zhang, Z., Duan, G. et al. AccNet: occluded scene text enhancing network with accretion blocks. Machine Vision and Applications 34, 1 (2023). https://doi.org/10.1007/s00138-022-01351-5
