Abstract
Current mainstream text recognition models rely heavily on large-scale data, requiring expensive annotations to achieve high performance. Contrastive self-supervised learning methods, which minimize the distance between positive pairs, offer an effective way to alleviate this problem. Previous studies work at the word level, taking the entire word image as model input. However, characters are the basic elements of words, so in this paper we apply contrastive learning from a different perspective, i.e., the perspective of characters. Specifically, we propose a simple yet effective method, termed ChaCo, which takes characters and strokes (called a character unit) cropped from the word image as model input. Under the commonly used random cropping scheme, however, a positive pair may contain completely different characters, in which case minimizing the distance between its members is unreasonable. To address this issue, we introduce a Character Unit Cropping Module (CUCM) that ensures the positive pair contains the same characters by constraining the region from which the positive sample is selected. Experiments show that our proposed method achieves much better representation quality than previous methods while requiring fewer computation resources. Under the semi-supervised setting, ChaCo achieves promising performance with an accuracy improvement of 13.1 points on the IAM dataset.
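The constraint behind the CUCM can be illustrated with a minimal sketch: an unconstrained random crop may share no characters with the anchor view, whereas limiting the positive crop's start position to a neighborhood of the anchor guarantees overlap. The function names and the shift budget below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def random_crop(img_w, crop_w, rng):
    """Unconstrained crop: may share no pixels (hence no characters)
    with another independently sampled crop."""
    x = rng.randint(0, img_w - crop_w)
    return (x, x + crop_w)

def constrained_positive_crop(anchor, img_w, crop_w, rng, max_shift=0.5):
    """Sample a positive crop whose start stays within max_shift * crop_w
    of the anchor's start, so the two views are guaranteed to share
    at least (1 - max_shift) of their width."""
    x0, _ = anchor
    shift = int(max_shift * crop_w)
    lo = max(0, x0 - shift)
    hi = min(img_w - crop_w, x0 + shift)
    x = rng.randint(lo, hi)
    return (x, x + crop_w)

def overlap(a, b):
    """Width of the horizontal intersection of two crops."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

rng = random.Random(0)
anchor = random_crop(img_w=200, crop_w=64, rng=rng)
pos = constrained_positive_crop(anchor, img_w=200, crop_w=64, rng=rng)
# With max_shift=0.5, the pair always shares at least half its width.
assert overlap(anchor, pos) >= 32
```

With `max_shift=0.5` the overlap is at least half the crop width by construction, so the two views necessarily depict overlapping character content; an unconstrained second call to `random_crop` carries no such guarantee.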
Notes
- 1.
- 2. We contacted the authors of SeqCLR to get the training details.
References
Aberdam, A., et al.: Sequence-to-sequence contrastive learning for text recognition. In: CVPR, pp. 15302–15312 (2021)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607 (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning (2020). arXiv preprint arXiv:2003.04297
Chen, X., He, K.: Exploring simple siamese representation learning. In: CVPR, pp. 15750–15758 (2021)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Grill, J.B., et al.: Bootstrap your own latent-a new approach to self-supervised learning. In: NeurIPS, vol. 33, pp. 21271–21284 (2020)
Grosicki, E., Abed, H.E.: ICDAR 2009 handwriting recognition competition. In: ICDAR, pp. 1398–1402. IEEE Computer Society (2009)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR, pp. 9726–9735 (2020)
Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-DataBase: an off-line database for writer retrieval, writer identification and word spotting. In: ICDAR, pp. 560–564 (2013)
Liu, H., et al.: Perceiving stroke-semantic context: hierarchical contrastive learning for robust scene text recognition. In: AAAI (2022)
Liu, X., et al.: Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. 1 (2021)
Luo, C., Jin, L., Chen, J.: SimAN: exploring self-supervised representation learning of scene text via similarity-aware normalization. In: CVPR (2022)
Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002). https://doi.org/10.1007/s100320200071
Nguyen, N., et al.: Dictionary-guided scene text recognition. In: CVPR, pp. 7383–7392 (2021)
van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding (2018). arXiv preprint arXiv:1807.03748
Tendle, A., Hasan, M.R.: A study of the generalizability of self-supervised representations. Mach. Learn. Appl. 6, 100124 (2021)
Wang, T., et al.: Decoupled attention network for text recognition. In: AAAI, vol. 34, pp. 12216–12224 (2020)
Wang, T., et al.: Implicit feature alignment: learn to convert text recognizer to text spotter. In: CVPR, pp. 5973–5982 (2021)
Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. In: CVPR, pp. 3024–3033 (2021)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR, pp. 3733–3742 (2018)
Yan, R., Peng, L., Xiao, S., Yao, G.: Primitive representation learning for scene text recognition. In: CVPR, pp. 284–293 (2021)
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: ICML, pp. 12310–12320 (2021)
Acknowledgement
This research is supported in part by NSFC (Grant No. 61936003), GD-NSF (Nos. 2017A030312006 and 2021A1515011870), the Zhuhai Industry Core and Key Technology Research Project (No. ZH22044702200058PJL), and the Science and Technology Foundation of Guangzhou Huangpu Development District (Grant No. 2020GH17).
A Appendix
This appendix presents, for reference, the pseudo-code of the data augmentation described in Sect. 3.2.
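The original pseudo-code is not reproduced here; the following is a minimal illustrative sketch of a two-view augmentation pipeline of the kind used in contrastive pre-training, with generic brightness jitter and Gaussian noise as stand-in operations. All function names, operation choices, and parameters are assumptions, not the paper's actual augmentation set:

```python
import numpy as np

def jitter_brightness(img, rng, max_delta=0.2):
    """Shift all pixel intensities by a random offset, then clip to [0, 1]."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

def add_gaussian_noise(img, rng, sigma=0.05):
    """Add per-pixel Gaussian noise, then clip to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def augment(img, rng, ops, p=0.5):
    """Apply each operation independently with probability p."""
    for op in ops:
        if rng.random() < p:
            img = op(img, rng)
    return img

# Two independently augmented views of the same word image
# (here a random grayscale array standing in for a real crop).
rng = np.random.default_rng(0)
img = rng.random((32, 100))
ops = [jitter_brightness, add_gaussian_noise]
view1 = augment(img, rng, ops)
view2 = augment(img, rng, ops)
```

Libraries such as Albumentations [Buslaev et al.] provide production-quality versions of these operations; the sketch above only makes the two-view structure of the pipeline explicit.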
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, X., Wang, T., Wang, J., Jin, L., Luo, C., Xue, Y. (2022). ChaCo: Character Contrastive Learning for Handwritten Text Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_24
DOI: https://doi.org/10.1007/978-3-031-21648-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science, Computer Science (R0)