Abstract
Current mainstream text recognition models rely heavily on large-scale data, requiring expensive annotations to achieve high performance. Contrastive self-supervised learning methods, which minimize the distance between positive pairs, offer an effective way to alleviate this problem. Previous studies work at the word level, taking the entire word image as model input. However, characters are the basic elements of words, so in this paper we apply contrastive learning from a different perspective, i.e., the perspective of characters. Specifically, we propose a simple yet effective method, termed ChaCo, which takes characters and strokes (called a character unit) cropped from the word image as model input. Under the commonly used random cropping scheme, however, a positive pair may contain completely different characters, in which case minimizing the distance between its members is unreasonable. To address this issue, we introduce a Character Unit Cropping Module (CUCM) that ensures the positive pair contains the same characters by constraining the region from which the positive sample is selected. Experiments show that our proposed method achieves much better representation quality than previous methods while requiring fewer computation resources. Under the semi-supervised setting, ChaCo achieves promising performance with an accuracy improvement of 13.1 points on the IAM dataset.
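The constraint behind the CUCM can be illustrated with a minimal sketch: an unconstrained random crop may share no characters with the anchor view, whereas limiting the positive crop's start position to a neighborhood of the anchor guarantees overlap. The function names and the shift budget below are illustrative assumptions, not the paper's actual implementation:

```python
import random

def random_crop(img_w, crop_w, rng):
    """Unconstrained crop: may share no pixels (hence no characters)
    with another independently sampled crop."""
    x = rng.randint(0, img_w - crop_w)
    return (x, x + crop_w)

def constrained_positive_crop(anchor, img_w, crop_w, rng, max_shift=0.5):
    """Sample a positive crop whose start stays within max_shift * crop_w
    of the anchor's start, so the two views are guaranteed to share
    at least (1 - max_shift) of their width."""
    x0, _ = anchor
    shift = int(max_shift * crop_w)
    lo = max(0, x0 - shift)
    hi = min(img_w - crop_w, x0 + shift)
    x = rng.randint(lo, hi)
    return (x, x + crop_w)

def overlap(a, b):
    """Width of the horizontal intersection of two crops."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

rng = random.Random(0)
anchor = random_crop(img_w=200, crop_w=64, rng=rng)
pos = constrained_positive_crop(anchor, img_w=200, crop_w=64, rng=rng)
# With max_shift=0.5, the pair always shares at least half its width.
assert overlap(anchor, pos) >= 32
```

With `max_shift=0.5` the overlap is at least half the crop width by construction, so the two views necessarily depict overlapping character content; an unconstrained second call to `random_crop` carries no such guarantee.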
Notes
- 1.
- 2. We contacted the authors of SeqCLR to get the training details.
References
Aberdam, A., et al.: Sequence-to-sequence contrastive learning for text recognition. In: CVPR, pp. 15302–15312 (2021)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607 (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning (2020). arXiv preprint arXiv:2003.04297
Chen, X., He, K.: Exploring simple siamese representation learning. In: CVPR, pp. 15750–15758 (2021)
Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
Grill, J.B., et al.: Bootstrap your own latent-a new approach to self-supervised learning. In: NeurIPS, vol. 33, pp. 21271–21284 (2020)
Grosicki, E., Abed, H.E.: ICDAR 2009 handwriting recognition competition. In: ICDAR, pp. 1398–1402. IEEE Computer Society (2009)
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR, pp. 9726–9735 (2020)
Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-DataBase: an off-line database for writer retrieval, writer identification and word spotting. In: ICDAR, pp. 560–564 (2013)
Liu, H., et al.: Perceiving stroke-semantic context: hierarchical contrastive learning for robust scene text recognition. In: AAAI (2022)
Liu, X., et al.: Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. 1 (2021)
Luo, C., Jin, L., Chen, J.: SimAN: exploring self-supervised representation learning of scene text via similarity-aware normalization. In: CVPR (2022)
Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002). https://doi.org/10.1007/s100320200071
Nguyen, N., et al.: Dictionary-guided scene text recognition. In: CVPR, pp. 7383–7392 (2021)
van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding (2018). arXiv preprint arXiv:1807.03748
Tendle, A., Hasan, M.R.: A study of the generalizability of self-supervised representations. Mach. Learn. Appl. 6, 100124 (2021)
Wang, T., et al.: Decoupled attention network for text recognition. In: AAAI, vol. 34, pp. 12216–12224 (2020)
Wang, T., et al.: Implicit feature alignment: learn to convert text recognizer to text spotter. In: CVPR, pp. 5973–5982 (2021)
Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. In: CVPR, pp. 3024–3033 (2021)
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR, pp. 3733–3742 (2018)
Yan, R., Peng, L., Xiao, S., Yao, G.: Primitive representation learning for scene text recognition. In: CVPR, pp. 284–293 (2021)
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: ICML, pp. 12310–12320 (2021)
Acknowledgement
This research is supported in part by NSFC (Grant No. 61936003), GD-NSF (Nos. 2017A030312006 and 2021A1515011870), the Zhuhai Industry Core and Key Technology Research Project (No. ZH22044702200058PJL), and the Science and Technology Foundation of Guangzhou Huangpu Development District (Grant No. 2020GH17).
A Appendix
This appendix presents, for reference, the pseudo-code of the data augmentation described in Sect. 3.2.
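The original pseudo-code is not reproduced here; the following is a minimal illustrative sketch of a two-view augmentation pipeline of the kind used in contrastive pre-training, with generic brightness jitter and Gaussian noise as stand-in operations. All function names, operation choices, and parameters are assumptions, not the paper's actual augmentation set:

```python
import numpy as np

def jitter_brightness(img, rng, max_delta=0.2):
    """Shift all pixel intensities by a random offset, then clip to [0, 1]."""
    return np.clip(img + rng.uniform(-max_delta, max_delta), 0.0, 1.0)

def add_gaussian_noise(img, rng, sigma=0.05):
    """Add per-pixel Gaussian noise, then clip to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def augment(img, rng, ops, p=0.5):
    """Apply each operation independently with probability p."""
    for op in ops:
        if rng.random() < p:
            img = op(img, rng)
    return img

# Two independently augmented views of the same word image
# (here a random grayscale array standing in for a real crop).
rng = np.random.default_rng(0)
img = rng.random((32, 100))
ops = [jitter_brightness, add_gaussian_noise]
view1 = augment(img, rng, ops)
view2 = augment(img, rng, ops)
```

Libraries such as Albumentations [Buslaev et al.] provide production-quality versions of these operations; the sketch above only makes the two-view structure of the pipeline explicit.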
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, X., Wang, T., Wang, J., Jin, L., Luo, C., Xue, Y. (2022). ChaCo: Character Contrastive Learning for Handwritten Text Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_24
DOI: https://doi.org/10.1007/978-3-031-21648-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21647-3
Online ISBN: 978-3-031-21648-0
eBook Packages: Computer Science, Computer Science (R0)