A shapelet-based framework for large-scale word-level sign language database auto-construction

Ma, Xiang; Wang, Qiang; Zheng, Tianyou; Yuan, Lin

doi:10.1007/s00521-022-08018-2

Review
Published: 20 November 2022

Volume 35, pages 253–274, (2023)

Neural Computing and Applications

Xiang Ma (ORCID: orcid.org/0000-0002-1301-2807)¹, Qiang Wang¹, Tianyou Zheng¹ & Lin Yuan¹

Abstract

Sign language recognition is a challenging and often underestimated problem that involves the asynchronous integration of multimodal articulators. Learning powerful statistical models requires large amounts of training data. However, well-labelled sign language databases are scarce because performing and manually labelling signs is costly. At the same time, many sign language-interpreted videos are available on the Internet. This work proposes a framework that automatically learns a large-scale sign language database from such videos. The framework explores the correspondence between subtitles and motions by discovering shapelets, the most discriminative subsequences within the data sequences. Two modified shapelet methods, one based on brute-force search and one on parameter learning, were used to identify the target signs for 1000 words from 89 sign language-interpreted videos (96 h, 8 naive signers). An adaptive sample augmentation strategy then collected all video clips similar to each target sign as valid samples, yielding an augmented large-scale word-level sign database 3–5 times larger. Experiments on a subset of 100 words revealed a considerable speedup and a 14% improvement in recall rate. An evaluation with three state-of-the-art sign language classifiers demonstrates the good discriminability of the database, and the sample augmentation strategy significantly increases the recognition accuracy of all classifiers by 10–33% by increasing the number, variety, and balance of the data.
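
To make the idea of a shapelet concrete, the following is a minimal sketch of a brute-force shapelet search, not the modified methods used in this work. It assumes one-dimensional sequences (real pose features would be multivariate), treats sequences whose subtitles contain the target word as positives and all others as negatives, and replaces the information-gain criterion of classical shapelets with a simpler gap between mean distances. The names `min_subsequence_distance` and `discover_shapelet` are hypothetical.

```python
import numpy as np


def min_subsequence_distance(series, candidate):
    """Smallest Euclidean distance between `candidate` and any
    same-length window of `series`."""
    m = len(candidate)
    return min(
        np.linalg.norm(series[i:i + m] - candidate)
        for i in range(len(series) - m + 1)
    )


def discover_shapelet(positives, negatives, length):
    """Brute-force search over every window of every positive sequence.

    Assumed score (a simplification): mean distance to the negatives
    minus mean distance to the positives. A good shapelet occurs in
    all positives (small distance) and in no negative (large distance).
    """
    best_score, best_candidate = -np.inf, None
    for seq in positives:
        for start in range(len(seq) - length + 1):
            candidate = seq[start:start + length]
            pos_d = np.mean([min_subsequence_distance(s, candidate) for s in positives])
            neg_d = np.mean([min_subsequence_distance(s, candidate) for s in negatives])
            score = neg_d - pos_d
            if score > best_score:
                best_score, best_candidate = score, candidate.copy()
    return best_candidate, best_score


# Toy data: the same motion pattern appears at different offsets in the
# positive sequences and never in the negative ones.
pattern = np.array([1.0, 3.0, 1.0])
pos_a = np.zeros(10)
pos_a[2:5] = pattern
pos_b = np.zeros(10)
pos_b[5:8] = pattern
negatives = [np.zeros(10), np.zeros(10)]

shapelet, score = discover_shapelet([pos_a, pos_b], negatives, length=3)
print(shapelet, score)
```

The search cost grows with the number of candidate windows times the number of sequences times the sequence length, which is why a naive search of this kind is slow on long videos and why speedup matters at the scale described in the abstract.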

Code availability

The database, models, and code are available at https://github.com/hitmaxiang/SPBSL.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61876054.

Author information

Authors and Affiliations

Department of Control Science and Engineering, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
Xiang Ma, Qiang Wang, Tianyou Zheng & Lin Yuan

Corresponding author

Correspondence to Xiang Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, X., Wang, Q., Zheng, T. et al. A shapelet-based framework for large-scale word-level sign language database auto-construction. Neural Comput & Applic 35, 253–274 (2023). https://doi.org/10.1007/s00521-022-08018-2

Received: 01 March 2022
Accepted: 26 October 2022
Published: 20 November 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s00521-022-08018-2
