Abstract
Crowdsourcing is a powerful tool for large-scale transcription at relatively low cost: the transcription effort is distributed among a set of collaborators, so the supervision effort required from professional transcribers can be dramatically reduced. Nevertheless, collaborators are a scarce resource, which makes optimisation essential for obtaining the maximum benefit from their efforts. This work studies the optimisation of the collaborators' workload in a multimodal crowdsourcing platform in which speech dictation of handwritten text lines is used as the transcription source. The experiments explore how this optimisation makes it possible to obtain similar results while reducing both the number of collaborators and the number of text lines they have to read.
Acknowledgments
Work partially supported by projects SmartWays - RTC-2014-1466-4 (MINECO) and CoMUN-HaT - TIN2015-70924-C2-1-R (MINECO/FEDER).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Granell, E., Martínez-Hinarejos, C.D. (2016). Collaborator Effort Optimisation in Multimodal Crowdsourcing for Transcribing Historical Manuscripts. In: Abad, A., et al. Advances in Speech and Language Technologies for Iberian Languages. IberSPEECH 2016. Lecture Notes in Computer Science, vol. 10077. Springer, Cham. https://doi.org/10.1007/978-3-319-49169-1_23
DOI: https://doi.org/10.1007/978-3-319-49169-1_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49168-4
Online ISBN: 978-3-319-49169-1
eBook Packages: Computer Science, Computer Science (R0)