Abstract
Determining the writer or transcriber of historical Arabic manuscripts has always been a major challenge for researchers in the humanities. As pattern recognition and machine learning techniques have matured, they have been applied to automate the extraction of paleographical features and address this problem. This paper presents a baseline system for writer identification, tested on a historical Arabic dataset of 11,610 single and double folio images. These images were extracted from a unique collection of 567 historical Arabic manuscripts held at the Balamand Digital Humanities Center. The paper also surveys the available Arabic datasets and previously proposed techniques and algorithms. The Balamand dataset is particularly challenging because of the geo-historical identity of the manuscripts and their physical condition. A deep learning system was developed and first evaluated on three Latin and Arabic datasets (ICDAR19, ICFHR20, and KHATT) before being applied to the Balamand dataset. Compared with other systems, it achieves state-of-the-art performance on these challenging new images, with 95.2% mean Average Precision (mAP) and 98.1% accuracy.
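The mAP and accuracy figures above are retrieval metrics. As a hedged illustration, assuming the leave-one-out protocol used in writer retrieval competitions such as ICDAR19 and ICFHR20 (each image queries all remaining images, and images by the same writer count as relevant), and reading "accuracy" as top-1 retrieval accuracy, both scores could be computed from a pairwise distance matrix as sketched below. The function names and the distance-matrix input are hypothetical; this is not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple


def average_precision(relevant: Sequence[bool]) -> float:
    """AP of one ranked list; relevant[k] is True if the item at rank k+1 is a hit."""
    hits = 0
    precision_sum = 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def retrieval_scores(dist: List[List[float]], writers: Sequence[str]) -> Tuple[float, float]:
    """Hypothetical leave-one-out evaluation: every image queries all the others.

    Returns (mAP, top-1 accuracy). Queries whose writer has no other image
    in the set are skipped, since AP is undefined for them (an assumption).
    """
    n = len(writers)
    aps: List[float] = []
    top1_hits = 0
    for q in range(n):
        others = [j for j in range(n) if j != q]
        # Rank the remaining images by ascending distance to the query.
        others.sort(key=lambda j: dist[q][j])
        relevant = [writers[j] == writers[q] for j in others]
        if not any(relevant):
            continue
        aps.append(average_precision(relevant))
        top1_hits += int(relevant[0])  # nearest neighbour by the same writer?
    if not aps:
        raise ValueError("no query has a same-writer match")
    return sum(aps) / len(aps), top1_hits / len(aps)
```

For example, with four images by writers `["A", "A", "B", "B"]` and a distance matrix in which every image's nearest neighbour belongs to the other writer, each query finds its same-writer match at rank 2, giving an mAP of 0.5 and a top-1 accuracy of 0.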
Notes
Mention the Arabic corpora in the domain of OCR and handwriting recognition.
These manuscripts were digitized by the Saint Joseph of Damascus Manuscript Conservation Center (http://www.balamandmonastery.org.lb/index.php/about-the-center) and the Digital Humanities Centre (http://iohanes.uob-dh.org/?q=en/tags/digital-humanities).
The total number of digitized pages exceeds the number of photos.
“A statement providing information regarding the date, place, agency, or reason for production of the manuscript or other object” [29]
A frame made of cardboard or occasionally of wood on which cords of various thickness could be stretched, corresponding to the text frame lines and guidelines [17].
References
Abdelhaleem A, Droby A, Asi A, Kassis M, Al Asam R, El-Sana J (2017) WAHD: a database for writer identification of Arabic historical documents. In: 2017 1st International workshop on Arabic script analysis and recognition (ASAR), pp 64–68. IEEE
Abdleazeem S, El-Sherif E (2008) Arabic handwritten digit recognition. Int J Doc Anal Recogn (IJDAR) 11:127–141
Asi A, Abdalhaleem A, Fecker D, Märgner V, El-Sana J (2017) On writer identification for Arabic historical manuscripts. Int J Doc Anal Recogn (IJDAR) 20:173–187
Awaida S, Mahmoud S (2011) Writer identification of Arabic handwritten digits. In: First international workshop on frontiers in Arabic handwriting recognition, 2010
Awaida SM, Mahmoud SA (2012) State of the art in off-line writer identification of handwritten text and survey of writer identification of Arabic text. Educ Res Rev 7:445
Bausi A, Borbone PG, Briquel-Chatonnet F, Buzi P, Gippert J, Macé C, Melissakēs Z, Parodi LE, Witakowski W, Sokolinski E (2015) Comparative Oriental manuscript studies: an introduction. COMSt
Chammas M, Makhoul A, Demerjian J (2020) Writer identification for historical handwritten documents using a single feature extraction method. In: 19th IEEE International conference on machine learning and applications (ICMLA 2020)
Chandra K, Kapoor G, Kohli R, Gupta A (2016) Improving software quality using machine learning. In: 2016 international conference on innovation and challenges in cyber security (ICICCS-INBUSH), pp 115–118. IEEE
Chaurasia P, Kohli R, Garg A (2014) Biometrics minutiae detection and feature extraction. LAP LAMBERT Academic Publishing
Chen S, Wang Y, Lin C-T, Ding W, Cao Z (2019) Semi-supervised feature learning for improving writer identification. Inform Sci 482:156–170
Christlein V, Bernecker D, Hönig F, Angelopoulou E (2014) Writer identification and verification using GMM supervectors. In: IEEE Winter conference on applications of computer vision (WACV)
Christlein V, Bernecker D, Hönig F, Maier A, Angelopoulou E (2017) Writer identification using GMM supervectors and Exemplar-SVMs. Pattern Recogn 63:258–267
Christlein V, Gropp M, Fiel S, Maier A (2017) Unsupervised feature learning for writer identification and writer retrieval. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR)
Christlein V, Maier A (2018) Encoding CNN activations for writer recognition. In: 2018 13th IAPR international workshop on document analysis systems (DAS)
Christlein V, Nicolaou A, Seuret M, Stutzmann D, Maier A (2019) ICDAR 2019 competition on image retrieval for historical handwritten documents. arXiv [cs.CV]
Déroche F, Rossi VS (2012) The manuscripts in Arabic characters. Viella
Déroche F et al (2005) Islamic codicology: an introduction to the study of manuscripts in Arabic script
Djeddi C, Souici-Meslati L (2011) Artificial immune recognition system for Arabic writer identification. In: International symposium on innovations in information and communications technology, pp 159–165. IEEE
Fecker D, Asi A, Pantke W, Märgner V, El-Sana J, Fingscheidt T (2014) Document writer analysis with rejection for historical Arabic manuscripts. In: 2014 14th international conference on frontiers in handwriting recognition, pp 743–748. IEEE
Fecker D, Asi A, Märgner V, El-Sana J, Fingscheidt T (2014) Writer identification for historical Arabic documents. In: 2014 22nd International conference on pattern recognition, pp 3050–3055. IEEE
Fiel S, Sablatnig R (2015) Writer identification and retrieval using a convolutional neural network. In: Computer analysis of images and patterns (CAIP), pp 26–37
Hannad Y, Siddiqi I, Djeddi C, El-Kettani ME-Y (2019) Improving Arabic writer identification using score-level fusion of textural descriptors. IET Biometr 8:221–229
Lai S, Zhu Y, Jin L (2020) Encoding pathlet and SIFT features with bagged VLAD for historical writer identification. IEEE Trans Inform Forens Secur 15:3553–3566
Mahmoud SA, Ahmad I, Al-Khatib WG, Alshayeb M, Parvez MT, Märgner V, Fink GA (2014) KHATT: an open Arabic offline handwritten text database. Pattern Recogn 47:1096–1112
Mahmoud SA, Ahmad I, Alshayeb M, Al-Khatib WG, Parvez MT, Fink GA, Märgner V, El Abed H (2012) KHATT: Arabic offline handwritten text database. In: 2012 International conference on frontiers in handwriting recognition, pp 449–454. IEEE
Malisiewicz T, Gupta A, Efros AA (2011) Ensemble of exemplar-SVMs for object detection and beyond. In: 2011 International conference on computer vision
Nguyen HT, Nguyen CT, Ino T, Indurkhya B, Nakagawa M (2019) Text-independent writer identification using convolutional neural network. Pattern Recogn Lett 121:104–112
Pechwitz M, Maddouri S, Märgner V, Ellouze N, Amiri H (2002) IFN/ENIT: database of handwritten Arabic words
P5: Guidelines for electronic text encoding and interchange. https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-colophon.html. Accessed December 10th 2021
Rehman A, Naz S, Razzak MI (2019) Writer identification using machine learning approaches: a comprehensive review. Multimed Tools Appl 78:10889–10931
Seuret M, Nicolaou A, Maier A, Christlein V, Stutzmann D (2020) ICFHR 2020 competition on image retrieval for historical handwritten fragments. In: 2020 17th International conference on frontiers in handwriting recognition (ICFHR), pp 216–221. IEEE
Slimane F, Awaida S, Mezghani A, Parvez MT, Kanoun S, Mahmoud SA, Märgner V (2014) ICFHR2014 competition on Arabic writer identification using AHTID/MW and KHATT databases. In: 2014 14th international conference on frontiers in handwriting recognition, pp 797–802. IEEE
The Arabic Manuscripts in the Antiochian Orthodox Monasteries in Lebanon volume 1–2. University of Balamand
Acknowledgements
This research is funded by the EIPHI Graduate School (contract “ANR-17-EURE-0002”). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro RTX 6000 GPU used for this research.
About this article
Cite this article
Chammas, M., Makhoul, A., Demerjian, J. et al. A deep learning based system for writer identification in handwritten Arabic historical manuscripts. Multimed Tools Appl 81, 30769–30784 (2022). https://doi.org/10.1007/s11042-022-12673-x