Abstract
In this paper, we propose a new multi-modal authorship verification approach for social media texts. Authorship verification is the task of determining whether an unknown text was written by a suspected author. The use of social media platforms such as Facebook and Twitter is increasing day by day as a result of digitization. People have grown accustomed to regularly posting or tweeting about their everyday lives, memorable incidents, random thoughts, opinions, and much more, and emojis are widely used in these tweets and posts. The writing style of one user can differ from that of others in word choice, sentence structure, punctuation usage, and emoji usage. We apply a multi-modal Siamese-based framework to automatically extract features from the given texts and emojis. The extracted features are then passed to a neural network–based architecture for binary classification. A multi-modal Twitter-based dataset is created to evaluate the performance of the proposed framework. We obtain an average accuracy of 61.56%, with precision, recall, and F-measure values of 78.08%, 61.50%, and 58.32%, respectively.
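To make the Siamese idea concrete, the following is a minimal sketch and not the architecture evaluated in the paper. It assumes a toy encoder that averages placeholder embeddings for words and for emojis, applies the same encoder to both tweets (the weight sharing that defines a Siamese setup), and scores the absolute difference of the two encodings with an untrained logistic unit. The names `toy_embedding`, `encode`, `pair_features`, and `same_author_probability`, the embedding size, and the weights are all hypothetical.

```python
import math
import random

DIM = 8  # toy embedding size (assumption, not taken from the paper)


def toy_embedding(token, dim=DIM):
    """Deterministic pseudo-random vector standing in for a learned embedding."""
    rng = random.Random(token)  # seeding with the token makes the vector repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def mean_vector(tokens, dim=DIM):
    """Average the toy embeddings of the tokens; all zeros if there are none."""
    if not tokens:
        return [0.0] * dim
    vectors = [toy_embedding(t, dim) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def encode(words, emojis):
    """Shared encoder: concatenates the text and emoji representations."""
    return mean_vector(words) + mean_vector(emojis)


def pair_features(tweet_a, tweet_b):
    """Element-wise absolute difference between the two shared encodings."""
    enc_a = encode(*tweet_a)
    enc_b = encode(*tweet_b)
    return [abs(x - y) for x, y in zip(enc_a, enc_b)]


def same_author_probability(features, weights, bias):
    """Logistic unit standing in for the neural binary classifier."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Each tweet is a (words, emojis) pair; the contents are made up for illustration.
known = (["great", "match", "today"], ["⚽", "🔥"])
unknown = (["great", "game", "today"], ["⚽"])

features = pair_features(known, unknown)
weights = [-1.0] * len(features)  # untrained placeholder weights
prob = same_author_probability(features, weights, bias=1.0)
print(f"same-author score: {prob:.3f}")
```

In a trained system the embeddings, the encoder, and the classifier weights would be learned jointly from labeled tweet pairs; here they are fixed only so that the data flow through the two branches is visible.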
Notes
The dataset can be obtained for research purposes by emailing the authors.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Suman, C., Saha, S., Bhattacharyya, P. et al. Emoji Helps! A Multi-modal Siamese Architecture for Tweet User Verification. Cogn Comput 13, 261–276 (2021). https://doi.org/10.1007/s12559-020-09715-7
DOI: https://doi.org/10.1007/s12559-020-09715-7