iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: https://api.crossref.org/works/10.7717/PEERJ-CS.1617
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,19]],"date-time":"2024-09-19T16:33:50Z","timestamp":1726763630835},"reference-count":48,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T00:00:00Z","timestamp":1697587200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"Social media platforms have become inundated with offensive language. This issue must be addressed for the growth of online social networks (OSNs) and a healthy online environment. While significant research has been devoted to identifying toxic content in major languages like English, this remains an open area of research in the low-resource Pashto language. This study aims to develop an AI model for the automatic detection of offensive textual content in Pashto. To achieve this goal, we have developed a benchmark dataset called the Pashto Offensive Language Dataset (POLD), which comprises tweets collected from Twitter and manually classified into two categories: \u201coffensive\u201d and \u201cnot offensive\u201d. To discriminate these two categories, we investigated the classic deep learning classifiers based on neural networks, including CNNs and RNNs, using static word embeddings: Word2Vec, fastText, and GloVe as features. Furthermore, we examined two transfer learning approaches. In the first approach, we fine-tuned the pre-trained multilingual language model, XLM-R, using the POLD dataset, whereas, in the second approach, we trained a monolingual BERT model for Pashto from scratch using a custom-developed text corpus. Pashto BERT was then fine-tuned similarly to XLM-R. The performance of all the deep learning and transformer learning models was evaluated using the POLD dataset. The experimental results demonstrate that our pre-trained Pashto BERT model outperforms the other models, achieving an F1-score of 94.34% and an accuracy of 94.77%.<\/jats:p>","DOI":"10.7717\/peerj-cs.1617","type":"journal-article","created":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T08:04:12Z","timestamp":1697616252000},"page":"e1617","source":"Crossref","is-referenced-by-count":3,"title":["Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT"],"prefix":"10.7717","volume":"9","author":[{"given":"Ijazul","family":"Haq","sequence":"first","affiliation":[{"name":"School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, Minhang, China"}]},{"given":"Weidong","family":"Qiu","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, Minhang, China"}]},{"given":"Jie","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, Minhang, China"}]},{"given":"Peng","family":"Tang","sequence":"additional","affiliation":[{"name":"School of Cyber Science and Engineering, Shanghai Jiao Tong University, Shanghai, Minhang, China"}]}],"member":"4443","published-online":{"date-parts":[[2023,10,18]]},"reference":[{"key":"10.7717\/peerj-cs.1617\/ref-1","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1016\/j.procs.2018.10.491","article-title":"Towards accurate detection of offensive language in online communication in arabic","volume":"142","author":"Alakrot","year":"2018","journal-title":"Procedia Computer Science"},{"key":"10.7717\/peerj-cs.1617\/ref-2","doi-asserted-by":"publisher","first-page":"101365","DOI":"10.1016\/j.csl.2022.101365","article-title":"Hate speech detection on Twitter using transfer learning","volume":"74","author":"Ali","year":"2022","journal-title":"Computer Speech & Language"},{"issue":"1","key":"10.7717\/peerj-cs.1617\/ref-3","first-page":"59","article-title":"The harm in hate speech","volume":"29","author":"Allan","year":"2013","journal-title":"Constitutional Commentary"},{"key":"10.7717\/peerj-cs.1617\/ref-4","doi-asserted-by":"publisher","first-page":"100096","DOI":"10.1016\/j.osnem.2020.100096","article-title":"Hate and offensive speech detection on Arabic social media","volume":"19","author":"Alsafari","year":"2020","journal-title":"Online Social Networks and Media"},{"issue":"5","key":"10.7717\/peerj-cs.1617\/ref-5","doi-asserted-by":"publisher","first-page":"972","DOI":"10.14569\/IJACSA.2022.01305109","article-title":"BERT-based approach to arabic hate speech and offensive language detection in Twitter: exploiting emojis and sentiment analysis","volume":"13","author":"Althobaiti","year":"2022","journal-title":"International Journal of Advanced Computer Science and Applications"},{"key":"10.7717\/peerj-cs.1617\/ref-6","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1016\/j.tcs.2022.06.020","article-title":"Deep learning and natural language processing in computation for offensive language detection in online social networks by feature selection and ensemble classification techniques","volume":"943","author":"Anand","year":"2022","journal-title":"Theoretical Computer Science"},{"key":"10.7717\/peerj-cs.1617\/ref-7","first-page":"478","article-title":"Overview of MEX-A3T at IberLEF 2019: authorship and aggressiveness analysis in Mexican Spanish Tweets","author":"Arag\u00f3n","year":"2019"},{"key":"10.7717\/peerj-cs.1617\/ref-8","doi-asserted-by":"publisher","DOI":"10.1109\/taffc.2022.3219229","article-title":"Pars-OFF: a benchmark for offensive language detection on farsi social media","author":"Ataei","year":"2022","journal-title":"IEEE Transactions on Affective Computing"},{"key":"10.7717\/peerj-cs.1617\/ref-9","first-page":"54","article-title":"Semeval-2019 task 5: multilingual detection of hate speech against immigrants and women in twitter","author":"Basile","year":"2019"},{"key":"10.7717\/peerj-cs.1617\/ref-10","doi-asserted-by":"publisher","first-page":"e906","DOI":"10.7717\/peerj-cs.906","article-title":"Detecting racism and xenophobia using deep learning models on Twitter data: CNN, LSTM and BERT","volume":"8","author":"Ben\u00edtez-Andrades","year":"2022","journal-title":"PeerJ Computer Science"},{"key":"10.7717\/peerj-cs.1617\/ref-11","first-page":"71","article-title":"Detecting offensive language in social media to protect adolescent online safety","author":"Chen","year":"2012"},{"issue":"3","key":"10.7717\/peerj-cs.1617\/ref-12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2202\/1944-2866.1173","article-title":"Fighting hate and bigotry on the Internet","volume":"3","author":"Cohen-Almagor","year":"2011","journal-title":"Policy & Internet"},{"key":"10.7717\/peerj-cs.1617\/ref-13","article-title":"Unsupervised cross-lingual representation learning at scale","author":"Conneau","year":"2019"},{"key":"10.7717\/peerj-cs.1617\/ref-14","first-page":"693","article-title":"Improving cyberbullying detection with user context","author":"Dadvar","year":"2013"},{"key":"10.7717\/peerj-cs.1617\/ref-15","first-page":"512","article-title":"Automated hate speech detection and the problem of offensive language","author":"Davidson","year":"2017"},{"key":"10.7717\/peerj-cs.1617\/ref-16","first-page":"86","article-title":"Hate me, hate me not: hate speech detection on facebook","author":"Del Vigna","year":"2017"},{"key":"10.7717\/peerj-cs.1617\/ref-17","doi-asserted-by":"crossref","article-title":"Cold: a benchmark for chinese offensive language detection","author":"Deng","year":"2022","DOI":"10.18653\/v1\/2022.emnlp-main.796"},{"key":"10.7717\/peerj-cs.1617\/ref-18","article-title":"Bert: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018"},{"issue":"8","key":"10.7717\/peerj-cs.1617\/ref-19","doi-asserted-by":"publisher","first-page":"6048","DOI":"10.1016\/j.jksuci.2021.07.013","article-title":"A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model","volume":"34","author":"El-Alami","year":"2022","journal-title":"Journal of King Saud University-Computer and Information Sciences"},{"key":"10.7717\/peerj-cs.1617\/ref-20","doi-asserted-by":"publisher","first-page":"102970","DOI":"10.1016\/j.specom.2023.102970","article-title":"Correction of whitespace and word segmentation in noisy Pashto text using CRF","volume":"153","author":"Haq","year":"2023","journal-title":"Speech Communication"},{"issue":"6","key":"10.7717\/peerj-cs.1617\/ref-21","doi-asserted-by":"publisher","first-page":"1344","DOI":"10.14569\/IJACSA.2023.01406142","article-title":"NLPashto: NLP toolkit for low-resource Pashto language","volume":"14","author":"Haq","year":"2023","journal-title":"International Journal of Advanced Computer Science and Applications"},{"key":"10.7717\/peerj-cs.1617\/ref-22","first-page":"196","article-title":"Transfer learning across arabic dialects for offensive language detection","author":"Husain","year":"2022"},{"key":"10.7717\/peerj-cs.1617\/ref-23","doi-asserted-by":"publisher","first-page":"e1169","DOI":"10.7717\/peerj-cs.1169","article-title":"Identification of offensive language in Urdu using semantic and embedding models","volume":"8","author":"Hussain","year":"2022","journal-title":"PeerJ Computer Science"},{"key":"10.7717\/peerj-cs.1617\/ref-24","doi-asserted-by":"crossref","article-title":"Multi-label hate speech and abusive language detection in Indonesian Twitter","author":"Ibrohim","year":"2019","DOI":"10.18653\/v1\/W19-3506"},{"issue":"7","key":"10.7717\/peerj-cs.1617\/ref-25","doi-asserted-by":"publisher","first-page":"1669","DOI":"10.53106\/160792642022122307021","article-title":"Sentiment analysis of social media content in pashto language using deep learning algorithms","volume":"23","author":"Iqbal","year":"2022","journal-title":"Journal of Internet Technology"},{"key":"10.7717\/peerj-cs.1617\/ref-26","doi-asserted-by":"publisher","first-page":"267","DOI":"10.1515\/JPLR.2008.013","article-title":"The pragmatics of swearing","volume":"4","author":"Jay","year":"2008","journal-title":"Journal of Political Research"},{"key":"10.7717\/peerj-cs.1617\/ref-27","doi-asserted-by":"publisher","first-page":"4335","DOI":"10.1016\/j.jksuci.2022.05.006","article-title":"BiCHAT: BiLSTM with deep CNN and hierarchical attention for hate speech detection","volume":"34","author":"Khan","year":"2022","journal-title":"The Journal of King Saud University Computer and Information Sciences"},{"key":"10.7717\/peerj-cs.1617\/ref-28","doi-asserted-by":"crossref","article-title":"Sentencepiece: a simple and language independent subword tokenizer and detokenizer for neural text processing","author":"Kudo","year":"2018","DOI":"10.18653\/v1\/D18-2012"},{"key":"10.7717\/peerj-cs.1617\/ref-29","first-page":"1","article-title":"Benchmarking aggression identification in social media","author":"Kumar","year":"2018"},{"issue":"22","key":"10.7717\/peerj-cs.1617\/ref-30","doi-asserted-by":"publisher","first-page":"10706","DOI":"10.3390\/app112210706","article-title":"Detecting aggressiveness in tweets: a hybrid model for detecting cyberbullying in the spanish language","volume":"11","author":"Lepe-Fa\u00fandez","year":"2021","journal-title":"Applied Sciences"},{"key":"10.7717\/peerj-cs.1617\/ref-31","article-title":"Roberta: a robustly optimized bert pretraining approach","author":"Liu","year":"2019"},{"issue":"17","key":"10.7717\/peerj-cs.1617\/ref-32","doi-asserted-by":"publisher","first-page":"6468","DOI":"10.3390\/s22176468","article-title":"Machine learning and lexicon approach to texts processing in the detection of degrees of toxicity in online discussions","volume":"22","author":"Machov\u00e1","year":"2022","journal-title":"Sensors"},{"key":"10.7717\/peerj-cs.1617\/ref-33","first-page":"14","article-title":"Overview of the hasoc track at fire 2019: Hate speech and offensive content identification in indo-european languages","author":"Mandl","year":"2019"},{"key":"10.7717\/peerj-cs.1617\/ref-34","doi-asserted-by":"publisher","DOI":"10.1007\/s10586-022-03956-x","article-title":"BERT-based ensemble learning for multi-aspect hate speech detection","author":"Mazari","year":"2023","journal-title":"Cluster Computing"},{"key":"10.7717\/peerj-cs.1617\/ref-35","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1016\/j.inffus.2023.03.015","article-title":"Finding hate speech with auxiliary emotion detection from self-training multi-label learning perspective","volume":"96","author":"Min","year":"2023","journal-title":"Information Fusion"},{"key":"10.7717\/peerj-cs.1617\/ref-36","doi-asserted-by":"crossref","article-title":"Abusive language detection on arabic social media","author":"Mubarak","year":"2017","DOI":"10.18653\/v1\/W17-3008"},{"key":"10.7717\/peerj-cs.1617\/ref-37","first-page":"517","article-title":"Offensive language detection in turkish tweets with bert models","author":"\u00d6zberk","year":"2021"},{"issue":"21","key":"10.7717\/peerj-cs.1617\/ref-38","doi-asserted-by":"publisher","first-page":"4654","DOI":"10.3390\/s19214654","article-title":"Detecting and monitoring hate speech in Twitter","volume":"19","author":"Pereira-Kohatsu","year":"2019","journal-title":"Sensors"},{"key":"10.7717\/peerj-cs.1617\/ref-39","article-title":"Offensive language identification in Greek","author":"Pitenis","year":"2020"},{"issue":"22","key":"10.7717\/peerj-cs.1617\/ref-40","doi-asserted-by":"publisher","first-page":"2810","DOI":"10.3390\/electronics10222810","article-title":"Cyberbullying detection: hybrid models based on machine learning and natural language processing techniques","volume":"10","author":"Raj","year":"2021","journal-title":"Electronics"},{"issue":"1","key":"10.7717\/peerj-cs.1617\/ref-41","first-page":"1","article-title":"Multilingual offensive language identification for low-resource languages","volume":"21","author":"Ranasinghe","year":"2021","journal-title":"Transactions on Asian and Low-Resource Language Information Processing"},{"key":"10.7717\/peerj-cs.1617\/ref-42","first-page":"1","article-title":"Overview of the GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments","author":"Risch","year":"2021"},{"key":"10.7717\/peerj-cs.1617\/ref-43","first-page":"1668","article-title":"The risk of racial bias in hate speech detection","author":"Sap","year":"2019"},{"key":"10.7717\/peerj-cs.1617\/ref-44","first-page":"5149","article-title":"Japanese and korean voice search","author":"Schuster","year":"2012"},{"key":"10.7717\/peerj-cs.1617\/ref-45","doi-asserted-by":"publisher","first-page":"101404","DOI":"10.1016\/j.csl.2022.101404","article-title":"Offensive language detection in Tamil YouTube comments by adapters and cross-domain knowledge transfer","volume":"76","author":"Subramanian","year":"2022","journal-title":"Computer Speech & Language"},{"issue":"1","key":"10.7717\/peerj-cs.1617\/ref-46","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1007\/s42979-021-00977-y","article-title":"Towards offensive language identification for Tamil code-mixed YouTube comments and posts","volume":"3","author":"Vasantharajan","year":"2022","journal-title":"SN Computer Science"},{"issue":"2","key":"10.7717\/peerj-cs.1617\/ref-47","doi-asserted-by":"crossref","first-page":"1775","DOI":"10.32604\/csse.2023.027841","article-title":"Deep-bert: transfer learning for classifying multilingual offensive texts on social media","volume":"44","author":"Wadud","year":"2022","journal-title":"Computer Systems Science and Engineering"},{"key":"10.7717\/peerj-cs.1617\/ref-48","doi-asserted-by":"crossref","article-title":"Predicting the type and target of offensive posts in social media","author":"Zampieri","year":"2019","DOI":"10.18653\/v1\/N19-1144"}],"container-title":["PeerJ Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-1617.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1617.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1617.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-1617.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,18]],"date-time":"2023-10-18T08:04:23Z","timestamp":1697616263000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-1617"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,18]]},"references-count":48,"alternative-id":["10.7717\/peerj-cs.1617"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.1617","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,18]]},"article-number":"e1617"}}
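The record above is the raw response from the Crossref REST API `works` endpoint. The Python sketch below shows one way to pull the commonly used fields (DOI, title, journal, authors, publication date, cited DOIs) out of a response with this shape. It is illustrative only: the helper names `fetch_work`, `date_from_parts`, and `summarize_work`, the `USER_AGENT` contact string, and the trimmed `SAMPLE_RESPONSE` are hypothetical. The sample copies a few fields from the record above, writing the `\/` escapes as plain `/`, which JSON treats identically. `fetch_work` needs network access and is not exercised here.

```python
import json
import urllib.parse
import urllib.request
from datetime import date
from typing import Any, Dict, List, Optional

# Hypothetical contact string (assumption); Crossref asks API clients to identify themselves.
USER_AGENT = "example-script/0.1 (mailto:you@example.org)"


def fetch_work(doi: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch one work record from the public Crossref REST API (needs network)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def date_from_parts(obj: Dict[str, Any]) -> Optional[date]:
    """Turn {"date-parts": [[Y, M, D]]} into a date; a missing month or day defaults to 1."""
    parts = (obj.get("date-parts") or [[None]])[0]
    if not parts or parts[0] is None:
        return None
    year, month, day = (list(parts) + [1, 1])[:3]
    return date(year, month, day)


def summarize_work(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract commonly used fields from a single-work Crossref response."""
    if response.get("status") != "ok" or response.get("message-type") != "work":
        raise ValueError("not a Crossref single-work response")
    msg = response["message"]
    authors: List[str] = [
        " ".join(p for p in (a.get("given"), a.get("family")) if p)
        for a in msg.get("author", [])
    ]
    return {
        "doi": msg["DOI"],
        "title": (msg.get("title") or [""])[0],
        "journal": (msg.get("container-title") or [""])[0],
        "authors": authors,
        "published": date_from_parts(msg.get("published", {})),
        "reference_count": msg.get("reference-count", 0),
        # Only reference entries that include a "DOI" key are collected.
        "cited_dois": [r["DOI"] for r in msg.get("reference", []) if "DOI" in r],
    }


# Trimmed excerpt of the record above (hypothetical test fixture).
SAMPLE_RESPONSE = """
{"status": "ok", "message-type": "work", "message": {
  "DOI": "10.7717/peerj-cs.1617",
  "title": ["Pashto offensive language detection: a benchmark dataset and monolingual Pashto BERT"],
  "container-title": ["PeerJ Computer Science"],
  "author": [{"given": "Ijazul", "family": "Haq"}, {"given": "Weidong", "family": "Qiu"},
             {"given": "Jie", "family": "Guo"}, {"given": "Peng", "family": "Tang"}],
  "published": {"date-parts": [[2023, 10, 18]]},
  "reference-count": 48,
  "reference": [
    {"key": "10.7717/peerj-cs.1617/ref-1", "DOI": "10.1016/j.procs.2018.10.491"},
    {"key": "10.7717/peerj-cs.1617/ref-3", "article-title": "The harm in hate speech"}
  ]}}
"""

if __name__ == "__main__":
    summary = summarize_work(json.loads(SAMPLE_RESPONSE))
    print(summary["title"])
    print(summary["published"].isoformat())  # 2023-10-18
    print(summary["cited_dois"])             # ['10.1016/j.procs.2018.10.491']
```

`reference_count` is read from the `reference-count` field, so it reports the deposited total (48 here) even though the trimmed sample lists only two references. References without a `DOI` key, such as the Allan (2013) entry, are skipped when collecting `cited_dois`.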