iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://api.crossref.org/works/10.3390/DATA6070071
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,20]],"date-time":"2024-07-20T06:29:44Z","timestamp":1721456984919},"reference-count":17,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2021,6,26]],"date-time":"2021-06-26T00:00:00Z","timestamp":1624665600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"Criminal investigations collect and analyze the facts related to a crime, from which the investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other methods and techniques of investigation. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities that are mentioned in the investigation. The computerized processing of these documents is a helping hand to the criminal investigation, as it allows the automatic identification of entities and their relations, being some of which difficult to identify manually. There exists a wide set of dedicated tools, but they have a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents, which were extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. 
The dataset was evaluated and a mean precision of 0.808, recall of 0.722, and F1-score of 0.733 were obtained with the classification of the annotated named-entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents. Some examples are sentence detection, named-entity recognition, and identification of terms related to the criminal domain.","DOI":"10.3390\/data6070071","type":"journal-article","created":{"date-parts":[[2021,6,28]],"date-time":"2021-06-28T02:24:57Z","timestamp":1624847097000},"page":"71","source":"Crossref","is-referenced-by-count":4,"title":["An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-8285-7005","authenticated-orcid":false,"given":"Gon\u00e7alo","family":"Carnaz","sequence":"first","affiliation":[{"name":"Informatics Departament, University of \u00c9vora, 7002-554 \u00c9vora, Portugal"}]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3448-6726","authenticated-orcid":false,"given":"M\u00e1rio","family":"Antunes","sequence":"additional","affiliation":[{"name":"Computer Science and Communication Research Centre (CIIC), School of Technology and Management, Polytechnic of Leiria, 2411-901 Leiria, Portugal"},{"name":"INESC TEC, CRACS, 4200-465 Porto, Portugal"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-0793-0003","authenticated-orcid":false,"given":"Vitor Beires","family":"Nogueira","sequence":"additional","affiliation":[{"name":"Informatics Departament, University of \u00c9vora, 7002-554 \u00c9vora, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,26]]},"reference":[{"key":"ref_1","first-page":"975","article-title":"Digital chain of custody: State of the art","volume":"114","author":"Prayudi","year":"2015","journal-title":"Int. J. 
Comput. Appl."},{"key":"ref_2","unstructured":"Stasko, J., G\u00f6rg, C., Liu, Z., and Singhal, K. (November, January 30). Jigsaw: Supporting investigative analysis through interactive visualization. Proceedings of the VAST IEEE Symposium on Visual Analytics Science and Technology, Sacramento, CA, USA."},{"key":"ref_3","first-page":"13","article-title":"Implementation of a police intelligence analysis framework","volume":"5","author":"Stampouli","year":"2011","journal-title":"Int. J. Secur. Its Appl."},{"key":"ref_4","unstructured":"Hosseinkhani, J., Chaprut, S., and Taherdoost, H. (2012, January 24\u201326). Criminal network mining by web structure and content mining. Advances in Remote Sensing, Finite Differences and Information Security. Proceedings of the 11th WSEAS International Conference on Information Security and Privacy (ISP \u201912), Prague, Czech Republic."},{"key":"ref_5","first-page":"36","article-title":"Semantic Mining and Analysis of Heterogeneous Data for Novel Intelligence Insights","volume":"1","author":"Adderley","year":"2014","journal-title":"Fourth Int. Conf. Adv. Inf. Min. Manag."},{"key":"ref_6","first-page":"189","article-title":"Fighting Organized Crime Through Open Source Intelligence: Regulatory Strategies of the CAPER Project","volume":"271","author":"Casanovas","year":"2014","journal-title":"Front. Artif. Intell. Appl."},{"key":"ref_7","first-page":"275","article-title":"Environmental scanning and knowledge representation for the detection of organised crime threats","volume":"8577 LNAI","author":"Brewster","year":"2014","journal-title":"Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics)"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/llc\/7.1.1","article-title":"Corpus design criteria","volume":"7","author":"Atkins","year":"1992","journal-title":"Lit. Linguist. 
Comput."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Carnaz, G., Nogueira, V.B., and Antunes, M. (2021). A Graph Database Representation of Portuguese Criminal-Related Documents. Informatics, 8.","DOI":"10.3390\/informatics8020037"},{"key":"ref_10","first-page":"13:1","article-title":"Knowledge Representation of Crime-Related Events: A Preliminary Approach","volume":"Volume 74","author":"Rodrigues","year":"2019","journal-title":"8th Symposium on Languages, Applications and Technologies (SLATE 2019)"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wiedemann, G., Yimam, S.M., and Biemann, C. (2018). A Multilingual Information Extraction Pipeline for Investigative Journalism. arXiv.","DOI":"10.18653\/v1\/D18-2014"},{"key":"ref_12","first-page":"199","article-title":"The Explanation Related to the Relationship between Drug Abuse and Crime","volume":"14","author":"Biabani","year":"2020","journal-title":"Q. J. Soc. Dev. (Previously Human Dev.)"},{"key":"ref_13","first-page":"747","article-title":"A 5w1h based annotation scheme for semantic role labeling of English tweets","volume":"22","author":"Chakma","year":"2018","journal-title":"Comput. Sist."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"189","DOI":"10.2307\/806690","article-title":"The correlation of english and journalism","volume":"38","author":"Griffin","year":"1949","journal-title":"Engl. J."},{"key":"ref_15","unstructured":"Braz, J. (2013). Investiga\u00e7 ao Criminal, Almedina."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Das, A., Ghosh, A., and Bandyopadhyay, S. (2010, January 21\u201323). Semantic role labeling for Bengali using 5Ws. Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010), Beijing, China.","DOI":"10.1109\/NLPKE.2010.5587772"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hamborg, F., Lachnit, S., Schubotz, M., Hepp, T., and Gipp, B. (2018). 
Giveme5W: Main event retrieval from news articles by extraction of the five journalistic w questions. International Conference on Information, Springer.","DOI":"10.1007\/978-3-319-78105-1_39"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/7\/71\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,14]],"date-time":"2024-07-14T18:07:23Z","timestamp":1720980443000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/7\/71"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,26]]},"references-count":17,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2021,7]]}},"alternative-id":["data6070071"],"URL":"https:\/\/doi.org\/10.3390\/data6070071","relation":{},"ISSN":["2306-5729"],"issn-type":[{"value":"2306-5729","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,26]]}}}
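
The record above is a Crossref REST API response of message type `work`. As a minimal sketch of how its fields might be read once the response has been loaded as JSON, the Python below pulls out the title, DOI, authors, issue date, and reference count. `SAMPLE` is a trimmed copy holding only a few of the fields shown above, and `summarize_work` is a hypothetical helper name chosen for illustration, not part of any Crossref client library.

```python
import json

# Trimmed copy of the record above; only a handful of fields are kept.
# A raw string is used so the JSON escapes (\/ and \u00e7) reach json.loads intact.
SAMPLE = r'''
{"status":"ok","message-type":"work","message":{
  "DOI":"10.3390\/data6070071",
  "title":["An Annotated Corpus of Crime-Related Portuguese Documents for NLP and Machine Learning Processing"],
  "container-title":["Data"],
  "volume":"6","issue":"7","page":"71",
  "issued":{"date-parts":[[2021,6,26]]},
  "reference-count":17,
  "author":[
    {"given":"Gon\u00e7alo","family":"Carnaz","sequence":"first"},
    {"given":"M\u00e1rio","family":"Antunes","sequence":"additional"},
    {"given":"Vitor Beires","family":"Nogueira","sequence":"additional"}
  ]}}
'''


def summarize_work(response: dict) -> dict:
    """Return a small summary of a single-work response (hypothetical helper)."""
    if response.get("status") != "ok" or response.get("message-type") != "work":
        raise ValueError("not a successful single-work response")
    msg = response["message"]

    # Author names are split into "given" and "family"; either may be absent.
    authors = [
        f'{a.get("given", "")} {a.get("family", "")}'.strip()
        for a in msg.get("author", [])
    ]

    # "date-parts" is a list of [year, month, day] lists; month and day are optional.
    date_parts = msg.get("issued", {}).get("date-parts", [[]])[0]
    issued = "-".join(f"{part:02d}" for part in date_parts)

    return {
        "doi": msg.get("DOI"),
        "title": (msg.get("title") or [None])[0],
        "journal": (msg.get("container-title") or [None])[0],
        "authors": authors,
        "issued": issued,
        "reference_count": msg.get("reference-count"),
    }


if __name__ == "__main__":
    summary = summarize_work(json.loads(SAMPLE))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

List-valued fields such as `title` and `container-title` are read defensively because the record above shows that some of them (`subtitle`, `short-title`) can be empty lists.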