Abstract
Multi-document summarization (MDS) has attracted considerable attention across many knowledge domains. Extractive MDS techniques condense a collection of documents by retaining essential content and discarding unnecessary material. MDS is more challenging than single-document summarization, and existing approaches suffer from several weaknesses, including inaccurate selection of important sentences, low coverage, and redundancy among sentences. To address these issues, we propose an automated extractive MDS approach. The process begins with pre-processing of the original documents, followed by the extraction of modified TF-IDF, Bag of Words (BOW), and concept similarity (CS) features. These features are then fed into a Long Short-Term Memory (LSTM) network, whose weights are tuned using the Improved Dingo Optimization (IDO) technique. The proposed model is evaluated on the Amazon Review and DUC-2002 datasets, and its performance is compared with that of various existing algorithms. The results show significant improvements over baseline models, with an accuracy of 0.922862 on the Amazon Review dataset and 0.899730 on the DUC-2002 dataset. These findings indicate that the proposed technique improves the accuracy of extractive multi-document summarization.
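The feature-extraction stage can be illustrated with a minimal sketch. It assumes the standard TF-IDF weighting (term frequency times log inverse document frequency) and a plain count-based BOW, computed per sentence; the modified TF-IDF and the concept similarity features are not specified in the abstract and are not reproduced here. The tokenizer, function names, and example sentences are hypothetical and serve illustration only.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and keep alphabetic tokens (simplistic pre-processing)."""
    return re.findall(r"[a-z]+", sentence.lower())


def bow_vector(tokens, vocab):
    """Bag-of-Words: raw count of each vocabulary word in the sentence."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def tfidf_vectors(tokenized, vocab):
    """Standard TF-IDF per sentence (assumption: NOT the paper's modified variant)."""
    n = len(tokenized)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty sentences
        vectors.append([counts[w] / total * idf[w] for w in vocab])
    return vectors


def sentence_features(sentences):
    """Return the vocabulary plus BOW and TF-IDF vectors for each sentence."""
    tokenized = [tokenize(s) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    bow = [bow_vector(toks, vocab) for toks in tokenized]
    tfidf = tfidf_vectors(tokenized, vocab)
    return vocab, bow, tfidf


# Hypothetical review-style sentences, for illustration only.
sentences = [
    "The battery life is great.",
    "The battery died quickly.",
    "Great screen and great price.",
]
vocab, bow, tfidf = sentence_features(sentences)
```

In a pipeline of the kind the abstract describes, per-sentence vectors like these would be combined with the remaining features and passed to the LSTM, which scores sentences for inclusion in the summary.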
Funding
This study received no specific funding.
Ethics declarations
Ethical approval
Not applicable.
Informed consent
Not applicable.
Conflict of interest
The authors declare that they have no competing interests.
About this article
Cite this article
Singh, G., Mittal, N. & Chouhan, S.S. A deep learning framework for multi-document summarization using LSTM with improved Dingo Optimizer (IDO). Multimed Tools Appl 83, 69669–69691 (2024). https://doi.org/10.1007/s11042-024-18248-2
DOI: https://doi.org/10.1007/s11042-024-18248-2