research-article

A hybrid method for missing value imputation

Authors:

Aikaterini Karanikola,

Sotiris Kotsiantis

PCI '19: Proceedings of the 23rd Pan-Hellenic Conference on Informatics

Pages 74 - 79

https://doi.org/10.1145/3368640.3368653

Published: 28 November 2019

Abstract

Missing values are a common occurrence in a great number of real-world datasets from diverse domains of interest. In research, missing data constitute a significant problem because they can affect the conclusions drawn from the data. Data preprocessing therefore becomes harder, since selecting an inappropriate way to handle missing information can lead to untrustworthy results. Unfortunately, as in most cases in Machine Learning, no single solution fits every task related to the problem. For this reason, many strategies have been proposed to deal with this issue. One of the best known, and also efficient, is imputation. Replacing a missing value with an estimate apparently eliminates the problem and provides complete datasets, but the difficulty shifts to selecting the right method to impute missing values. A widely used imputation method that can be found in the libraries of the most noted statistical and Machine Learning suites is IRMI. In this work, we propose a variant of IRMI that maintains the advantages of this well-known imputation method while outperforming the traditional variant used in many Machine Learning software tools. To achieve this, the benefits of boosting as well as decision tree theory are exploited. To test the efficiency of our method, a series of experiments over 30 datasets was executed, measuring the classification accuracy of the proposed method to show that it outperforms its rivals, which include classic as well as more sophisticated imputation strategies. Finally, the results of our study are provided, along with the conclusions that arise from them.
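The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea it names: an iterative, regression-based imputation loop in which each incomplete column is predicted from the other columns by a small boosted ensemble of decision stumps. It is not the authors' method and not IRMI itself. The function names (`fit_stump`, `fit_boosted`, `impute`), the parameters `rounds`, `rate`, and `sweeps`, and the use of `None` for a missing cell are all assumptions made for this example. It also assumes purely numeric data, at least two columns, and at least one observed value per column.

```python
def fit_stump(X, r):
    """Fit a one-split regression stump to residuals r by least squares."""
    n, best = len(X), None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between distinct feature values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((v - lm) ** 2 for v in left)
                   + sum((v - rm) ** 2 for v in right))
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:
        # Every feature is constant: fall back to the mean residual.
        m = sum(r) / n
        return (0, float("inf"), m, m)
    return best[1:]


def fit_boosted(X, y, rounds=20, rate=0.3):
    """Least-squares boosting of stumps; returns a prediction function."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        # Each new stump is fitted to the current residuals.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lm, rm = fit_stump(X, resid)
        stumps.append((j, t, lm, rm))
        pred = [p + rate * (lm if x[j] <= t else rm)
                for p, x in zip(pred, X)]

    def predict(x):
        return base + rate * sum(lm if x[j] <= t else rm
                                 for j, t, lm, rm in stumps)
    return predict


def drop_column(row, j):
    """Return the row without column j (the predictors for column j)."""
    return row[:j] + row[j + 1:]


def impute(data, sweeps=3):
    """Return a completed copy of data; missing cells are given as None."""
    n, d = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]
    # Step 1: initialise every missing cell with the observed column mean.
    for j in range(d):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][j]:
                filled[i][j] = mean
    # Step 2: repeatedly re-estimate each incomplete column from the others.
    for _ in range(sweeps):
        for j in range(d):
            rows = [i for i in range(n) if not missing[i][j]]
            if len(rows) == n:
                continue  # nothing to impute in this column
            model = fit_boosted([drop_column(filled[i], j) for i in rows],
                                [filled[i][j] for i in rows])
            for i in range(n):
                if missing[i][j]:
                    filled[i][j] = model(drop_column(filled[i], j))
    return filled
```

Calling `impute(rows)` on a list of equal-length numeric rows returns a new list in which only the `None` cells have been replaced; observed values are left untouched. Mean initialisation followed by repeated sweeps is the common pattern in iterative regression imputation, and the boosted stumps stand in here for whichever tree-based learner a real implementation would use.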

Cited By

Shivashankar, K. and Martini, A. 2022. Maintainability Challenges in ML: A Systematic Literature Review. 2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA) (Aug. 2022), 60--67. https://doi.org/10.1109/SEAA56994.2022.00018

Alabadla, M., Sidi, F., Ishak, I., Ibrahim, H., Affendey, L., Che Ani, Z., Jabar, M., Bukar, U., Devaraj, N., Muda, A., Tharek, A., Omar, N. and Jaya, M. 2022. Systematic Review of Using Machine Learning in Imputing Missing Values. IEEE Access. 10, (2022), 44483--44502. https://doi.org/10.1109/ACCESS.2022.3160841

Index Terms

A hybrid method for missing value imputation
1. Computing methodologies
  1. Machine learning
    1. Machine learning algorithms

Published In

PCI '19: Proceedings of the 23rd Pan-Hellenic Conference on Informatics

November 2019

165 pages

ISBN:9781450372923

DOI:10.1145/3368640

General Chairs:
Yannis Manolopoulos, Open University Cyprus, Cyprus
George Angelos Papadopoulos, University of Cyprus, Cyprus
Athena Stassopoulou, University of Nicosia, Cyprus

Program Chairs:
Ioanna Dionysiou, University of Nicosia, Cyprus
Ioannis Kyriakides, University of Nicosia, Cyprus
Nicolas Tsapatsoulis, Cyprus University of Technology, Cyprus

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PCI '19: 23rd Pan-Hellenic Conference on Informatics

November 28 - 30, 2019

Nicosia, Cyprus

Acceptance Rates

PCI '19 Paper Acceptance Rate 18 of 35 submissions, 51%;

Overall Acceptance Rate 190 of 390 submissions, 49%

Bibliometrics

Total Citations: 2
Total Downloads: 117
Downloads (Last 12 months): 13
Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Nov 2024
