Comparison Between Hotdeck Method and Regression Method in Handling Health Science Missing Data

  • S K M Onny Priskila, Airlangga University, Surabaya, East Java, Indonesia
  • M Soenarnatalina, Airlangga University, Surabaya, East Java, Indonesia
  • N Hari Basuki, Airlangga University, Surabaya, East Java, Indonesia
Keywords: Age, Hot deck, Imputation, Missing data, Regression

Abstract

Introduction: Missing data, or missing values, are information that is not available for a subject (case). Missing data occur because some information about the subject was not given, is difficult to find, or does not actually exist. If missing data are ignored, it becomes difficult to obtain high classification accuracy even when the most reliable classification algorithm is used. One way to handle the missing data problem is imputation. Several imputation methods can be used to replace missing data: substitution of a constant value, hot deck, regression, expectation maximization, and multiple imputation.

Purpose: To analyze and compare the hot deck and regression methods and determine which is the better imputation method for missing data.

Materials and Methods: The data are from respondents who practice family planning in the town of Pasuruan, East Java, Indonesia, and the variable used is age. Values of the age variable were removed to simulate missing data and then imputed by the hot deck or regression method. The original data were compared with the imputed data using a t-test, Pearson correlation, and root mean square error (RMSE).

Results: Imputation of the simulated missing data for the age variable shows that the regression method is better than the hot deck method in handling missing data in health science.

Conclusion: The best method is identified by a non-significant P value, an r value close to +1, and the smallest RMSE. The hot deck method gave a non-significant P value at 5% missing data, but its r values were small, even negative, and its RMSE values were large. The regression method gave non-significant P values at 5% and 10% missing data. Besides these results, the consistency of the analysis was also considered, based on the repeated values of the three criteria (P, r, and RMSE).
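The comparison described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' procedure: the data are synthetic, the predictor for the regression (number of children) is hypothetical because the abstract does not name one, the hot deck variant shown is a simple random hot deck, and RMSE and Pearson r are computed over all cases. The t-test is omitted for brevity.

```python
import math
import random


def hot_deck_impute(values, rng):
    """Random hot deck: fill each missing value (None) with a randomly
    chosen observed value (a "donor") from the same variable."""
    donors = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(donors) for v in values]


def regression_impute(values, predictor):
    """Fit y = intercept + slope * x on the complete cases, then predict
    each missing y from its x."""
    pairs = [(x, y) for x, y in zip(predictor, values) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y if y is not None else intercept + slope * x
            for x, y in zip(predictor, values)]


def rmse(original, imputed):
    """Root mean square error between the original and imputed series."""
    squared = sum((o - i) ** 2 for o, i in zip(original, imputed))
    return math.sqrt(squared / len(original))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(missing_rate=0.05, n=200, seed=0):
    """Delete a share of a synthetic age variable, impute it with both
    methods, and return {method: (rmse, r)}."""
    rng = random.Random(seed)
    # Synthetic stand-in data (assumption): age loosely tied to a
    # hypothetical predictor, the number of children.
    children = [rng.randint(0, 5) for _ in range(n)]
    age = [20 + 3 * c + rng.gauss(0, 4) for c in children]
    missing = set(rng.sample(range(n), int(n * missing_rate)))
    observed = [None if i in missing else a for i, a in enumerate(age)]
    imputations = {
        "hot deck": hot_deck_impute(observed, rng),
        "regression": regression_impute(observed, children),
    }
    return {name: (rmse(age, imp), pearson_r(age, imp))
            for name, imp in imputations.items()}


if __name__ == "__main__":
    for rate in (0.05, 0.10):
        print(rate, simulate(missing_rate=rate))
```

Running the script prints the RMSE and r for each method at 5% and 10% missing data; repeating it with different seeds gives a rough sense of how consistent each method is.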

Author Biographies

S K M Onny Priskila, Airlangga University, Surabaya, East Java, Indonesia

Postgraduate Student, Department of Public Health, Faculty of Public Health

M Soenarnatalina, Airlangga University, Surabaya, East Java, Indonesia

Lecturer, Department of Statistics, Faculty of Public Health

N Hari Basuki, Airlangga University, Surabaya, East Java, Indonesia

Lecturer, Department of Statistics, Faculty of Public Health

Published
2016-03-30