- March 22, 2023
- Posted by: RSIS
- Categories: IJRSI, Statistics
Effects of Missing Data on the Parameters of Multiple Regression Model (MRM)
Etaga Harrison. O*, Ngonadi Lilian, Aforka Kenechukwu F and Etaga Njideka C
Department of Statistics, Nnamdi Azikiwe University, Awka, Anambra State, Nigeria
*Corresponding author
Received: 27 January 2023; Revised: 15 February 2023; Accepted: 21 February 2023; Published: 21 March 2023
Abstract: Multiple Regression Models are used to predict and describe the nature of the relationship between one dependent variable and more than one independent variable. Several assumptions guide the estimation of the parameters of the model, and the interpretation of the parameters always depends on the nature of the data involved. Missing values tend to limit the fullness of information in an analysis. It is therefore necessary to check for the effect of missing data on the parameters of the Multiple Regression Model. Data were simulated using the Binomial, Geometric, Normal and Exponential distributions. The simulation was done at sample sizes of 15, 25, 50 and 100. The level of missingness was set at 5%, 10%, 25% and 35%. Two methods of handling missing data were employed: listwise deletion and mean imputation. Data were analysed using multiple regression and Analysis of Variance. The results show that the least Mean Square Error (MSE) was obtained at different levels of missingness depending on the distribution. There was a significant effect on the parameters of the Multiple Regression Model based on sample size.
Keywords: Multiple, Regression, Model, Missing data, MSE
I. Introduction
Researchers have faced the problem of missing quantitative data at some point in their work. Missing data can occur if research informants refuse or forget to answer a survey question, if files are lost, or if data are not recorded properly. Given the high cost of collecting data, researchers cannot afford to start all over or to wait until foolproof methods of collecting information are developed. In statistics, missing data (or a missing value) occurs when no data value is stored for a variable in an observation.
Regression analysis is the study of the nature of the relationship between dependent variable(s) and independent variable(s). Simple Linear Regression involves just one dependent and one independent variable. The situation where there is one dependent variable and more than one independent variable is referred to as Multiple Regression (MR). When estimating the parameters of the Multiple Regression, the Least Squares Method (LSM) is used most often. Various factors can affect the signs or magnitudes of the parameter estimates; one of these is missing data. The problem of missing data needs to be adequately addressed before analysing the data, to avoid reaching wrong conclusions.
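For reference, the Multiple Regression model and its least squares estimator can be written in standard matrix notation. The symbols used here ($n$ observations, $k$ independent variables, design matrix $\mathbf{X}$, coefficient vector $\boldsymbol{\beta}$, error vector $\boldsymbol{\varepsilon}$) follow the usual textbook convention and are assumed for illustration; they are not taken from the paper.

```latex
% Model for observation i with k independent variables
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

% The same model in matrix form, where X has a leading column of ones
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}

% Least squares estimator, defined when X^T X is invertible
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}
```

Every row of $\mathbf{X}$ enters $\mathbf{X}^{\top}\mathbf{X}$ and $\mathbf{X}^{\top}\mathbf{y}$, so a row with a missing entry cannot be used as it stands. It has to be dropped or completed before the estimator can be computed.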
When treating missing data, the most common method, and the easiest to apply, is to use only those cases with complete information (complete case analysis, also called listwise deletion). An alternative to complete case analysis is to replace each missing value with the mean of the observed values of that variable (mean imputation). More recently, methods based on distributional models for the data, such as maximum likelihood and multiple imputation, have been developed.
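The two simple approaches described above, keeping only complete cases (listwise deletion) and replacing missing values with the mean, can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' simulation code: the helper names (`fit_ols`, `listwise_delete`, `mean_impute`), the true coefficients, the random seed, and the choice to make only the independent variables missing completely at random are all hypothetical. The sample size of 50, the 10% missingness level and the Normal distribution are taken from the settings listed in the abstract.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares estimates (intercept first) for y = b0 + X b + e."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def listwise_delete(X, y):
    """Keep only the rows that have no missing predictor value."""
    complete = ~np.isnan(X).any(axis=1)
    return X[complete], y[complete]

def mean_impute(X):
    """Replace each missing value with the observed mean of its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

rng = np.random.default_rng(0)   # hypothetical seed
n = 50                           # one of the sample sizes in the abstract

# Simulate two Normal predictors and a response (coefficients are assumed)
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Assumption: about 10% of predictor cells go missing completely at random
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.10] = np.nan

X_cc, y_cc = listwise_delete(X_missing, y)
X_imp = mean_impute(X_missing)

beta_full = fit_ols(X, y)        # estimates from the complete data
beta_cc = fit_ols(X_cc, y_cc)    # estimates after listwise deletion
beta_imp = fit_ols(X_imp, y)     # estimates after mean imputation

print("complete data  :", beta_full)
print("listwise delete:", beta_cc)
print("mean imputation:", beta_imp)
```

Listwise deletion reduces the number of rows available for estimation, while mean imputation keeps all rows but leaves each column mean unchanged and shrinks its variance. Comparing the three coefficient vectors gives a rough view of how each treatment shifts the estimates.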
Methods of analyzing missing data require assumptions about the nature of the data and about the reasons for the missing observations, and these assumptions are not often acknowledged. The required assumptions need to be carefully considered before treating missing data. Missing data can lead to problems that affect the interpretation and inference of research results, the understanding and explanation of conclusions made, the strength of the study design, and the validity of conclusions about the relationship between variables; they may also limit the representativeness of the sample.
Avoiding missing data is the optimal means of handling incomplete observations. During the data collection phase, the researcher has the opportunity to make decisions about what data to collect and how to monitor data collection. The scale and distribution of the variables in the data and the reasons for missing data are two critical issues in applying the appropriate missing data techniques. This paper therefore seeks to evaluate the effects of missing data on the parameter estimates of the Multiple Regression Model.