Robust Outlier Detection in a Multivariate Linear Regression Model

International Journal of Research and Scientific Innovation (IJRSI) | Volume VII, Issue III, March 2020 | ISSN 2321–2705

Onisokumen David¹, Ijomah Maxwell A.²
¹Department of Mathematics/Statistics, Ignatius Ajuru University of Education, Nigeria
²Department of Mathematics/Statistics, University of Port Harcourt, Nigeria

Abstract: Outlier detection has been studied extensively and is widely used in statistics, and many methods for detecting outlying observations have been developed as a result. However, a number of these approaches are specific to particular application domains in the univariate case and, although apparently robust and useful, have not made their way into general practice. In this paper, we consider the Mahalanobis distance technique, the k-means clustering technique, and the principal component analysis technique, using data on birth weight, birth height, and head circumference at birth for 100 infants from 2016 to 2019. These three techniques were selected, among others available, to compare the robustness of multivariate outlier detection techniques. The Akaike, Schwarz, and Hannan-Quinn information criteria, as well as R², were used to determine the most robust regression among the selected models. The findings indicate that the k-means clustering technique outperforms the other two techniques in the regression model.

Keywords: Outlier Detection; Mahalanobis Distance; K-Means Clustering; Principal Component Analysis

I. INTRODUCTION

Multiple regression models are widely used in applied statistics to quantify the relationship between a response variable Y and multiple predictor variables Xᵢ. The fitted relationship is then used to predict the value of the response variable from known values of one or more predictor variables.
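
As an illustration of this use of regression, the following is a minimal Python sketch, assuming ordinary least squares via NumPy and synthetic, noise-free data. The helper names `fit_ols` and `predict`, the number of predictors, and the coefficient values are hypothetical choices made for the example and do not come from the study.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate the intercept and slopes by ordinary least squares."""
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X_new):
    """Predict the response at new predictor values."""
    X_new = np.atleast_2d(X_new)
    return coef[0] + X_new @ coef[1:]


# Synthetic illustration only (not the infant data used in the paper):
# two predictors and a response generated without noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]

coef = fit_ols(X, y)                 # intercept followed by two slopes
y_hat = predict(coef, [0.3, -1.2])   # prediction at one new observation
```

Because the synthetic response contains no noise, the fitted coefficients recover the generating values; with real data the estimates would also reflect sampling error and any outlying observations.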
The models take the general form