ABSTRACT
Outlier detection is a critical component of data preprocessing and analysis, because outliers can distort statistical estimates and degrade model performance. This study examines methods for detecting outliers, with a focus on statistical and distance-based techniques. The literature review underscores the importance of identifying outliers and their potential impact on data analysis across diverse domains.
The study employs quantile regression, a robust statistical method that is less sensitive to outliers in the response than ordinary least-squares regression, to analyze obesity in teenagers. Additionally, the Mahalanobis distance, a distance-based measure that accounts for the correlations among variables, is used to identify outliers in the obesity dataset; the distance itself can be computed without assuming a particular data distribution.
The analysis shows that quantile regression reveals the significant factors influencing obesity levels, such as genetics, sedentary lifestyle, and gender. The Mahalanobis distance method partitions the data into normal and outlier observations, demonstrating its utility in outlier detection.
The findings emphasize the importance of employing robust statistical methods and distance-based techniques for accurate outlier identification. Recommendations include combining multiple approaches, incorporating domain knowledge, and continuously updating outlier detection methods to improve the reliability and robustness of data analysis.
This study contributes to the understanding of outlier detection methods and their applications, providing valuable insights for researchers and analysts working with complex datasets across various fields.
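
As an illustration of the distance-based screening summarized above, the following is a minimal sketch rather than the study's actual procedure. It computes squared Mahalanobis distances from the sample mean and flags observations above a cutoff. The two-variable synthetic data (height and weight), the function name mahalanobis_sq, and the 97.5% chi-square cutoff are all assumptions made for this example; the chi-square cutoff in particular presumes roughly multivariate normal data and is not a rule stated in this abstract.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row of X from the sample mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)      # sample covariance (ddof=1)
    cov_inv = np.linalg.pinv(cov)      # pseudo-inverse tolerates a near-singular covariance
    # Row-wise quadratic form: diff_i^T * cov_inv * diff_i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)


# Hypothetical data: 200 synthetic (height in m, weight in kg) pairs
# plus one deliberately extreme observation appended as the last row.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(
    mean=[1.70, 70.0],
    cov=[[0.01, 0.5], [0.5, 100.0]],
    size=200,
)
X = np.vstack([X, [1.50, 150.0]])

d2 = mahalanobis_sq(X)

# Assumed cutoff: 97.5% quantile of a chi-square distribution with 2 degrees
# of freedom, which has the closed form -2 * ln(1 - p).
cutoff = -2.0 * np.log(1.0 - 0.975)
is_outlier = d2 > cutoff

print(f"cutoff = {cutoff:.3f}, flagged {is_outlier.sum()} of {len(X)} rows")
```

Because the mean and covariance are themselves estimated from data that include the outliers, extreme points can mask one another; a robust covariance estimate is a common alternative when that is a concern.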