Statistical Methods for Finding Outliers in Multivariate Data Using a Boxplot and Multiple Linear Regression

IF 0.7 | Zone 4 (multidisciplinary journals) | Q3 MULTIDISCIPLINARY SCIENCES | Sains Malaysiana | Pub Date: 2023-09-30 | DOI: 10.17576/jsm-2023-5209-20
Theeraphat Thanwiset, W. Srisodaphol
Citations: 0

Abstract

This study proposes a method for detecting outliers in multivariate data, based on a boxplot and multiple linear regression. The boxplot is first applied to every variable to split the data set into two sets: normal data (observations lying within the lower and upper fences of the boxplot) and data that could be outliers. The normal data are then used to fit a multiple linear regression model, and the maximum residual error is taken as the cut-off point. To evaluate performance, a simulation study was conducted on multivariate normal data with and without contamination at various levels. The proposed method was compared with existing methods, namely the Mahalanobis distance and the Mahalanobis distance with robust estimators obtained by the minimum volume ellipsoid method, the minimum covariance determinant method, and the minimum vector variance method. The results showed that the proposed method outperformed the compared methods at all contamination levels. When applied to real data, the proposed method identified outliers that were consistent with the real data.
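The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which variable serves as the response, which fence multiplier is used, or exactly how the cut-off is applied. The sketch assumes the last column is the response, uses the conventional 1.5 × IQR fences, and flags a boxplot candidate only when its absolute residual exceeds the largest absolute residual among the normal data. All function names and the sample data are hypothetical.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Lower and upper boxplot fences; k=1.5 is an assumed multiplier."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(x) for x in xs]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


def residual(beta, x, y):
    """Observed response minus the fitted value."""
    return y - (beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def detect_outliers(rows, k=1.5):
    """Return indices of rows flagged as outliers.

    Assumption: the last column of each row is the regression response.
    """
    # Stage 1: a row is "normal" only if every variable lies within its fences.
    fences = [tukey_fences(col, k) for col in zip(*rows)]
    inside = [all(lo <= v <= hi for v, (lo, hi) in zip(r, fences)) for r in rows]
    normal = [r for r, ok in zip(rows, inside) if ok]
    # Stage 2: fit the regression on normal rows; the cut-off is the largest
    # absolute residual observed among them (an assumed reading of the abstract).
    beta = fit_ols([r[:-1] for r in normal], [r[-1] for r in normal])
    cutoff = max(abs(residual(beta, r[:-1], r[-1])) for r in normal)
    return [i for i, (r, ok) in enumerate(zip(rows, inside))
            if not ok and abs(residual(beta, r[:-1], r[-1])) > cutoff]


# Hypothetical data: y is roughly 1 + 2*x1 + 0.5*x2, plus one gross outlier.
rows = [
    (1, 2, 4.1), (2, 1, 5.4), (3, 4, 9.05), (4, 3, 10.45), (5, 6, 14.1),
    (6, 5, 15.4), (7, 8, 19.05), (8, 7, 20.45), (9, 10, 24.1), (10, 9, 25.4),
    (5, 5, 60.0),
]
flagged = detect_outliers(rows)
print(flagged)
```

In this toy data set only the last row (index 10) falls outside a boxplot fence, and its residual is far larger than any residual among the remaining rows, so it is the only row reported. Fitting the regression on the filtered rows keeps gross outliers from distorting the coefficients that define the cut-off.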
Source journal
Sains Malaysiana (MULTIDISCIPLINARY SCIENCES)
CiteScore: 1.60
Self-citation rate: 12.50%
Articles published: 196
Review time: 3-6 weeks
About the journal: Sains Malaysiana is a refereed journal committed to the advancement of scholarly knowledge and research findings in several branches of science and technology. It contains articles on Earth Sciences, Health Sciences, Life Sciences, Mathematical Sciences and Physical Sciences. The journal publishes articles, reviews, and research notes whose content and approach are of interest to a wide range of scholars. Sains Malaysiana is published by UKM Press, and its autonomous Editorial Board is drawn from the Faculty of Science and Technology, Universiti Kebangsaan Malaysia. In addition, distinguished scholars from local and foreign universities are appointed to serve as advisory board members and referees.