Authors: Yangkang Yu, Ling Yang, Yunzhong Shen
Journal: Journal of Geodesy (impact factor 3.9; JCR Q1, Geochemistry & Geophysics)
Published: 2024-06-18
DOI: https://doi.org/10.1007/s00190-024-01855-0
Citations: 0
An extended w-test for outlier diagnostics in linear models
Abstract
Outliers have long been a research focus in geodesy. Data snooping and its iterative form, iterative data snooping (IDS), both based on the statistical test known as the w-test, are commonly used to diagnose outliers in linear models. However, when multiple outliers are present, these methods may suffer from masking and swamping effects, which limit their detection and identification capabilities. This contribution investigates the cause of masking and swamping effects and proposes a new method to mitigate them. First, based on a division of the data, an extended form of the w-test with its reliability measure is presented, and a theoretical reinterpretation of data snooping and IDS is provided. Then, to alleviate masking and swamping, a new outlier diagnostic method and its iterative form are proposed, namely data refining and iterative data refining (IDR). In general, if the observations are initially divided into an inlying set and an outlying set, data snooping can be regarded as a process of moving outliers from the inlying set to the outlying set. Conversely, data refining is the reverse process, transferring inliers from the outlying set to the inlying set. Both theoretical analysis and practical examples show that IDR is more robust than IDS because it alleviates masking and swamping effects, although it may pose a higher risk of precision loss when data are insufficient.
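For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of classical w-test data snooping in its iterative form (IDS). It is not the extended w-test or the IDR method proposed in the paper, whose details the abstract does not give. The sketch assumes a straight-line model y = a + b*t, uncorrelated observations with a known common standard deviation `sigma`, and distinct values of `t`; the function names and the significance level `alpha` are illustrative choices, not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def w_statistics(t, y, sigma):
    """w-test statistics for a least-squares line fit y = a + b*t.

    Assumes i.i.d. observation noise with known standard deviation sigma,
    so that w_i = residual_i / (sigma * sqrt(redundancy_i)).
    """
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    a = y_mean - b * t_mean
    w = []
    for ti, yi in zip(t, y):
        residual = yi - (a + b * ti)
        # Diagonal element of the hat matrix for a straight-line fit.
        leverage = 1.0 / n + (ti - t_mean) ** 2 / sxx
        redundancy = 1.0 - leverage
        w.append(residual / (sigma * sqrt(redundancy)))
    return w


def iterative_data_snooping(t, y, sigma, alpha=0.001):
    """Return the original indices moved from the inlying to the outlying set.

    Each pass refits the line on the inlying set and removes the single
    observation with the largest |w| if it exceeds the two-sided critical value.
    """
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    inlying = list(range(len(t)))
    outlying = []
    # Stop at 3 points: a line fit then has a single redundancy, so all |w|
    # are equal and an outlier can no longer be identified.
    while len(inlying) > 3:
        w = w_statistics([t[i] for i in inlying], [y[i] for i in inlying], sigma)
        k = max(range(len(w)), key=lambda j: abs(w[j]))
        if abs(w[k]) <= critical:
            break
        outlying.append(inlying.pop(k))
    return outlying


if __name__ == "__main__":
    t = list(range(10))
    noise = [0.05, -0.03, 0.02, -0.04, 0.0, 0.01, -0.02, 0.03, -0.01, 0.04]
    y = [2.0 + 0.5 * ti + ni for ti, ni in zip(t, noise)]
    y[4] += 5.0  # one gross error injected for illustration
    print(iterative_data_snooping(t, y, sigma=0.1))
```

Removing only one observation per pass matters here: in the first pass the gross error also inflates the w-statistics of the good observations, which is a small-scale version of the swamping effect the abstract describes.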
About the journal:
The Journal of Geodesy is an international journal concerned with the study of scientific problems of geodesy and related interdisciplinary sciences. Peer-reviewed papers are published on theoretical or modeling studies, and on results of experiments and interpretations. Besides original research papers, the journal includes commissioned review papers on topical subjects and special issues arising from chosen scientific symposia or workshops. The journal covers the whole range of geodetic science and reports on theoretical and applied studies in research areas such as:
-Positioning
-Reference frame
-Geodetic networks
-Modeling and quality control
-Space geodesy
-Remote sensing
-Gravity fields
-Geodynamics