Nathan T. Cannon, Giacomo Savini, Seth M. Pantanelli, Kenneth Hoffer, Petros Aristodemou, Kamran Riaz, David Murphy, David Griffin, Christian Berry, Guillaume Debellemanière, Mathieu Gauvin, Avi Wallerstein, Woong-Joo Whang, Kyungmin Koh, Kazuno Negishi, Ken Hayashi, Diogo Hipólito-Fernandes, David L. Cooke
American Journal of Ophthalmology, vol. 273, pp. 33-42, May 2025 (published online February 7, 2025). DOI: 10.1016/j.ajo.2025.01.022
Which Test Is Best: Evaluation of Traditional and Contemporary Statistical Tests for Analysis of Spherical Equivalent Prediction Error
Purpose
To characterize the performance of traditional and contemporary statistical tests for analysis of spherical equivalent prediction error (SEQ-PE) after cataract surgery, with regard to test significance and self-consistency.
Design
Comparison of the utility of statistical tests.
Methods
Subjects: Eyes from 5 academic centers and 2 private practices that underwent cataract surgery and postoperative manifest refraction between March 2011 and December 2022. SEQ-PE data were randomly divided into subsets of 100, 300, 500, 700, and 2600 eyes. For each of 6 intraocular lens (IOL) power formulas, the mean absolute error (MAE), median absolute error (MedAE), SD, root mean squared absolute error (RMSAE), and proportion of eyes within 0.50 diopters (D) of the predicted refraction were calculated. These were analyzed with the Friedman test with Dunn post hoc comparisons, Cochran's Q test with McNemar post hoc comparisons, Eyetemis, and the Wilcox-Holladay-Wang-Koch (WHWK) tests. All P values were adjusted for multiple comparisons with the Holm correction. Main outcome measures: For each statistical test, the percentage of significant relationships (Percent Significance), the proportion of inconsistencies (Inconsistency Ratio), and the proportion of self-consistent significant relationships (Significance Index).
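The Methods list five accuracy metrics computed per formula. The sketch below shows one way to compute them for a single formula. It assumes that SEQ-PE is the postoperative spherical equivalent minus the predicted one (the abstract does not state the sign convention), that "SD" means the sample SD of the signed errors, and that an error of exactly 0.50 D counts as "within". The function names are hypothetical and are not taken from the study.

```python
import math
import statistics


def spherical_equivalent(sphere, cylinder):
    """Spherical equivalent (D): sphere plus half the cylinder."""
    return sphere + cylinder / 2.0


def prediction_errors(actual_se, predicted_se):
    """Per-eye SEQ-PE, assuming the convention PE = actual - predicted."""
    return [a - p for a, p in zip(actual_se, predicted_se)]


def accuracy_metrics(pe, threshold=0.50):
    """Summary metrics for one formula's prediction errors (diopters)."""
    abs_err = [abs(e) for e in pe]
    n = len(pe)
    return {
        "MAE": sum(abs_err) / n,
        "MedAE": statistics.median(abs_err),
        # Assumption: SD of the signed errors, sample (n - 1) form.
        "SD": statistics.stdev(pe),
        # Root mean square of the absolute errors (same as RMS of signed errors).
        "RMSAE": math.sqrt(sum(e * e for e in abs_err) / n),
        # Assumption: the 0.50 D boundary counts as "within".
        "within_0.50D": sum(e <= threshold for e in abs_err) / n,
    }
```

For example, errors of 0.25, -0.50, 0.75, 0.00, and -0.25 D give an MAE of 0.35 D, a MedAE of 0.25 D, and 80% of eyes within 0.50 D. Because squaring removes the sign, RMSAE equals the root mean square of the signed errors.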
Results
Analysis was performed on 7839 eyes of 7839 patients. By Percent Significance, WHWK.MAE (42%), WHWK.SD (41%), Eyetemis.MAE (40%), WHWK.RMSAE (39%), and Dunn.MAE (34%) were more robust than the remaining 3 tests (all P < .001). In the 100-eye subsets, Dunn.MAE had the best (lowest) Inconsistency Ratio (0.11). The same top 5 tests were also the most robust by Significance Index (0.39, 0.35, 0.35, 0.34, and 0.31, respectively; all P < .02). In the 2600-eye subsets, WHWK.SD and WHWK.RMSAE had the best Significance Indices (both 0.77). McNemar had the poorest Significance Index overall (0.09).
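The Methods state that all P values were Holm-corrected, and the Results rank tests by Percent Significance. The sketch below combines the two. It assumes that Percent Significance is the share of pairwise formula comparisons whose Holm-adjusted P value falls below .05. The abstract does not give the threshold or explain exactly how relationships are counted. With 6 formulas, comparing every pair would give 15 comparisons per test. Both function names are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted P values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest P first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The P value at 0-based rank k is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity: an adjusted P never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def percent_significance(pairwise_p, alpha=0.05):
    """Hypothetical helper: percent of pairwise comparisons significant after Holm.

    pairwise_p maps a (formula_a, formula_b) pair to its unadjusted P value.
    """
    adjusted = holm_adjust(list(pairwise_p.values()))
    n_significant = sum(p < alpha for p in adjusted)
    return 100.0 * n_significant / len(adjusted)
```

As an illustration with made-up values, raw P values of .001, .04, and .5 for three comparisons adjust to about .003, .08, and .5. Only one of the three (about 33%) remains significant at .05.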
Conclusions
The 5 high-performing tests produced significant results more often and were also self-consistent. Overall, WHWK.MAE was the highest-performing test and McNemar the lowest. Dunn.MAE may be useful for sample sizes of fewer than 150 eyes.
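In the Cochran Q post hoc McNemar pathway, formulas are compared on the binary outcome "within 0.50 D". The sketch below is a two-sided exact (binomial) McNemar test for one pair of formulas. The abstract does not say whether the exact or the chi-square form was used, so the exact form is an assumption, and `mcnemar_exact` is a hypothetical name. The test counts only discordant eyes, meaning eyes within 0.50 D under one formula but not the other. Concordant eyes do not enter the P value.

```python
from math import comb


def mcnemar_exact(within_a, within_b):
    """Two-sided exact McNemar test on paired binary outcomes (one entry per eye)."""
    if len(within_a) != len(within_b):
        raise ValueError("outcomes must be paired eye-by-eye")
    # Discordant counts: within 0.50 D for one formula only.
    b = sum(1 for x, y in zip(within_a, within_b) if x and not y)
    c = sum(1 for x, y in zip(within_a, within_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant eyes, no evidence of a difference
    # Under H0 each discordant eye is equally likely to favor either formula.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For instance, if formula A succeeds where B fails in 5 eyes and the reverse never happens, the two-sided exact P value is 2/32 = .0625, regardless of how many eyes both formulas got right.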
About the journal:
The American Journal of Ophthalmology is a peer-reviewed, scientific publication that welcomes the submission of original, previously unpublished manuscripts directed to ophthalmologists and visual science specialists describing clinical investigations, clinical observations, and clinically relevant laboratory investigations. Published monthly since 1884, the full text of the American Journal of Ophthalmology and supplementary material are also presented online at www.AJO.com and on ScienceDirect.
The American Journal of Ophthalmology publishes Full-Length Articles, Perspectives, Editorials, Correspondence, Book Reports, and Announcements. Brief Reports and Case Reports are no longer published. We recommend submitting Brief Reports and Case Reports to our companion publication, the American Journal of Ophthalmology Case Reports.
Manuscripts are accepted with the understanding that they have not been and will not be published elsewhere substantially in any format, and that there are no ethical problems with the content or data collection. Authors may be asked to produce the data on which the manuscript is based and to answer any questions about the manuscript or its authors expeditiously.