控制尖空情况下局部误发现率的双重截断法

IF 16.4 1区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY Accounts of Chemical Research Pub Date : 2024-06-05 DOI:10.1007/s00180-024-01510-4
Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong
{"title":"控制尖空情况下局部误发现率的双重截断法","authors":"Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong","doi":"10.1007/s00180-024-01510-4","DOIUrl":null,"url":null,"abstract":"<p>Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives,\" but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method.\" After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.</p>","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Double truncation method for controlling local false discovery rate in case of spiky null\",\"authors\":\"Shinjune Kim, Youngjae Oh, Johan Lim, DoHwan Park, Erin M. Green, Mark L. Ramos, Jaesik Jeong\",\"doi\":\"10.1007/s00180-024-01510-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives,\\\" but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method.\\\" After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.</p>\",\"PeriodicalId\":1,\"journal\":{\"name\":\"Accounts of Chemical Research\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":16.4000,\"publicationDate\":\"2024-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accounts of Chemical Research\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1007/s00180-024-01510-4\",\"RegionNum\":1,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1007/s00180-024-01510-4","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

目前已开发出许多控制误发现率的多重检验程序,用于识别一些在两个不同组别之间显示出显著统计学差异的情况(如基因)。然而,在一些实际数据集中遇到的一个常见问题是存在高度尖峰的空分布。在这种情况下,由于 "虚假阳性 "的存在,现有的方法很难控制 I 类错误,但这一问题在以往的文献中还没有得到解决。我们的团队最近在分析 SET4 基因缺失数据时遇到了这个问题,并建议使用比例混合正态分布来模拟空分布。然而,由于对尖峰的强烈假设,这种方法的使用受到了限制。在本文中,我们提出了一种新的多重检验程序,它可应用于任何类型的尖峰数据,包括无尖峰或有一个或两个尖峰的情况。我们的方法包括截断 0 附近的中心统计量(这是空尖峰的主要贡献),以及可能被其他分布污染的两个尾部。我们将这种方法称为 "双重截断法"。应用双重截断法后,我们使用双重截断最大似然估计法估计空密度。我们利用模拟数据用数字证明了我们提出的方法能有效地将误发现率控制在理想水平。此外,我们还将我们的方法应用于两个真实数据集,即 SET 蛋白质数据和牡丹数据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

摘要图片

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Double truncation method for controlling local false discovery rate in case of spiky null

Many multiple test procedures, which control the false discovery rate, have been developed to identify some cases (e.g. genes) showing statistically significant difference between two different groups. However, a common issue encountered in some practical data sets is the presence of highly spiky null distributions. Existing methods struggle to control type I error in such cases due to the “inflated false positives," but this problem has not been addressed in previous literature. Our team recently encountered this issue while analyzing SET4 gene deletion data and proposed modeling the null distribution using a scale mixture normal distribution. However, the use of this approach is limited due to strong assumptions on the spiky peak. In this paper, we present a novel multiple test procedure that can be applied to any type of spiky peak data, including situations with no spiky peak or with one or two spiky peaks. Our approach involves truncating the central statistics around 0, which primarily contribute to the null spike, as well as the two tails that may be contaminated by alternative distributions. We refer to this method as the “double truncation method." After applying double truncation, we estimate the null density using the doubly truncated maximum likelihood estimator. We demonstrate numerically that our proposed method effectively controls the false discovery rate at the desired level using simulated data. Furthermore, we apply our method to two real data sets, namely the SET protein data and peony data.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Accounts of Chemical Research
Accounts of Chemical Research 化学-化学综合
CiteScore
31.40
自引率
1.10%
发文量
312
审稿时长
2 months
期刊介绍: Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance. Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.
期刊最新文献
Management of Cholesteatoma: Hearing Rehabilitation. Congenital Cholesteatoma. Evaluation of Cholesteatoma. Management of Cholesteatoma: Extension Beyond Middle Ear/Mastoid. Recidivism and Recurrence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1