How data heterogeneity affects innovating knowledge and information in gene identification: A statistical learning perspective

Journal of Innovation & Knowledge (IF 15.6; Zone 1, Management; JCR Q1, Business). Pub Date: 2024-07-01. DOI: 10.1016/j.jik.2024.100514
Jun Zhao, Fangyi Lao, Guan'ao Yan, Yi Zhang
Citations: 0

Abstract

Data heterogeneity, particularly noted in fields such as genetics, has been identified as a key feature of big data, posing significant challenges to innovation in knowledge and information. This paper focuses on characterizing and understanding the so-called "curse of heterogeneity" in gene identification for low infant birth weight from a statistical learning perspective. Owing to the computational and analytical advantages of expectile regression in handling heterogeneity, this paper proposes a flexible, regularized, partially linear additive expectile regression model for high-dimensional heterogeneous data. Unlike most existing works that assume Gaussian or sub-Gaussian error distributions, we adopt a more realistic, less stringent assumption that the errors have only finite moments. Additionally, we derive a two-step algorithm to address the reduced optimization problem and demonstrate that our method, with a probability approaching one, achieves optimal estimation accuracy. Furthermore, we demonstrate that the proposed algorithm converges at least linearly, ensuring the practical applicability of our method. Monte Carlo simulations reveal that our method's resulting estimator performs well in terms of estimation accuracy, model selection, and heterogeneity identification. Empirical analysis in gene trait expression further underscores the potential for guiding public health interventions.
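To make the central tool of the abstract concrete, the following is a minimal sketch of plain linear expectile regression, fitted by iteratively reweighted least squares on the asymmetric squared loss |tau - 1(r < 0)| * r^2. It is not the authors' regularized, partially linear additive model or their two-step algorithm, neither of which is specified on this page. The function names (`expectile_loss`, `fit_linear_expectile`), the simulated data, and the choice of Student-t noise with 3 degrees of freedom (an error distribution with only a few finite moments) are all assumptions made for illustration.

```python
import numpy as np


def expectile_loss(residuals, tau):
    """Asymmetric squared loss: |tau - 1(r < 0)| * r**2 (hypothetical helper)."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.where(residuals < 0, 1.0 - tau, tau)
    return weights * residuals ** 2


def fit_linear_expectile(X, y, tau, max_iter=100, tol=1e-10):
    """Fit y ~ intercept + X at expectile level tau by iteratively
    reweighted least squares. Returns [intercept, slope_1, ...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    # Start from ordinary least squares, which is the tau = 0.5 solution.
    beta = np.linalg.lstsq(design, y, rcond=None)[0]
    for _ in range(max_iter):
        resid = y - design @ beta
        # Positive residuals get weight tau, negative ones get 1 - tau.
        w = np.where(resid < 0, 1.0 - tau, tau)
        weighted_design = design * w[:, None]
        new_beta = np.linalg.solve(design.T @ weighted_design,
                                   design.T @ (w * y))
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


if __name__ == "__main__":
    # Assumed toy data: the noise scale grows with x (heterogeneity) and the
    # noise itself is heavy-tailed (Student t with 3 degrees of freedom).
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.uniform(0.0, 1.0, size=n)
    y = 1.0 + 2.0 * x + (0.5 + x) * rng.standard_t(df=3, size=n)
    for tau in (0.1, 0.5, 0.9):
        intercept, slope = fit_linear_expectile(x, y, tau)
        print(f"tau={tau}: intercept={intercept:.3f}, slope={slope:.3f}")
```

When the noise scale depends on a covariate, the fitted slope changes with tau, which is the sense in which expectile regression can reveal heterogeneity that a single mean regression (tau = 0.5) averages away.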

Source journal
CiteScore: 16.10
Self-citation rate: 12.70%
Articles published: 118
Review time: 37 days
About the journal: The Journal of Innovation and Knowledge (JIK) explores how innovation drives knowledge creation and vice versa, emphasizing that not all innovation leads to knowledge, but enduring innovation across diverse fields fosters theory and knowledge. JIK invites papers on innovations enhancing or generating knowledge, covering innovation processes, structures, outcomes, and behaviors at various levels. Articles in JIK examine knowledge-related changes promoting innovation for societal best practices. JIK serves as a platform for high-quality studies undergoing double-blind peer review, ensuring global dissemination to scholars, practitioners, and policymakers who recognize innovation and knowledge as economic drivers. It publishes theoretical articles, empirical studies, case studies, reviews, and other content, addressing current trends and emerging topics in innovation and knowledge. The journal welcomes suggestions for special issues and encourages articles to showcase contextual differences and lessons for a broad audience. In essence, JIK is an interdisciplinary journal dedicated to advancing theoretical and practical innovations and knowledge across multiple fields, including Economics, Business and Management, Engineering, Science, and Education.
Latest articles from this journal
Configurations of resourceful and demanding attributes of organizational culture in US hotels: An innovative approach using topic modeling and fsQCA
Seeding young entrepreneurs: The role of business incubators
Exploring the other side of innovative managerial decision-making: Emotions
Addressing barriers to big data implementation in sustainable smart cities: Improved zero-sum grey game and grey best-worst method
Contribution of female inventors to technological collaboration between high-tech firms and university in close proximity: Effect of innovative firm's characteristics