Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction

IF 1 · Zone 4 (Engineering & Technology) · Q4 Mechanics · Comptes Rendus Mecanique · Pub Date: 2019-11-01 · DOI: 10.1016/j.crme.2019.11.007
Catherine Huber-Carol, Shulamith Gross, Filia Vonta
Comptes Rendus Mecanique, volume 347, issue 11, pages 817-830.
Citations: 3

Abstract

Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction

We present the statistical models most commonly used in survival data analysis. Parametric models are based on explicit distributions that depend only on unknown real-valued parameters, while the preferred models are semi-parametric, such as the Cox model, and involve unknown functions to be estimated. Now that big data sets are available, two types of methods are needed to deal with the resulting curse of dimensionality, including non-informative factors that spoil the part of the data that is informative about the target: on the one hand, methods that reduce the dimension while maximizing the information left in the reduced data, to which classical stochastic models are then applied; on the other hand, algorithms that apply directly to big data, i.e. artificial intelligence (AI, or machine learning). In fact, those algorithms have a probabilistic interpretation. We present several of the former methods. As for the latter methods, which comprise neural networks, support vector machines, random forests and more (see the second edition, January 2017, of Hastie, Tibshirani et al. (2005) [1]), we present the neural network approach. Neural networks are known to be efficient for prediction on big data. Having analyzed, with a classical stochastic model, risk factors for Alzheimer's disease on a data set of around 5000 patients and p=17 factors, we were interested in comparing its prediction performance with that of a neural network on this relatively small data set.
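
The abstract names the Cox model and a comparison of prediction performance but gives no formulas. As a rough illustration only, the sketch below writes out the textbook Cox partial log-likelihood for the hazard h(t | x) = h0(t) exp(beta · x), together with Harrell's concordance index, which is one common way to score survival predictions. The choice of that index, the function names, and the four-subject toy data are all assumptions made for this example; none of them come from the paper, and this is not the authors' implementation.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Cox partial log-likelihood for h(t | x) = h0(t) * exp(beta . x).

    times: observed follow-up times; events: 1 if the event was observed,
    0 if the subject was censored; covariates: one row of factors per subject.
    """
    # Linear predictor beta . x for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_sum = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk_sum)
    return loglik


def concordance_index(times, events, scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Hypothetical toy data: four subjects, one binary risk factor.
toy_times = [2.0, 3.0, 5.0, 7.0]
toy_events = [1, 0, 1, 1]
toy_covariates = [[1.0], [0.0], [1.0], [0.0]]

null_loglik = cox_partial_log_likelihood([0.0], toy_times, toy_events, toy_covariates)
risk_scores = [row[0] for row in toy_covariates]  # linear predictor with beta = [1.0]
toy_c_index = concordance_index(toy_times, toy_events, risk_scores)
```

Any model that outputs a risk score per patient, whether a fitted Cox model or a neural network, could be passed to `concordance_index` in the same way, which is what makes such an index convenient for a side-by-side comparison.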

Source journal
Comptes Rendus Mecanique (Physics: Mechanics)
CiteScore: 1.40
Self-citation rate: 0.00%
Articles published: 0
Review time: 12 months
Journal description: The Comptes rendus - Mécanique cover all fields of the discipline: Logic, Combinatorics, Number Theory, Group Theory, Mathematical Analysis, (Partial) Differential Equations, Geometry, Topology, Dynamical systems, Mathematical Physics, Mathematical Problems in Mechanics, Signal Theory, Mathematical Economics, … The journal publishes original and high-quality research articles. These can be either in English or in French, with an abstract in both languages. An abridged version of the main text in the second language may also be included.