Catherine Huber-Carol, Shulamith Gross, Filia Vonta
DOI: 10.1016/j.crme.2019.11.007
Journal: Comptes Rendus Mécanique, Vol. 347, No. 11, pp. 817-830
Published: 2019-11-01 (Journal Article)
Impact Factor: 1.0, JCR: Q4 (Mechanics)
Citations: 3
Abstract
Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction
We present here the statistical models most commonly used in survival data analysis. Parametric models are based on explicit distributions that depend only on real-valued unknown parameters, while the preferred models are semi-parametric, like the Cox model, and involve unknown functions to be estimated. Now that big data sets are available, two types of methods are needed to deal with the resulting curse of dimensionality, in which non-informative factors spoil the informative part relative to the target: on the one hand, methods that reduce the dimension while maximizing the information retained in the reduced data, to which classical stochastic models are then applied; on the other hand, algorithms that apply directly to big data, i.e. artificial intelligence (AI, or machine learning). In fact, those algorithms have a probabilistic interpretation. We present here several of the former methods. Among the latter methods, which comprise neural networks, support vector machines, random forests and more (see the second edition, January 2017, of Hastie, Tibshirani et al. (2005) [1]), we present the neural network approach. Neural networks are known to be efficient for prediction on big data. Having analyzed, with a classical stochastic model, the risk factors for Alzheimer's disease on a data set of around 5000 patients and p = 17 factors, we were interested in comparing its prediction performance with that of a neural network on this relatively small sample.
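The semi-parametric nature of the Cox model mentioned above can be illustrated concretely: the regression coefficients are estimated by maximizing the partial likelihood, which never requires specifying the baseline hazard function. The sketch below is a minimal illustration on synthetic data, not the authors' implementation; the data-generating parameters are assumptions for the example.

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic survival data: exponential event times with hazard exp(x'beta),
# plus independent exponential censoring (all values are illustrative).
rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([0.8, -0.5, 0.0])
t_event = rng.exponential(scale=1.0 / np.exp(X @ beta_true))
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)  # 1 = observed event, 0 = censored

# Sort by decreasing observed time so that, at each position k, the risk set
# is exactly the subjects in positions 0..k.
order = np.argsort(-time)
Xs, ds = X[order], event[order]

def neg_log_partial_likelihood(beta):
    """Negative Cox log partial likelihood (continuous times, so no ties)."""
    eta = Xs @ beta
    # Running log-sum-exp gives the log of each risk-set sum.
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum(ds * (eta - log_risk))

res = minimize(neg_log_partial_likelihood, np.zeros(p), method="BFGS")
beta_hat = res.x
print(beta_hat)  # should be close to beta_true for this sample size
```

Note that the baseline hazard cancels out of every ratio in the partial likelihood; this is precisely why the Cox model is semi-parametric: only the finite-dimensional coefficient vector is estimated here.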
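The comparison described in the abstract, a classical model versus a neural network on a moderately sized tabular data set, can be sketched as follows. This is a hedged stand-in, not the paper's study: the data are synthetic, with the dimensions n = 5000 and p = 17 borrowed from the paper's setting, and only a few of the factors carry signal, mimicking the non-informative factors discussed above.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic tabular data: 5000 subjects, 17 factors, of which only two
# are informative (the rest play the role of noise factors).
rng = np.random.default_rng(1)
n, p = 5000, 17
X = rng.normal(size=(n, p))
logit = X[:, 0] - 0.7 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# A small feed-forward neural network as event classifier.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                    random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```

On data of this size, such a network typically recovers the informative factors well enough to predict; the interest of the paper's comparison is whether it matches a well-specified stochastic model when the sample is relatively small.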
Journal description:
The Comptes rendus - Mécanique covers all fields of the discipline: Logic, Combinatorics, Number Theory, Group Theory, Mathematical Analysis, (Partial) Differential Equations, Geometry, Topology, Dynamical systems, Mathematical Physics, Mathematical Problems in Mechanics, Signal Theory, Mathematical Economics, …
The journal publishes original, high-quality research articles. These can be written either in English or in French, with an abstract in both languages. An abridged version of the main text in the second language may also be included.