A deep learning approach to censored regression

IF 3.7 | Zone 4 (Computer Science) | JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pattern Analysis and Applications | Pub Date: 2024-02-28 | DOI: 10.1007/s10044-024-01216-9

Vlad-Rareş Dănăilă, Cătălin Buiu

Citations: 0

Abstract

In censored regression, the outcomes are a mixture of known values (uncensored) and open intervals (censored), meaning that the outcome is either known with precision or is an unknown value above or below a known threshold. The use of censored data is widespread, and correctly modeling it is essential for many applications. Although the literature on censored regression is vast, deep learning approaches have been less frequently applied. This paper proposes three loss functions for training neural networks on censored data using gradient backpropagation: the tobit likelihood, the censored mean squared error, and the censored mean absolute error. We experimented with three variations of the tobit likelihood that arose from different ways of modeling the standard deviation variable: as a fixed value, a reparametrization, and an estimation using a separate neural network for heteroscedastic data. The tobit model yielded better results, but the other two losses are simpler to implement. Another central idea of our research was that data are often censored and truncated simultaneously. The proposed losses can handle simultaneous censoring and truncation at arbitrary values from above and below.
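The three losses named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below uses one common textbook formulation and should not be read as the authors' implementation. The function names, the censoring-indicator convention (`0`, `+1`, `-1`), and the fixed `sigma` argument are assumptions made for this example. Truncation, which the paper also handles, is left out.

```python
import math

# Assumed convention for this sketch (not taken from the paper):
#   censor ==  0 : uncensored, y is the observed value
#   censor == +1 : right-censored, the true value is >= y
#   censor == -1 : left-censored, the true value is <= y


def normal_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log of the standard normal CDF, clamped to avoid log(0)."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))


def tobit_nll(mu, y, censor, sigma=1.0):
    """Mean negative log-likelihood of a tobit model with fixed sigma."""
    total = 0.0
    for m, t, c in zip(mu, y, censor):
        z = (t - m) / sigma
        if c == 0:
            # Density of the observed value under N(m, sigma^2).
            total -= normal_logpdf(z) - math.log(sigma)
        elif c > 0:
            # P(Y >= t) = 1 - Phi(z) = Phi(-z)
            total -= normal_logcdf(-z)
        else:
            # P(Y <= t) = Phi(z)
            total -= normal_logcdf(z)
    return total / len(mu)


def _censored_error(p, t, c):
    """Signed error that is zero when a prediction respects the censoring bound."""
    err = p - t
    if c > 0:
        return min(0.0, err)  # penalize only predictions below the threshold
    if c < 0:
        return max(0.0, err)  # penalize only predictions above the threshold
    return err


def censored_mse(pred, y, censor):
    """Mean squared error that ignores predictions on the allowed side of a bound."""
    errors = [_censored_error(p, t, c) for p, t, c in zip(pred, y, censor)]
    return sum(e * e for e in errors) / len(errors)


def censored_mae(pred, y, censor):
    """Mean absolute error that ignores predictions on the allowed side of a bound."""
    errors = [_censored_error(p, t, c) for p, t, c in zip(pred, y, censor)]
    return sum(abs(e) for e in errors) / len(errors)
```

For example, `censored_mse([5.0], [3.0], [1])` is `0.0`, because a prediction of 5 is consistent with a right-censored observation at 3, while `censored_mse([1.0], [3.0], [1])` is `4.0`. In a neural network setting, the same expressions would be written with a framework's differentiable tensor operations so that gradients can flow back to the network that produces `mu` (and `sigma`, in the heteroscedastic case).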

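Source journal

Pattern Analysis and Applications (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore

7.40

Self-citation rate

2.60%

Articles published

Review time

13.5 months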

Journal description: The journal publishes high-quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research which describes novel pattern analysis techniques and industrial applications of the current technology. In addition, the journal will also publish articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations which are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition and uncertainty management in applied pattern recognition are particularly solicited.