A deep learning approach to censored regression
Vlad-Rareş Dănăilă, Cătălin Buiu
Pattern Analysis and Applications, published 2024-02-28 (Impact Factor 3.7; JCR Q2, Computer Science, Artificial Intelligence)
DOI: https://doi.org/10.1007/s10044-024-01216-9
Citations: 0
Abstract
In censored regression, the outcomes are a mixture of known values (uncensored) and open intervals (censored): each outcome is either known precisely or known only to lie above or below a given threshold. Censored data are widespread, and modeling them correctly is essential for many applications. Although the literature on censored regression is vast, deep learning approaches have been applied less frequently. This paper proposes three loss functions for training neural networks on censored data using gradient backpropagation: the tobit likelihood, the censored mean squared error, and the censored mean absolute error. We experimented with three variations of the tobit likelihood that arise from different ways of modeling the standard deviation: as a fixed value, as a reparametrization, and as an estimate produced by a separate neural network for heteroscedastic data. The tobit model yielded better results, but the other two losses are simpler to implement. Another central idea of our research is that data are often censored and truncated simultaneously. The proposed losses can handle simultaneous censoring and truncation at arbitrary values from above and below.
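The abstract names the three losses without giving their formulas, so the following is only a minimal sketch of how such losses are commonly written, not the authors' implementation. It assumes a Gaussian tobit model with a fixed standard deviation, and it ignores truncation entirely. The function names (`tobit_nll`, `censored_error`) and the censoring codes (0, -1, +1) are hypothetical conventions chosen for this illustration.

```python
import math

# Hypothetical censoring codes used in this sketch:
#    0 -> uncensored: y is the exact outcome
#   -1 -> left-censored: the true outcome is at or below y
#   +1 -> right-censored: the true outcome is at or above y


def _norm_logpdf(z):
    """Log-density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def _norm_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_nll(y, pred, censor, sigma=1.0):
    """Mean negative log-likelihood of a Gaussian tobit model.

    Assumes a fixed standard deviation `sigma` (one of the three
    variants mentioned in the abstract); truncation is not handled.
    """
    total = 0.0
    for yi, pi, ci in zip(y, pred, censor):
        z = (yi - pi) / sigma
        if ci == 0:
            # Exact observation: Gaussian density term.
            total -= _norm_logpdf(z) - math.log(sigma)
        elif ci < 0:
            # Left-censored: probability that the outcome is <= yi.
            total -= math.log(max(_norm_cdf(z), 1e-300))
        else:
            # Right-censored: probability that the outcome is >= yi.
            total -= math.log(max(_norm_cdf(-z), 1e-300))
    return total / len(y)


def censored_error(y, pred, censor, power=2):
    """Censored MSE (power=2) or censored MAE (power=1).

    A censored point contributes only when the prediction lies on the
    wrong side of its threshold; otherwise its residual is zero.
    """
    total = 0.0
    for yi, pi, ci in zip(y, pred, censor):
        if ci == 0:
            r = pi - yi
        elif ci < 0:
            r = max(pi - yi, 0.0)  # penalize predictions above the threshold
        else:
            r = max(yi - pi, 0.0)  # penalize predictions below the threshold
        total += abs(r) ** power
    return total / len(y)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    pred = [1.2, 2.5, 3.5]
    censor = [0, 1, -1]
    print("tobit NLL:   ", tobit_nll(y, pred, censor))
    print("censored MSE:", censored_error(y, pred, censor, power=2))
    print("censored MAE:", censored_error(y, pred, censor, power=1))
```

In a deep learning setting the same expressions would be written with a differentiable tensor library so that gradients can flow back to the network producing `pred` (and, in the heteroscedastic variant, `sigma`); the plain-Python version here only illustrates how each observation type contributes to the loss.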
About the journal:
The journal publishes high-quality articles in areas of fundamental research in intelligent pattern analysis and applications in computer science and engineering. It aims to provide a forum for original research that describes novel pattern analysis techniques and industrial applications of current technology. In addition, the journal publishes articles on pattern analysis applications in medical imaging. The journal solicits articles that detail new technology and methods for pattern recognition and analysis in applied domains including, but not limited to, computer vision and image processing, speech analysis, robotics, multimedia, document analysis, character recognition, knowledge engineering for pattern recognition, fractal analysis, and intelligent control. The journal publishes articles on the use of advanced pattern recognition and analysis methods, including statistical techniques, neural networks, genetic algorithms, fuzzy pattern recognition, machine learning, and hardware implementations, that are either relevant to the development of pattern analysis as a research area or detail novel pattern analysis applications. Papers proposing new classifier systems or their development, pattern analysis systems for real-time applications, fuzzy and temporal pattern recognition, and uncertainty management in applied pattern recognition are particularly solicited.