Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction.

JAMIA Open (IF 2.5, Q2, Health Care Sciences & Services) · Pub Date: 2023-10-09 · eCollection Date: 2023-12-01 · DOI: 10.1093/jamiaopen/ooad086
Barrett W Jones, Warren D Taylor, Colin G Walsh
{"title":"Sequential autoencoders for feature engineering and pretraining in major depressive disorder risk prediction.","authors":"Barrett W Jones, Warren D Taylor, Colin G Walsh","doi":"10.1093/jamiaopen/ooad086","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>We evaluated autoencoders as a feature engineering and pretraining technique to improve major depressive disorder (MDD) prognostic risk prediction. Autoencoders can represent temporal feature relationships not identified by aggregate features. The predictive performance of autoencoders of multiple sequential structures was evaluated as feature engineering and pretraining strategies on an array of prediction tasks and compared to a restricted Boltzmann machine (RBM) and random forests as a benchmark.</p><p><strong>Materials and methods: </strong>We study MDD patients from Vanderbilt University Medical Center. Autoencoder models with Attention and long-short-term memory (LSTM) layers were trained to create latent representations of the input data. Predictive performance was evaluated temporally by fitting random forest models to predict future outcomes with engineered features as input and using autoencoder weights to initialize neural network layers. We evaluated area under the precision-recall curve (AUPRC) trends and variation over the study population's treatment course.</p><p><strong>Results: </strong>The pretrained LSTM model improved predictive performance over pretrained Attention models and benchmarks in 3 of 4 outcomes including self-harm/suicide attempt (AUPRCs, LSTM pretrained = 0.012, Attention pretrained = 0.010, RBM = 0.009, random forest = 0.005). The use of autoencoders for feature engineering had varied results, with benchmarks outperforming LSTM and Attention encodings on the self-harm/suicide attempt outcome (AUPRCs, LSTM encodings = 0.003, Attention encodings = 0.004, RBM = 0.009, random forest = 0.005).</p><p><strong>Discussion: </strong>Improvement in prediction resulting from pretraining has the potential for increased clinical impact of MDD risk models. We did not find evidence that the use of temporal feature encodings was additive to predictive performance in the study population. This suggests that predictive information retained by model weights may be lost during encoding. LSTM pretrained model predictive performance is shown to be clinically useful and improves over state-of-the-art predictors in the MDD phenotype. LSTM model performance warrants consideration of use in future related studies.</p><p><strong>Conclusion: </strong>LSTM models with pretrained weights from autoencoders were able to outperform the benchmark and a pretrained Attention model. Future researchers developing risk models in MDD may benefit from the use of LSTM autoencoder pretrained weights.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"6 4","pages":"ooad086"},"PeriodicalIF":2.5000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10561992/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMIA Open","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/jamiaopen/ooad086","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/12/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
引用次数: 0

Abstract

Objectives: We evaluated autoencoders as a feature engineering and pretraining technique to improve major depressive disorder (MDD) prognostic risk prediction. Autoencoders can represent temporal feature relationships not captured by aggregate features. The predictive performance of autoencoders with multiple sequential structures was evaluated as feature engineering and pretraining strategies across an array of prediction tasks and compared against a restricted Boltzmann machine (RBM) and random forests as benchmarks.

Materials and methods: We studied MDD patients from Vanderbilt University Medical Center. Autoencoder models with Attention and long short-term memory (LSTM) layers were trained to create latent representations of the input data. Predictive performance was evaluated temporally in two ways: by fitting random forest models that take the engineered features as input to predict future outcomes, and by using autoencoder weights to initialize neural network layers. We evaluated trends and variation in the area under the precision-recall curve (AUPRC) over the study population's treatment course.
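As a concrete illustration, the sketch below shows one common way to build a sequence autoencoder of this kind in PyTorch and then reuse its latent codes as random forest features. This is not the authors' code: the feature count, latent size, and training loop are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of an LSTM
# sequence autoencoder: the encoder compresses each patient's visit sequence
# into a latent vector, and the decoder reconstructs the inputs from it.
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, n_features, batch_first=True)

    def forward(self, x):
        # x: (batch, timesteps, n_features)
        _, (h, _) = self.encoder(x)               # final hidden state
        latent = h[-1]                            # (batch, latent_dim)
        # Repeat the latent code across time so the decoder can unroll it.
        repeated = latent.unsqueeze(1).repeat(1, x.size(1), 1)
        recon, _ = self.decoder(repeated)
        return recon, latent

model = LSTMAutoencoder(n_features=40)            # 40 input features, assumed
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 12, 40)                        # 8 patients, 12 time steps
recon, latent = model(x)
loss = nn.functional.mse_loss(recon, x)           # reconstruction objective
loss.backward()
optimizer.step()

# Feature engineering arm: frozen latent codes become random forest inputs.
with torch.no_grad():
    _, codes = model(x)
y = torch.randint(0, 2, (8,))                     # placeholder outcome labels
rf = RandomForestClassifier(n_estimators=100).fit(codes.numpy(), y.numpy())
```

The pretraining arm instead transfers the encoder's weights directly into a supervised network, as sketched after the Discussion section.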

Results: The pretrained LSTM model improved predictive performance over pretrained Attention models and benchmarks in 3 of 4 outcomes including self-harm/suicide attempt (AUPRCs, LSTM pretrained = 0.012, Attention pretrained = 0.010, RBM = 0.009, random forest = 0.005). The use of autoencoders for feature engineering had varied results, with benchmarks outperforming LSTM and Attention encodings on the self-harm/suicide attempt outcome (AUPRCs, LSTM encodings = 0.003, Attention encodings = 0.004, RBM = 0.009, random forest = 0.005).
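For context on the metric, AUPRC is typically computed as the average precision over the precision-recall curve; the snippet below (with made-up labels and scores, not study data) shows the standard scikit-learn call.

```python
# Illustrative AUPRC computation with scikit-learn.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 0])       # rare positive outcome
y_score = np.array([0.1, 0.2, 0.7, 0.3, 0.4, 0.1, 0.2, 0.1])
print(f"AUPRC = {average_precision_score(y_true, y_score):.3f}")
```

Because chance-level AUPRC equals the outcome prevalence, small absolute values such as those reported above can still represent a meaningful lift for rare outcomes like suicide attempt.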

Discussion: The improvement in prediction resulting from pretraining has the potential to increase the clinical impact of MDD risk models. We did not find evidence that temporal feature encodings added to predictive performance in the study population, which suggests that predictive information retained by model weights may be lost during encoding. The pretrained LSTM model's predictive performance is shown to be clinically useful and improves over state-of-the-art predictors in the MDD phenotype, warranting consideration in future related studies.
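The pretraining strategy discussed here can be pictured as copying the trained encoder's weights into a supervised risk classifier before fine-tuning. The sketch below continues the earlier autoencoder example (reusing its `model`, `x`, and `y` objects); the classifier head and all names are assumptions, not the study's architecture.

```python
# Hedged sketch of pretraining: initialize a risk classifier's LSTM layer
# with the autoencoder encoder weights from the earlier sketch, then
# fine-tune on the supervised outcome.
import torch
import torch.nn as nn

class RiskClassifier(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.head = nn.Linear(latent_dim, 1)      # binary outcome logit

    def forward(self, x):
        _, (h, _) = self.lstm(x)
        return self.head(h[-1]).squeeze(-1)

clf = RiskClassifier(n_features=40)
# Transfer pretrained weights; shapes match the encoder defined earlier.
clf.lstm.load_state_dict(model.encoder.state_dict())
# Fine-tune with a binary cross-entropy loss on labelled outcomes.
logits = clf(x)
loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
```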

Conclusion: LSTM models with pretrained weights from autoencoders were able to outperform the benchmark and a pretrained Attention model. Future researchers developing risk models in MDD may benefit from the use of LSTM autoencoder pretrained weights.


Source journal: JAMIA Open (Medicine - Health Informatics)
CiteScore: 4.10 · Self-citation rate: 4.80% · Annual articles: 102 · Review time: 16 weeks