Pub Date: 2024-01-12 DOI: 10.1016/j.patter.2023.100893
Vasileios C. Pezoulas, Fanis Kalatzis, Themis P. Exarchos, Andreas Goules, Athanasios G. Tzioufas, Dimitrios I. Fotiadis
Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).
Title: "FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100893
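The sensitivity and specificity figures quoted in the FHBF abstract are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed for a binary classifier; it is not the authors' evaluation code, and the function name `sensitivity_specificity` and the toy labels are assumptions made for illustration only.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Hypothetical helper for illustration; not taken from the FHBF paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy example with made-up labels (3 positives, 4 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

On highly imbalanced data, reporting both ratios matters because overall accuracy can look high even when the minority class is rarely detected.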
Pub Date: 2024-01-12 DOI: 10.1016/j.patter.2023.100912
Hanchuan Peng, Peng Xie, Feng Xiong
In a recent paper in Patterns, Hanchuan Peng, Peng Xie, and Feng Xiong from Southeast University describe a deep learning method to characterize complete single-neuron morphologies, which can discover neuron projection patterns of diverse cells and learn neuronal morphology representation. In this interview, the authors share the story behind the paper and their research experience.
This interview is a companion to these authors’ recent paper, “DSM: Deep sequential model for complete neuronal morphology representation and feature extraction.”1
Title: "Meet the authors: Hanchuan Peng, Peng Xie, and Feng Xiong" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100912
Pub Date: 2024-01-12 DOI: 10.1016/j.patter.2023.100916
Andrew L. Hufton
Abstract not available
Title: "Looking forward to the new year" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100916
Pub Date: 2024-01-12 DOI: 10.1016/j.patter.2023.100894
Li-Ju Wang, Michael Ning, Tapsya Nayak, Michael J. Kasper, Satdarshan P. Monga, Yufei Huang, Yidong Chen, Yu-Chiao Chiu
Advancing precision oncology requires accurate prediction of treatment response and accessible prediction models. To this end, we present shinyDeepDR, a user-friendly implementation of our innovative deep learning model, DeepDR, for predicting anti-cancer drug sensitivity. The web tool makes DeepDR more accessible to researchers without extensive programming experience. Using shinyDeepDR, users can upload mutation and/or gene expression data from a cancer sample (cell line or tumor) and perform two main functions: "Find Drug," which predicts the sample’s response to 265 approved and investigational anti-cancer compounds, and "Find Sample," which searches for cell lines in the Cancer Cell Line Encyclopedia (CCLE) and tumors in The Cancer Genome Atlas (TCGA) with genomics profiles similar to those of the query sample to study potential effective treatments. shinyDeepDR provides an interactive interface to interpret prediction results and to investigate individual compounds. In conclusion, shinyDeepDR is an intuitive and free-to-use web tool for in silico anti-cancer drug screening.
Title: "shinyDeepDR: A user-friendly R Shiny app for predicting anti-cancer drug response using deep learning" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100894
Pub Date: 2024-01-12 DOI: 10.1016/j.patter.2023.100914
Roel Roscam Abbing, Robert W. Gehl
Since Elon Musk’s purchase of Twitter/X and subsequent changes to that platform, computational social science researchers may be considering shifting their research programs to Mastodon and the fediverse. This article sounds several notes of caution about such a shift. We explain key differences between the fediverse and X, ultimately arguing that research must be with the fediverse, not on it.
Title: "Shifting your research from X to Mastodon? Here’s what you need to know" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100914
Pub Date: 2024-01-11 DOI: 10.1016/j.patter.2023.100911
Yi Liu, Yuelei Zhang, Xiao Chang, Xiaoping Liu
Crosstalk among cells is vital for maintaining the biological function and intactness of systems. Most existing methods for investigating cell-cell communications are based on ligand-receptor (L-R) expression, and they focus on the study between two cells. Thus, the final communication inference results are particularly sensitive to the completeness and accuracy of the prior biological knowledge. Because existing L-R research focuses mainly on humans, most existing methods can only examine cell-cell communication for humans. As far as we know, there is currently no effective method to overcome this species limitation. Here, we propose MDIC3 (matrix decomposition to infer cell-cell communication), an unsupervised tool to investigate cell-cell communication in any species, and the results are not limited by specific L-R pairs or signaling pathways. By comparing it with existing methods for the inference of cell-cell communication, MDIC3 obtained better performance in both humans and mice.
Title: "MDIC3: Matrix decomposition to infer cell-cell communication" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100911
Pub Date: 2024-01-08 DOI: 10.1016/j.patter.2023.100910
Michael R. Waters, Matthew Inkman, Kay Jayachandran, Roman M. Kowalchuk, Clifford Robinson, Julie K. Schwarz, S. Joshua Swamidass, Obi L. Griffith, Jeffrey J. Szymanski, Jin Zhang
Big genomic data and artificial intelligence (AI) are ushering in an era of precision medicine, providing opportunities to study previously under-represented subtypes and rare diseases rather than categorize them as variances. However, clinical researchers face challenges in accessing such novel technologies as well as reliable methods to study small datasets or subcohorts with unique phenotypes. To address this need, we developed an integrative approach, GAiN, to capture patterns of gene expression from small datasets on the basis of an ensemble of generative adversarial networks (GANs) while leveraging big population data. Where conventional biostatistical methods fail, GAiN reliably discovers differentially expressed genes (DEGs) and enriched pathways between two cohorts with limited numbers of samples (n = 10) when benchmarked against a gold standard. GAiN is freely available at GitHub. Thus, GAiN may serve as a crucial tool for gene expression analysis in scenarios with limited samples, as in the context of rare diseases, under-represented populations, or limited investigator resources.
Title: "GAiN: An integrative tool utilizing generative adversarial neural networks for augmented gene expression analysis" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100910
Pub Date: 2024-01-03 DOI: 10.1016/j.patter.2023.100909
Arash Keshavarzi Arshadi, Milad Salem, Heather Karner, Kristle Garcia, Abolfazl Arab, Jiann Shiun Yuan, Hani Goodarzi
MicroRNAs are recognized as key drivers in many cancers but targeting them with small molecules remains a challenge. We present RiboStrike, a deep-learning framework that identifies small molecules against specific microRNAs. To demonstrate its capabilities, we applied it to microRNA-21 (miR-21), a known driver of breast cancer. To ensure selectivity toward miR-21, we performed counter-screens against miR-122 and DICER. Auxiliary models were used to evaluate toxicity and rank the candidates. Learning from various datasets, we screened a pool of nine million molecules and identified eight, three of which showed anti-miR-21 activity in both reporter assays and RNA sequencing experiments. Target selectivity of these compounds was assessed using microRNA profiling and RNA sequencing analysis. The top candidate was tested in a xenograft mouse model of breast cancer metastasis, demonstrating a significant reduction in lung metastases. These results demonstrate RiboStrike’s ability to nominate compounds that target the activity of miRNAs in cancer.
Title: "Functional microRNA-targeting drug discovery by graph-based deep learning" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100909
Pub Date: 2023-12-28 DOI: 10.1016/j.patter.2023.100907
Fengda Zhang, Zitao Shuai, Kun Kuang, Fei Wu, Yueting Zhuang, Jun Xiao
Federated learning (FL) is a promising approach for healthcare institutions to train high-quality medical models collaboratively while protecting sensitive data privacy. However, FL models encounter fairness issues at diverse levels, leading to performance disparities across different subpopulations. To address this, we propose Federated Learning with Unified Fairness Objective (FedUFO), a unified framework consolidating diverse fairness levels within FL. By leveraging distributionally robust optimization and a unified uncertainty set, it ensures consistent performance across all subpopulations and enhances the overall efficacy of FL in healthcare and other domains while maintaining accuracy levels comparable with those of existing methods. Our model was validated by applying it to four digital healthcare tasks using real-world datasets in federated settings. Our collaborative machine learning paradigm not only promotes artificial intelligence in digital healthcare but also fosters social equity by embodying fairness.
Title: "Unified fair federated learning for digital healthcare" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100907
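The FedUFO abstract mentions distributionally robust optimization over subpopulations. As a rough illustration of that general idea only (not the FedUFO objective, whose uncertainty set is not described in this listing), the sketch below computes per-group average losses and picks the worst one. This is what a DRO objective reduces to when the uncertainty set is assumed to contain every mixture of the groups. The function names and the toy losses are hypothetical.

```python
from collections import defaultdict


def group_average_losses(losses, groups):
    """Average the per-sample losses within each group label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


def worst_group_objective(losses, groups):
    """Return (group, loss) for the group with the highest average loss.

    Assumption: the uncertainty set is the full simplex over groups, so the
    worst-case mixture puts all of its weight on the single worst group.
    """
    per_group = group_average_losses(losses, groups)
    worst = max(per_group, key=per_group.get)
    return worst, per_group[worst]


# Made-up per-sample losses for two subpopulations, "a" and "b".
losses = [0.2, 0.4, 1.0, 0.6]
groups = ["a", "a", "b", "b"]
print(worst_group_objective(losses, groups))
```

Minimizing this worst-group value, rather than the overall average, pushes a model toward consistent performance across subpopulations, which is the fairness motivation the abstract describes.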
Pub Date: 2023-12-27 DOI: 10.1016/j.patter.2023.100906
Jun Wen, Jue Hou, Clara-Lea Bonzel, Yihan Zhao, Victor M. Castro, Vivian S. Gainer, Dana Weisenfeld, Tianrun Cai, Yuk-Lam Ho, Vidul A. Panickan, Lauren Costa, Chuan Hong, J. Michael Gaziano, Katherine P. Liao, Junwei Lu, Kelly Cho, Tianxi Cai
Electronic health record (EHR) data are increasingly used to support real-world evidence studies but are limited by the lack of precise timings of clinical events. Here, we propose a label-efficient incident phenotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embeddings, LATTE selects predictive features and compresses their information into longitudinal visit embeddings through visit attention learning. LATTE models the sequential dependency between the target event and visit embeddings to derive the timings. To improve label efficiency, LATTE constructs longitudinal silver-standard labels from unlabeled patients to perform semi-supervised training. LATTE is evaluated on the onset of type 2 diabetes, heart failure, and relapses of multiple sclerosis. LATTE consistently achieves substantial improvements over benchmark methods while providing high prediction interpretability. The event timings are shown to help discover risk factors of heart failure among patients with rheumatoid arthritis.
Title: "LATTE: Label-efficient incident phenotyping from longitudinal electronic health records" Journal: Patterns. URL: https://doi.org/10.1016/j.patter.2023.100906