Pub Date: 2024-02-23; eCollection Date: 2024-01-01; DOI: 10.34133/hds.0113
Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie
Background: In real-world drug discovery, human experts typically draw molecular knowledge of drugs and proteins from multimodal sources, including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from biomedical literature. Existing multimodal approaches in AI drug discovery integrate either structured or unstructured knowledge independently, which compromises the holistic understanding of biomolecules. They also fail to address the missing modality problem, where multimodal information is missing for novel drugs and proteins.
Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates both structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first uses independent representation learning models to extract the underlying characteristics from each modality. It then applies a feature fusion technique to compute the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features from the most relevant molecules.
Results: Benefiting from structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Through qualitative analysis, we reveal KEDD's promising potential in assisting real-world applications.
Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD bears promise in accelerating drug discovery.
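The abstract does not spell out how missing features are reconstructed, so the following is only a minimal sketch of one plausible reading of "sparse attention over top relevant molecules": a molecule whose knowledge features are missing attends to the k most similar known molecules (by structure) and takes a weighted average of their knowledge features. The function names, the use of cosine similarity, and the softmax temperature are all assumptions for illustration, not KEDD's actual implementation, and the modality masking used during training is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def reconstruct_missing(query_struct, bank_struct, bank_other, k=2, temperature=0.1):
    """Hypothetical helper: rebuild a missing-modality feature vector.

    query_struct: structure features of the molecule whose other modality is missing
    bank_struct:  structure features of molecules with complete information
    bank_other:   the other-modality features of those same molecules
    """
    sims = [cosine(query_struct, s) for s in bank_struct]
    # Sparse attention: keep only the k most similar molecules.
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax restricted to the top-k entries; all other weights are zero.
    best = max(sims[i] for i in top)
    exps = {i: math.exp((sims[i] - best) / temperature) for i in top}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    dim = len(bank_other[0])
    recon = [sum(weights[i] * bank_other[i][d] for i in top) for d in range(dim)]
    return recon, weights

# Toy data (made up for illustration only).
bank_struct = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
bank_other = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
recon, weights = reconstruct_missing([1.0, 0.0], bank_struct, bank_other, k=2)
print(recon, weights)
```

Restricting the softmax to the top-k neighbours keeps dissimilar molecules from contributing to the reconstruction at all, which is the property the word "sparse" suggests here.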
Title: Toward Unified AI Drug Discovery with Multimodal Knowledge. Journal: Health data science. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/
Pub Date: 2024-02-21; DOI: 10.34133/hds.0121
Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui
Title: Identification and analysis of sex-biased copy number alterations. Journal: Health data science.
Pub Date: 2023-12-12; DOI: 10.34133/hds.0108
Adeolu Z Ogunleye, Chayanit Piyawajanusorn, G. Ghislat, Pedro Ballester
Title: Large-scale machine learning analysis reveals DNA-methylation and gene-expression response signatures for gemcitabine-treated pancreatic cancer. Journal: Health data science.
Pub Date: 2023-11-30; DOI: 10.34133/hds.0101
Charles Alba, Ruopeng An
Title: Using mobile-phone data to assess socio-economic disparities in unhealthy food reliance during the COVID-19 pandemic. Journal: Health data science.
Pub Date: 2023-11-30; DOI: 10.34133/hds.0102
Guilan Kong, Jinwei Wang, Hongbo Lin, Beiyan Bao, Charles Friedman, Luxia Zhang
Title: Transforming health care through a learning health system approach in the digital era: Chronic kidney disease management in China. Journal: Health data science.
Pub Date: 2023-10-24; DOI: 10.34133/hds.0099
Racha Gouareb, Alban Bornet, Dimitrios Proios, Sónia Gonçalves Pereira, Douglas Teodoro
Title: Detection of Patients at Risk of Multi-Drug Resistant Enterobacteriaceae Infection using Graph Neural Networks: a Retrospective Study. Journal: Health data science.
Pub Date: 2023-10-23; DOI: 10.34133/hds.0096
Jiayan Zhang, Junshi Li, Zhe Huang, Dong Huang, Huaiqiang Yu, Zhihong Li
Title: Recent progress in wearable brain-computer interface (BCI) devices based on electroencephalogram (EEG) for medical applications: A review. Journal: Health data science.
Pub Date: 2023-10-02; eCollection Date: 2023-01-01; DOI: 10.34133/hds.0019
Nancy Kagendi, Matilu Mwau
Background: Machine learning models are not in routine use for predicting HIV status. Our objective is to describe the development of a machine learning model to predict HIV viral load (VL) hotspots as an early warning system in Kenya, based on data routinely collected by affiliate entities of the Ministry of Health. Following World Health Organization recommendations, hotspots are defined as health facilities where ≥20% of people living with HIV have an unsuppressed VL. Prediction of VL hotspots gives health administrators an early warning system for optimizing treatment and resource distribution.
Methods: A random forest model was built to predict the hotspot status of a health facility in the upcoming month, starting from 2016. Prior to model building, the datasets were cleaned and checked for outliers and multicollinearity at the patient level. The patient-level data were aggregated up to the facility level before model building. We analyzed data from 4 million tests and 4,265 facilities. The dataset at the health facility level was divided into train (75%) and test (25%) datasets.
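The aggregation and split described above can be sketched as follows. This is an illustrative sketch only: the abstract does not give the VL cutoff for suppression, so each record is assumed to carry a precomputed `suppressed` flag, and one record is assumed to correspond to one person. The function names and the simple shuffled split are hypothetical and are not taken from the authors' pipeline.

```python
import random
from collections import defaultdict

HOTSPOT_THRESHOLD = 0.20  # from the abstract: >=20% of people with unsuppressed VL

def facility_hotspots(records):
    """Aggregate patient-level records to facility-level hotspot labels.

    records: iterable of (facility_id, suppressed) pairs, where `suppressed`
    is a bool (assumed precomputed; the cutoff is not stated in the abstract).
    """
    totals = defaultdict(int)
    unsuppressed = defaultdict(int)
    for facility_id, suppressed in records:
        totals[facility_id] += 1
        if not suppressed:
            unsuppressed[facility_id] += 1
    return {f: unsuppressed[f] / totals[f] >= HOTSPOT_THRESHOLD for f in totals}

def train_test_split(items, train_fraction=0.75, seed=0):
    """Shuffle facility-level items and split them 75% / 25% by default."""
    items = sorted(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Toy data (made up): facility A has 2 of 10 unsuppressed, facility B has 1 of 10.
records = ([("A", True)] * 8 + [("A", False)] * 2
           + [("B", True)] * 9 + [("B", False)] * 1)
labels = facility_hotspots(records)
train, test = train_test_split([f"F{i}" for i in range(8)])
print(labels, len(train), len(test))
```

Splitting after aggregation, at the facility level, matches the order of steps given in the Methods and keeps all records from one facility on the same side of the split.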
Results: The model discriminates hotspots from non-hotspots with an accuracy of 78%. The F1 score of the model is 69% and the Brier score is 0.139. In December 2019, our model correctly predicted 434 VL hotspots in addition to the observed 446 VL hotspots.
Conclusion: The hotspot mapping model can be essential to antiretroviral therapy programs. It can help decision-makers identify VL hotspots ahead of time using cost-efficient, routinely collected data.
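For readers less familiar with the three metrics reported in the Results, the sketch below computes accuracy, F1, and the Brier score from their standard definitions. The labels and probabilities are made-up toy values, and the 0.5 decision threshold is an assumption; none of this reproduces the study's data or its reported figures.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Toy example: 1 = hotspot, 0 = non-hotspot.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(acc, f1, brier)
```

Unlike accuracy and F1, the Brier score is computed on the predicted probabilities rather than on thresholded labels, so lower values indicate better-calibrated predictions.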
Title: A Machine Learning Approach to Predict HIV Viral Load Hotspots in Kenya Using Real-World Data. Journal: Health data science. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10880164/pdf/