
Health data science: Latest Articles

Integrating Machine Learning into Statistical Methods in Disease Risk Prediction Modeling: A Systematic Review.
Pub Date: 2024-07-23 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0165
Meng Zhang, Yongqi Zheng, Xiagela Maidaiti, Baosheng Liang, Yongyue Wei, Feng Sun

Background: Disease prediction models often use either statistical methods or machine learning. Each approach suits its own application scenarios, so relying on either one alone raises the risk of errors. Integrating machine learning into statistical methods may yield robust prediction models. This systematic review aims to comprehensively assess the current development of disease prediction integration models worldwide.
Methods: PubMed, Embase, Web of Science, CNKI, VIP, WanFang, and SinoMed were searched from database inception to May 1, 2023, for studies on prediction models that integrate machine learning into statistical methods. Extracted information included the basic characteristics of studies, integration approaches, application scenarios, modeling details, and model performance.
Results: A total of 21 eligible studies (20 in English and 1 in Chinese) were included. Five studies focused on diagnostic models, while 16 focused on predicting disease occurrence or prognosis. Integration strategies for classification models included majority voting, weighted voting, stacking, and model selection (when statistical methods and machine learning disagreed). Regression models adopted strategies including simple statistics, weighted statistics, and stacking. In most studies, the AUROC of integration models exceeded 0.75, and integration models performed better than statistical methods or machine learning alone. Stacking was used in settings with more than 100 predictors and required a relatively large amount of training data.
Conclusion: Research on integrating machine learning into statistical methods in prediction models remains limited, but some studies suggest that integration models have great potential to outperform single models. This study provides insights for selecting integration methods in different scenarios. Future research could focus on improving and validating integration strategies.
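The voting and averaging strategies listed in this abstract can be illustrated with a minimal Python sketch. It assumes binary outcomes and three hypothetical base models: one statistical model (logistic regression) and two machine learning models. The function names, risks, and weights are illustrative assumptions and are not taken from any reviewed study. Stacking and model selection are not shown.

```python
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Hard voting: return 1 if more than half of the base models predict class 1."""
    if not labels:
        raise ValueError("need at least one base-model prediction")
    positives = sum(1 for y in labels if y == 1)
    return 1 if positives * 2 > len(labels) else 0


def weighted_vote(probs: Sequence[float], weights: Sequence[float], threshold: float = 0.5) -> int:
    """Soft weighted voting: weighted mean of predicted risks, then a decision threshold."""
    if not probs or len(probs) != len(weights):
        raise ValueError("probs and weights must be non-empty and the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    combined = sum(p * w for p, w in zip(probs, weights)) / total
    return 1 if combined >= threshold else 0


def simple_average(preds: Sequence[float]) -> float:
    """Regression integration with a simple statistic: the unweighted mean."""
    if not preds:
        raise ValueError("need at least one prediction")
    return sum(preds) / len(preds)


def weighted_average(preds: Sequence[float], weights: Sequence[float]) -> float:
    """Regression integration with a weighted statistic: the weighted mean."""
    return sum(p * w for p, w in zip(preds, weights)) / sum(weights)


# Hypothetical outputs for one patient from three base models
# (e.g. logistic regression, random forest, gradient boosting).
labels = [1, 0, 1]
risks = [0.62, 0.40, 0.71]
weights = [0.5, 0.2, 0.3]  # assumed weights, e.g. derived from validation performance

print(majority_vote(labels))                                    # 1
print(weighted_vote(risks, weights))                            # 1
print(simple_average([4.0, 5.0, 6.0]))                          # 5.0
print(round(weighted_average([4.0, 5.0, 6.0], weights), 2))     # 4.8
```

Soft weighted voting lets a more trusted model outweigh the others. Hard majority voting needs an odd number of models to avoid ties, which is one reason a two-model setup (one statistical model and one machine learning model) may instead need a separate rule for disagreements.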

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11266123/pdf/
Citations: 0
2023 Beijing Health Data Science Summit.
Pub Date: 2024-06-07 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0112

The 5th annual Beijing Health Data Science Summit, organized by the National Institute of Health Data Science at Peking University, recently concluded with resounding success. This year, the summit aimed to foster collaboration among researchers, practitioners, and stakeholders in the field of health data science to advance the use of data for better health outcomes. A significant highlight of this year's summit was the introduction of the Abstract Competition, organized by Health Data Science, a Science Partner Journal. The competition focused on cutting-edge data science methodologies, particularly the application of artificial intelligence in healthcare scenarios, and gave researchers a platform to showcase their groundbreaking work and innovations. In total, the summit received 61 abstract submissions. After a rigorous evaluation by the Abstract Review Committee, eight exceptional abstracts were selected for the final round and presented in the Abstract Competition. The winners of the Abstract Competition are as follows:
• First Prize: "Interpretable Machine Learning for Predicting Outcomes of Childhood Kawasaki Disease: Electronic Health Record Analysis", presented by researchers from the Chinese Academy of Medical Sciences, Peking Union Medical College, and Chongqing Medical University (presenter: Yifan Duan).
• Second Prize: "Survival Disparities among Mobility Patterns of Patients with Cancer: A Population-Based Study", presented by a team from Peking University (presenter: Fengyu Wen).
• Third Prize: "Deep Learning-Based Real-Time Predictive Model for the Development of Acute Stroke", presented by researchers from Beijing Tiantan Hospital (presenter: Lan Lan).
We extend our heartfelt gratitude to the esteemed panel of judges, whose expertise and dedication ensured the fairness and quality of the competition. The judging panel included Jiebo Luo from the University of Rochester (chair), Shenda Hong from Peking University, Xiaozhong Liu from Worcester Polytechnic Institute, Liu Yang from Hong Kong Baptist University, Ma Jianzhu from Tsinghua University, Ting Ma from Harbin Institute of Technology, and Jian Tang from Mila-Quebec Artificial Intelligence Institute. We also thank Zixuan He and Haoyang Hong for their invaluable assistance in planning and running the event. As the 2023 Beijing Health Data Science Summit comes to a close, we look forward to welcoming all participants in 2024. Together, we will continue to advance the frontiers of health data science and work toward a healthier future for all.

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11157085/pdf/
Citations: 0
Associations of Socioeconomic Status Inequity with Incident Age-related Macular Degeneration in Middle-aged and Elderly Population
Pub Date: 2024-05-19 | DOI: 10.34133/hds.0148
Yanlin Qu, Guanran Zhang, Zhenyu Wu, H. Luo, Renjie Chen, Huixun Jia, Xiaodong Sun
Citations: 0
Association between abortion and all-cause and cause-specific premature mortality: a prospective cohort study from the UK Biobank
Pub Date: 2024-05-19 | DOI: 10.34133/hds.0147
Shaohua Yin, Yingying Yang, Qin Wang, Wei Guo, Qian He, Lei Yuan, Keyi Si
Citations: 0
Association Between Body Mass Index and Brain Health in Adults: A 16-Year Population-Based Cohort and Mendelian Randomization Study
Pub Date: 2024-03-01 | DOI: 10.34133/hds.0087
Han Lv, Na Zeng, Mengyi Li, Jing Sun, Ning Wu, Mingze Xu, Qian Chen, Xinyu Zhao, Shuohua Chen, Wenjuan Liu, Xiaoshuai Li, Pengfei Zhao, Max Wintermark, Ying Hui, Jing Li, Shouling Wu, Zhenchang Wang
Citations: 0
Do Scholars Respond Faster Than Google Trends in Discussing COVID-19 Issues? An Approach to Textual Big Data.
Pub Date: 2024-02-26 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0116
Benson Shu Yan Lam, Amanda Man Ying Chu, Jacky Ngai Lam Chan, Mike Ka Pui So

Background: The COVID-19 pandemic has posed various difficulties for policymakers, such as identifying health issues, establishing policy priorities, formulating regulations, and promoting economic competitiveness. Evidence-based practices and data-driven decision-making have been recognized as valuable tools for improving the policymaking process. However, given the abundance of data, sophisticated analytical techniques and tools are needed to extract and analyze the data efficiently.
Methods: Using the Oxford COVID-19 Government Response Tracker, we categorized the policy responses into 6 categories: (a) containment and closure, (b) health systems, (c) vaccines, (d) economic, (e) country, and (f) others. We proposed a novel research framework to compare the response times of scholars and the general public. To do so, we analyzed more than 400,000 research abstracts published over the past 2.5 years, along with text information from Google Trends as a proxy for topics of public concern. We introduced an innovative text-mining method, coherent topic clustering, to analyze this large body of abstracts.
Results: Our results show that the research abstracts not only discussed almost all of the COVID-19 issues earlier than Google Trends did, but also covered them in greater depth. This should help policymakers identify core COVID-19 issues and act earlier. In addition, our clustering method reflects the main messages of the abstracts better than a recent advanced deep learning-based topic modeling tool.
Conclusion: Scholars generally respond faster than Google Trends in discussing COVID-19 issues.
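The abstract compares when scholars and the public first engage with a topic but does not specify how that lag is measured. The following is a minimal sketch of one possible lead-time measure. It assumes that each source is reduced to a per-topic time series, scaled to its own peak, and that a topic has "emerged" on the first date it reaches a fixed threshold. The threshold rule, the function names, and the toy counts are hypothetical. This is not the authors' coherent topic clustering pipeline.

```python
from datetime import date
from typing import List, Optional, Tuple

# One observation per period: (period start date, topic intensity).
Series = List[Tuple[date, float]]


def normalize(series: Series) -> Series:
    """Scale a series so its peak equals 1.0, making the two sources comparable."""
    peak = max(value for _, value in series)
    if peak <= 0:
        return [(day, 0.0) for day, _ in series]
    return [(day, value / peak) for day, value in series]


def first_emergence(series: Series, threshold: float) -> Optional[date]:
    """Return the earliest date at which the topic's intensity reaches the threshold."""
    for day, value in sorted(series):
        if value >= threshold:
            return day
    return None


def lead_time_days(scholar: Series, public: Series, threshold: float = 0.2) -> Optional[int]:
    """Days by which scholarly discussion of a topic precedes public interest.

    Positive values mean the abstracts reached the threshold first; None means
    at least one source never reached it.
    """
    s_day = first_emergence(normalize(scholar), threshold)
    p_day = first_emergence(normalize(public), threshold)
    if s_day is None or p_day is None:
        return None
    return (p_day - s_day).days


# Hypothetical weekly counts for one topic (e.g. "vaccines"); not data from the study.
abstracts = [(date(2020, 1, 6), 2.0), (date(2020, 1, 13), 15.0),
             (date(2020, 1, 20), 40.0), (date(2020, 1, 27), 80.0)]
trends = [(date(2020, 1, 6), 0.0), (date(2020, 1, 13), 3.0),
          (date(2020, 1, 20), 10.0), (date(2020, 1, 27), 100.0)]

print(lead_time_days(abstracts, trends))  # 7
```

Normalizing each series to its own peak avoids comparing raw abstract counts with the 0 to 100 Google Trends index directly. Any real analysis would need to justify the threshold choice.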

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10895931/pdf/
Citations: 0
Toward Unified AI Drug Discovery with Multimodal Knowledge.
Pub Date: 2024-02-23 | eCollection Date: 2024-01-01 | DOI: 10.34133/hds.0113
Yizhen Luo, Xing Yi Liu, Kai Yang, Kui Huang, Massimo Hong, Jiahuan Zhang, Yushuai Wu, Zaiqing Nie

Background: In real-world drug discovery, human experts typically acquire molecular knowledge of drugs and proteins from multimodal sources, including molecular structures, structured knowledge from knowledge bases, and unstructured knowledge from the biomedical literature. Existing multimodal approaches in AI drug discovery incorporate either structured or unstructured knowledge, but not both, which compromises a holistic understanding of biomolecules. In addition, they fail to address the missing modality problem, in which multimodal information is unavailable for novel drugs and proteins.
Methods: In this work, we present KEDD, a unified, end-to-end deep learning framework that jointly incorporates structured and unstructured knowledge for a wide range of AI drug discovery tasks. The framework first uses independent representation learning models to extract the underlying characteristics of each modality. It then applies a feature fusion technique to compute the prediction results. To mitigate the missing modality problem, we leverage sparse attention and a modality masking technique to reconstruct the missing features from the most relevant molecules.
Results: Benefiting from both structured and unstructured knowledge, our framework achieves a deeper understanding of biomolecules. KEDD outperforms state-of-the-art models by an average of 5.2% on drug-target interaction prediction, 2.6% on drug property prediction, 1.2% on drug-drug interaction prediction, and 4.1% on protein-protein interaction prediction. Qualitative analysis further reveals KEDD's potential for assisting real-world applications.
Conclusions: By incorporating biomolecular expertise from multimodal knowledge, KEDD shows promise for accelerating drug discovery.
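The idea of reconstructing a missing modality from the most relevant molecules can be illustrated with a minimal sketch of top-k (sparse) attention. It assumes that the available modality, such as a structure embedding, acts as the query and key, and that reference molecules' embeddings for the missing modality act as values. The scoring uses scaled dot-product attention. The function names and toy vectors are hypothetical. This is not KEDD's published implementation, and modality masking and training are omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reconstruct_missing(
    query: Sequence[float],
    bank_keys: Sequence[Sequence[float]],
    bank_values: Sequence[Sequence[float]],
    k: int = 2,
) -> Vector:
    """Impute a missing-modality embedding with top-k (sparse) attention.

    query       -- available embedding of the new molecule (e.g. from its structure)
    bank_keys   -- the same kind of embedding for reference molecules
    bank_values -- those reference molecules' embeddings for the missing modality
    Only the k highest-scoring reference molecules receive non-zero weight.
    """
    if not bank_keys or len(bank_keys) != len(bank_values):
        raise ValueError("bank_keys and bank_values must be non-empty and aligned")
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in bank_keys]
    # Keep only the k most similar molecules: this is what makes the attention sparse.
    top_idx = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top_idx])
    dim = len(bank_values[0])
    return [sum(w * bank_values[i][d] for w, i in zip(weights, top_idx)) for d in range(dim)]


# Toy example: the third reference molecule is dissimilar and is ignored when k=2.
keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
values = [[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]
print([round(v, 2) for v in reconstruct_missing([1.0, 0.0], keys, values, k=2)])  # [1.96, 1.96]
```

Restricting attention to the top k neighbors keeps an unrelated molecule (here, the one with value 100) from leaking into the imputed features. Dense attention would still give it a small non-zero weight.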

Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10886071/pdf/
Citations: 0
Identification and analysis of sex-biased copy number alterations
Pub Date: 2024-02-21 | DOI: 10.34133/hds.0121
Chenhao Zhang, Yang Yang, Qinghua Cui, Dongyu Zhao, Chunmei Cui
Citations: 0
Large-scale machine learning analysis reveals DNA-methylation and gene-expression response signatures for gemcitabine-treated pancreatic cancer
Pub Date: 2023-12-12 | DOI: 10.34133/hds.0108
Adeolu Z Ogunleye, Chayanit Piyawajanusorn, G. Ghislat, Pedro Ballester
Citations: 0
See your stories: Visualisation for Narrative Medicine
Pub Date: 2023-12-04 | DOI: 10.34133/hds.0103
Hua Ma, Xiaoru Yuan, Xu Sun, Glyn Lawson, Qingfeng Wang
Citations: 0
Journal: Health data science