GRL-PUL: predicting microbe-drug association based on graph representation learning and positive unlabeled learning.

Molecular Omics | Published: 2024-11-14 | DOI: 10.1039/d4mo00117f | IF 3.0 | Zone 4 (Biology) | JCR Q3 (Biochemistry & Molecular Biology)
Jinqing Liang, Yuping Sun, Jie Ling

Abstract

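Extensive research has confirmed that microorganisms are widespread in the human body and have a crucial impact on human health, and drugs are an effective means of regulating them. It is therefore essential to identify potential microbe-drug associations (MDAs). Because wet-lab experiments are costly and time-consuming, computational methods that frame the task as binary classification have become valuable alternatives to traditional experimental approaches. Since existing datasets contain no validated negative MDAs, most methods randomly sample negatives from the unlabeled data, which inevitably introduces false negatives: some of the sampled pairs are in fact true but undiscovered associations. In this manuscript, we propose a novel model based on graph representation learning and positive-unlabeled learning (GRL-PUL) to infer potential MDAs. First, we screen reliable negative samples by applying weighted matrix factorization and the PU-bagging strategy to the known microbe-drug bipartite network. Then, we combine multi-modal attributes to construct a microbe-drug heterogeneous network. Next, we introduce a graph attention auto-encoder module, an encoder combining graph convolutional networks and graph attention networks, to extract informative embeddings from the microbe-drug heterogeneous network. Finally, we adopt a modified random forest as the final classifier. Comparison experiments against five baseline models on three benchmark datasets show that our model outperforms the other methods in terms of AUC, AUPR, ACC, F1-score and MCC. Moreover, several case studies show that GRL-PUL can effectively predict latent MDAs. Notably, we further verify the effectiveness of the reliable negative sample selection module by transferring it to other state-of-the-art models, and the experimental results show that it substantially improves their prediction performance.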
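The reliable-negative screening step is the component the abstract singles out, so a small sketch of the general PU-bagging idea may help. This is not the authors' implementation. It assumes every candidate microbe-drug pair already has a feature vector (for example, concatenated latent factors from a weighted matrix factorization of the bipartite network), uses a plain logistic regression as a stand-in base learner, and all function names are hypothetical. In each round a random subset of unlabeled pairs is treated as provisional negatives, a classifier is trained against the known positives, and only the unlabeled pairs left out of that round are scored. Pairs with the lowest average out-of-bag score are kept as reliable negatives.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Hypothetical stand-in base learner; the abstract does not name the
    classifier used inside PU-bagging.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y  # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def pu_bagging_scores(X_pos, X_unl, n_rounds=50, seed=0):
    """Average out-of-bag 'looks positive' score for each unlabeled sample."""
    rng = np.random.default_rng(seed)
    n_pos, n_unl = len(X_pos), len(X_unl)
    bag_size = min(n_pos, n_unl - 1)  # keep at least one sample out of bag
    score_sum = np.zeros(n_unl)
    oob_count = np.zeros(n_unl)
    for _ in range(n_rounds):
        # Treat a random subset of unlabeled samples as provisional negatives.
        bag = rng.choice(n_unl, size=bag_size, replace=False)
        X = np.vstack([X_pos, X_unl[bag]])
        y = np.concatenate([np.ones(n_pos), np.zeros(bag_size)])
        w, b = train_logreg(X, y)
        # Score only the unlabeled samples this model was not trained on.
        oob = np.setdiff1d(np.arange(n_unl), bag)
        score_sum[oob] += sigmoid(X_unl[oob] @ w + b)
        oob_count[oob] += 1
    scores = np.full(n_unl, np.nan)
    seen = oob_count > 0
    scores[seen] = score_sum[seen] / oob_count[seen]
    return scores


def select_reliable_negatives(X_pos, X_unl, n_neg, n_rounds=50, seed=0):
    """Indices of the n_neg unlabeled samples least likely to be positives."""
    scores = pu_bagging_scores(X_pos, X_unl, n_rounds, seed)
    return np.argsort(scores)[:n_neg]  # NaN (never out of bag) sorts last


# Toy usage: two well-separated Gaussian clusters in a 4-d feature space.
rng = np.random.default_rng(1)
X_pos = rng.normal(2.0, 1.0, size=(20, 4))                 # known associations
X_unl = np.vstack([rng.normal(2.0, 1.0, size=(10, 4)),     # hidden positives (idx 0-9)
                   rng.normal(-2.0, 1.0, size=(70, 4))])   # true negatives (idx 10-79)
neg_idx = select_reliable_negatives(X_pos, X_unl, n_neg=20)
print(sorted(neg_idx.tolist()))
```

Because hidden positives resemble the known positives, they tend to receive high out-of-bag scores and are passed over, which is the intended advantage over drawing negatives uniformly at random from the unlabeled pairs.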
Source journal: Molecular Omics
Subject category: Biochemistry, Genetics and Molecular Biology (Biochemistry)
CiteScore: 5.40
Self-citation rate: 3.40%
Articles published: 91
About the journal: Molecular Omics publishes high-quality research from across the -omics sciences. Topics include, but are not limited to:
- -omics studies to gain mechanistic insight into biological processes, for example determining the mode of action of a drug or the basis of a particular phenotype, such as drought tolerance
- -omics studies for clinical applications with validation, such as finding biomarkers for diagnostics or potential new drug targets
- -omics studies looking at the sub-cellular make-up of cells, for example the subcellular localisation of certain proteins or post-translational modifications, or new imaging techniques
- studies presenting new methods and tools to support omics studies, including new spectroscopic/chromatographic techniques, chip-based/array technologies and new classification/data analysis techniques; new methods should be proven and demonstrate an advance in the field
Molecular Omics only accepts articles of high importance and interest that provide significant new insight into important chemical or biological problems. This could be fundamental research that significantly increases understanding or research that demonstrates clear functional benefits. Papers reporting new results that could be routinely predicted, that do not show a significant improvement over known research, or that are of interest only to specialists in the area are not suitable for publication in Molecular Omics.