
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

Constrained Pseudo-time Ordering for Clinical Transcriptomics Data.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-08-13 DOI: 10.1109/TCBB.2024.3442669
Sachin Mathur, Hamid Mattoo, Ziv Bar-Joseph

Time series RNASeq studies can enable understanding of the dynamics of disease progression and treatment response in patients. They also provide information on biomarkers, activated and repressed pathways, and more. While useful, data from multiple patients are challenging to integrate because of heterogeneity in treatment response among patients and the small number of time points that are usually profiled. Because of this heterogeneity, relying on the sampled time points to integrate data across individuals does not lead to correct reconstruction of the response patterns. To address these challenges, we developed a new constraint-based pseudotime ordering method for analyzing transcriptomics data in clinical and response studies. Our method assigns samples to their correct placement on the response curve while respecting the individual patient order. We use polynomials to represent gene expression over the duration of the study and an EM algorithm to determine parameters and locations. Application to three treatment response datasets shows that our method improves on prior methods and leads to accurate orderings that provide new biological insight into the disease and response. Code for the method is available at https://github.com/Sanofi-Public/ RDCS-bulkRNASeq-pseudo ordering.
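
The order constraint described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a single gene, a polynomial curve whose coefficients are already known, and a hard assignment of each sample to a point on a discrete pseudotime grid. The names `poly_eval` and `assign_ordered` are hypothetical. Dynamic programming picks the placement with the smallest squared error among those that keep one patient's samples in the order they were collected.

```python
def poly_eval(coeffs, t):
    """Evaluate a polynomial with coefficients [c0, c1, c2, ...] at t (Horner's rule)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def assign_ordered(samples, coeffs, grid):
    """Place one patient's samples on the pseudotime grid, keeping their order.

    samples: expression values of one gene, in the order they were collected.
    Returns non-decreasing grid positions minimizing squared error to the curve.
    """
    n, m = len(samples), len(grid)
    curve = [poly_eval(coeffs, t) for t in grid]
    cost = [[(s - c) ** 2 for c in curve] for s in samples]
    # best[i][j]: minimal total cost when sample i sits at grid index j
    best = [cost[0][:]] + [[0.0] * m for _ in range(n - 1)]
    back = [[0] * m for _ in range(n)]
    for i in range(1, n):
        arg = 0  # cheapest placement of sample i-1 at or before index j
        for j in range(m):
            if best[i - 1][j] < best[i - 1][arg]:
                arg = j
            best[i][j] = cost[i][j] + best[i - 1][arg]
            back[i][j] = arg
    # Trace back from the cheapest final position.
    j = min(range(m), key=lambda k: best[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [grid[k] for k in path]


# Toy curve y = t; the third sample looks "earlier" than the second,
# but the patient's own sampling order must be preserved.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
placed = assign_ordered([0.1, 0.8, 0.6], [0.0, 1.0], grid)
print(placed)  # -> [0.0, 0.75, 0.75]
```

In an EM-style procedure, a step like this would alternate with refitting the polynomial coefficients to the newly placed samples; the paper's actual update rules are not given in the abstract.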

Citations: 0
AGML: Adaptive Graph-based Multi-label Learning for Prediction of RBP and AS Event Associations During EMT.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-08-12 DOI: 10.1109/TCBB.2024.3440913
Yushan Qiu, Wensheng Chen, Wai-Ki Ching, Hongmin Cai, Hao Jiang, Quan Zou

Increasing evidence has indicated that RNA-binding proteins (RBPs) play an essential role in mediating alternative splicing (AS) events during epithelial-mesenchymal transition (EMT). However, due to the substantial cost and complexity of biological experiments, how AS events are regulated and influenced remains largely unknown. Thus, it is important to construct effective models for inferring hidden RBP-AS event associations during the EMT process. In this paper, a novel and efficient model was developed to identify AS event-related candidate RBPs based on Adaptive Graph-based Multi-Label learning (AGML). In particular, we propose to adaptively learn a new affinity graph to capture the intrinsic structure of data for both RBPs and AS events. Multi-view similarity matrices are employed to maintain the intrinsic structure and guide the adaptive graph learning. We then simultaneously update the RBP and AS event associations that are predicted from both spaces by applying multi-label learning. The experimental results show that AGML achieved AUC values of 0.9521 and 0.9873 in 5-fold and leave-one-out cross-validation, respectively, indicating the superiority and effectiveness of the proposed model. Furthermore, AGML can serve as an efficient and reliable tool for uncovering novel AS event-associated RBPs and is applicable to predicting associations between other biological entities. The source code of AGML is available at https://github.com/yushanqiu/AGML.

Citations: 0
Editorial Deep Learning-Empowered Big Data Analytics in Biomedical Applications and Digital Healthcare
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-08-08 DOI: 10.1109/TCBB.2024.3371808
Xiaokang Zhou, Carson K. Leung, Kevin I-Kai Wang, Giancarlo Fortino
Deep learning and big data analysis are among the most important research topics in the fields of biomedical applications and digital healthcare. With the fast development of artificial intelligence (AI) and Internet of Things (IoT) technologies, deep learning (DL) for big data analytics, including affective learning, reinforcement learning, and transfer learning, is widely applied to sense, learn, and interact with human health. Examples of biomedical applications include smart biomaterials, biomedical imaging, heartbeat/blood pressure measurement, and eye tracking. These biomedical applications collect healthcare data through remote sensors and transfer the data to a centralized system for analysis. With an enormous amount of historical data, DL and big data analysis technologies are able to identify potential linkages between features and possible risks, support important decisions in medical diagnosis, and provide valuable advice for better healthcare treatment and lifestyle. Although significant progress has been made with AI, DL, and big data analytic technologies for medical and healthcare research, there remain gaps between computer-aided treatment design and real-world healthcare demands. In addition, there are unexplored areas in the fields of healthcare and biomedical applications where cutting-edge AI and DL technologies could be applied. Hence, exploring the possibilities of DL and big data analytics in the fields of biomedical applications and digital healthcare is in high demand.
Citations: 0
DMAMP: A deep-learning model for detecting antimicrobial peptides and their multi-activities.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-08-06 DOI: 10.1109/TCBB.2024.3439541
Qiaozhen Meng, Genlang Chen, Shixin Zheng, Yulai Lin, Bin Liu, Jijun Tang, Fei Guo

Because of their broad-spectrum and high-efficiency antibacterial activity, antimicrobial peptides (AMPs) and their functions have been studied in the field of drug discovery. Using biological experiments to detect AMPs and their corresponding activities is costly, whereas computational technologies do so for much less. Currently, most computational methods treat the identification of AMPs and of their activities as two independent tasks, which ignores the relationship between them. Therefore, combining the two tasks and sharing patterns between them is a crucial problem that needs to be addressed. In this study, we propose a deep learning model, called DMAMP, that detects AMPs and their activities simultaneously and benefits from multi-task learning. The first stage uses convolutional neural network models and residual blocks to extract shared hidden features from the two related tasks. The next stage uses two fully connected layers to learn the distinct information of each task. Meanwhile, the original evolutionary features from the peptide sequence are also fed to the predictor of the second task to complement the forgotten information. The experiments on the independent test dataset demonstrate that our method performs better than the single-task model with 4.28% of Matthews Correlation Coefficient (MCC) on the first task, and achieves an average MCC of 0.2627, which is higher than the single-task model and two existing methods, for five activities on the second task. To understand whether features derived from the convolutional layers of the models capture the differences between target classes, we visualize these high-dimensional features by projecting them into 3D space. In addition, we show that our predictor can identify peptides that are active against Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). We hope that our proposed method can give new insights into the discovery of novel antiviral peptide drugs.
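
For readers unfamiliar with the metric, the Matthews Correlation Coefficient used in this abstract can be computed from the four counts of a binary confusion matrix. The sketch below is a generic illustration, not code from the paper; the function name `mcc` and the choice to return 0.0 when the denominator vanishes are assumptions of this sketch.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=50, tn=50, fp=0, fn=0)      # every prediction correct
chance = mcc(tp=25, tn=25, fp=25, fn=25)     # no better than random guessing
inverted = mcc(tp=0, tn=0, fp=50, fn=50)     # every prediction wrong
```

The value ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction), which makes it less sensitive to class imbalance than plain accuracy.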

Citations: 0
Hyb_SEnc: An Antituberculosis Peptide Predictor Based on a Hybrid Feature Vector and Stacked Ensemble Learning.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-31 DOI: 10.1109/TCBB.2024.3425644
Xiuhao Fu, Hao Duan, Xiaofeng Zang, Chunling Liu, Xingfeng Li, Qingchen Zhang, Zilong Zhang, Quan Zou, Feifei Cui

Tuberculosis has plagued mankind since ancient times, and the struggle between humans and tuberculosis continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has created a new direction in the treatment of tuberculosis. Therefore, for the treatment of tuberculosis, the prediction of anti-tuberculosis peptides is crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and an extremely randomized tree (ERT) are selected as the first-level learners of the stacked ensemble. Then, the five best-performing feature encoding methods are selected to obtain the hybrid feature vector, and the decision tree and recursive feature elimination (DT-RFE) are used to refine it. After selection, the optimal feature subset is used as the input of the stacked ensemble model. Logistic regression (LR) is used as the second-level learner to build the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc achieved prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively. In addition, we provide a user-friendly Web server (http://www.bioailab.com/Hyb_SEnc). The source code is freely available at https://github.com/fxh1001/Hyb_SEnc.
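
The pipeline outlined in the abstract (feature selection, two tree-based first-level learners, a logistic-regression second-level learner) can be approximated with scikit-learn. This is a rough sketch and not the Hyb_SEnc source code: the data are synthetic, the number of retained features and all hyperparameters are arbitrary assumptions, and reading DT-RFE as `RFE` wrapped around a `DecisionTreeClassifier` is an interpretation of the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an encoded peptide feature matrix (assumption).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Recursive feature elimination driven by decision-tree importances.
selector = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=15)

# First-level learners: RF and ERT; second-level learner: LR.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ert", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions feed the second-level learner
)

model = make_pipeline(selector, stack)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the figures reported for AntiTb_MD or AntiTb_RD.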

Citations: 0
KGRLFF: Detecting Drug-Drug Interactions Based on Knowledge Graph Representation Learning and Feature Fusion.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-29 DOI: 10.1109/TCBB.2024.3434992
Xiaoli Lin, Zhuang Yin, Xiaolong Zhang, Jing Hu

Accurate prediction of drug-drug interactions (DDIs) plays an important role in improving the efficiency of drug development and ensuring the safety of combination therapy. Most existing models rely on a single source of information to predict DDIs, and few models can perform tasks on biomedical knowledge graphs. This paper proposes a new hybrid method, namely Knowledge Graph Representation Learning and Feature Fusion (KGRLFF), to fully exploit the information from the biomedical knowledge graph and molecular structure of drugs to better predict DDIs. KGRLFF first uses a Bidirectional Random Walk sampling method based on the PageRank algorithm (BRWP) to obtain higher-order neighborhood information of drugs in the knowledge graph, including neighboring nodes, semantic relations, and higher-order information associated with triple facts. Then, an embedded representation learning model named Knowledge Graph-based Cyclic Recursive Aggregation (KGCRA) is used to learn the embedded representations of drugs by recursively propagating and aggregating messages with drugs as both the source and destination. In addition, the model learns the molecular structures of the drugs to obtain the structured features. Finally, a Feature Representation Fusion Strategy (FRFS) was developed to integrate embedded representations and structured feature representations. Experimental results showed that KGRLFF is feasible for predicting potential DDIs.

Citations: 0
HGLA: Biomolecular Interaction Prediction based on Mixed High-Order Graph Convolution with Filter Network via LSTM and Channel Attention.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-07-26 DOI: 10.1109/TCBB.2024.3434399
Zhen Zhang, Zhaohong Deng, Ruibo Li, Wei Zhang, Qiongdan Lou, Kup-Sze Choi, Shitong Wang

Predicting biomolecular interactions is significant for understanding biological systems. Most existing methods for link prediction are based on graph convolution. Although graph convolution methods are advantageous in extracting structure information of biomolecular interactions, two key challenges still remain. One is how to consider both the immediate and high-order neighbors. Another is how to reduce noise when aggregating high-order neighbors. To address these challenges, we propose a novel method, called mixed high-order graph convolution with filter network via LSTM and channel attention (HGLA), to predict biomolecular interactions. Firstly, the basic and high-order features are extracted respectively through the traditional graph convolutional network (GCN) and the two-layer Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing (MixHop). Secondly, these features are mixed and input into the filter network composed of LayerNorm, SENet and LSTM to generate filtered features, which are concatenated and used for link prediction. The advantages of HGLA are: 1) HGLA processes high-order features separately, rather than simply concatenating them; 2) HGLA better balances the basic features and high-order features; 3) HGLA effectively filters the noise from high-order neighbors. It outperforms state-of-the-art networks on four benchmark datasets. The codes are available at https://github.com/zznb123/HGLA.
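
The idea of combining immediate and high-order neighbors, which MixHop-style layers rely on, can be shown in a few lines. The sketch below is a simplification and not the HGLA code: it propagates node features through powers of a normalized adjacency matrix and concatenates the results, leaving out the learned weight matrices, nonlinearities, and the filter network. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def normalized_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2), the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def mixhop_features(adj, x, powers=(0, 1, 2)):
    """Concatenate A_norm^p @ x for each p in powers, node by node."""
    a_norm = normalized_adjacency(adj)
    blocks, current = [], x
    for p in range(max(powers) + 1):
        if p in powers:
            blocks.append(current)
        current = matmul(a_norm, current)  # one more hop of propagation
    return [sum((block[i] for block in blocks), []) for i in range(len(x))]


# Path graph 0 - 1 - 2 with a one-dimensional feature set only on node 0.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
features = mixhop_features(adj, x)
```

In this toy graph, node 2 receives nothing from node 0 at powers 0 and 1 and a nonzero value only at power 2, which is the kind of two-hop signal a single graph convolution layer would miss.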

Citations: 0
Machine learning-assisted high-throughput screening for Anti-MRSA compounds.
IF 3.6 Tier 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date : 2024-07-26 DOI: 10.1109/TCBB.2024.3434340
Fadi Shehadeh, LewisOscar Felix, Markos Kalligeros, Adnan Shehadeh, Beth Burgwyn Fuchs, Frederick M Ausubel, Paul P Sotiriadis, Eleftherios Mylonakis

Background: Antimicrobial resistance is a major public health threat, and new agents are needed. Computational approaches have been proposed to reduce the cost and time needed for compound screening.

Aims: A machine learning (ML) model was developed for the in silico screening of low molecular weight molecules.

Methods: We used the results of a high-throughput Caenorhabditis elegans methicillin-resistant Staphylococcus aureus (MRSA) liquid infection assay to develop ML models for compound prioritization and quality control.

Results: The compound prioritization model achieved an AUC of 0.795 with a sensitivity of 81% and a specificity of 70%. When applied to a validation set of 22,768 compounds, the model identified 81% of the active compounds identified by high-throughput screening (HTS) among only 30.6% of the total 22,768 compounds, resulting in a 2.67-fold increase in hit rate. When we retrained the model on all the compounds of the HTS dataset, it further identified 45 discordant molecules classified as non-hits by the HTS, with 42/45 (93%) having known antimicrobial activity.

Conclusion: Our ML approach can be used to increase HTS efficiency by reducing the number of compounds that need to be physically screened and identifying potential missed hits, making HTS more accessible and reducing barriers to entry.
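
The fold increase in hit rate quoted in the Results can be checked with a short calculation. The sketch below is only an illustration: the function name is hypothetical, and the inputs are the rounded percentages from the abstract (81% of actives recovered within 30.6% of the library). The hit rate in the selected subset divided by the hit rate of the full library reduces to the ratio of those two fractions.

```python
def hit_rate_enrichment(fraction_of_actives_recovered: float,
                        fraction_of_library_screened: float) -> float:
    """Fold increase in hit rate when only a model-selected subset is screened.

    With A actives among N compounds, the baseline hit rate is A / N.
    In the subset it is (recovered * A) / (screened * N), so the ratio
    of the two is simply recovered / screened.
    """
    if not 0.0 < fraction_of_library_screened <= 1.0:
        raise ValueError("fraction_of_library_screened must be in (0, 1]")
    if not 0.0 <= fraction_of_actives_recovered <= 1.0:
        raise ValueError("fraction_of_actives_recovered must be in [0, 1]")
    return fraction_of_actives_recovered / fraction_of_library_screened


# Rounded figures taken from the abstract.
enrichment = hit_rate_enrichment(0.81, 0.306)
print(f"{enrichment:.2f}-fold")
```

With these rounded inputs the ratio comes out near 2.65, close to the reported 2.67-fold increase; the small gap may reflect rounding in the published percentages.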

Citations: 0
Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties.
IF 3.6 Tier 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date : 2024-07-26 DOI: 10.1109/TCBB.2024.3434461
Siyuan Guo, Jihong Guan, Shuigeng Zhou

In the past decade, Artificial Intelligence (AI) driven drug design and discovery has been a hot research topic in the AI area, where an important branch is molecule generation by generative models, from GAN-based models and VAE-based models to the latest diffusion-based models. However, most existing models mainly pursue basic properties such as validity and uniqueness of the generated molecules, and only a few go further to explicitly optimize a single important molecular property (e.g., QED or PlogP), which makes most generated molecules of little use in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which expands the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and that molecular properties are usually determined by some substructures (e.g., pharmacophores), we propose to perform diffusion on two structural levels, molecules and molecular fragments, with which a mixed Gaussian distribution is obtained for the reverse diffusion process. To get desirable molecular fragments, we develop a novel fragmentation method based on electronic effects. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity by an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments with two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models. The code for D2L-OMP is available at https://github.com/bz99bz/D2L-OMP.

Citations: 0
MLRR-ATV: A Robust Manifold Nonnegative Low-Rank Representation with Adaptive Total-Variation Regularization for scRNA-seq Data Clustering.
IF 3.6 Tier 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date : 2024-07-24 DOI: 10.1109/TCBB.2024.3432740
Gao-Fei Wang, Juan Wang, Shasha Yuan, Chun-Hou Zheng, Jin-Xing Liu

Since the advent of genomics, the exploration of genes has been a focus of research. The emergence of single-cell RNA sequencing (scRNA-seq) technology makes it possible to explore gene expression at the single-cell level. Due to the limitations of sequencing technology, the data contain a lot of noise. They are also high-dimensional and sparse. Clustering is a common method of analyzing scRNA-seq data. This paper proposes a novel single-cell clustering method called Robust Manifold Nonnegative Low-Rank Representation with Adaptive Total-Variation Regularization (MLRR-ATV). The Adaptive Total-Variation (ATV) regularization is introduced into the Low-Rank Representation (LRR) model to reduce the influence of noise through gradient learning. Then, the linear and nonlinear manifold structures in the data are learned through Euclidean distance and cosine similarity, and more valuable information is retained. Because the model is non-convex, we use the Alternating Direction Method of Multipliers (ADMM) to optimize the model. We tested the performance of the MLRR-ATV model on eight real scRNA-seq datasets and selected nine state-of-the-art methods as comparison methods. The experimental results show that the MLRR-ATV model performs better than the other nine methods.
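
The abstract does not spell out how the two similarity measures enter the model, so the following is only a generic sketch of how cell-to-cell graphs are commonly built from Euclidean distance and cosine similarity. The Gaussian kernel, the `sigma` bandwidth, the unnormalized Laplacian, the toy count matrix, and all function names are assumptions made for illustration, not the authors' construction.

```python
import numpy as np


def euclidean_affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel affinity between rows (cells) from squared Euclidean distance."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def cosine_affinity(X: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Cosine similarity between rows (cells)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, eps)  # guard against all-zero rows
    return unit @ unit.T


def graph_laplacian(W: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - W, ignoring self-loops."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W


# Toy count matrix: 6 cells x 20 genes (synthetic, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(6, 20)).astype(float)

W_euc = euclidean_affinity(X, sigma=5.0)
W_cos = cosine_affinity(X)
L_euc = graph_laplacian(W_euc)
L_cos = graph_laplacian(W_cos)
```

Laplacians of this kind are often used as manifold regularizers in low-rank representation models; whether MLRR-ATV uses exactly this form is not stated in the abstract.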

Citations: 0