Current Bioinformatics: Latest Articles, Page 4

Relational Graph Convolution Network with Multi Features for AntiCOVID-19 Drugs Discovery using 3CLpro Potential Target

IF 4, Zone 3 (Biology), Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-03-11 DOI: 10.2174/0115748936280392240219054047

Medard Edmund Mswahili, Goodwill Erasmo Ndomba, Young Jin Kim, Kyuri Jo, Young-Seob Jeong

Background: The potential of graph neural networks (GNNs) to revolutionize the analysis of non-Euclidean data has gained attention recently, making them attractive models for deep machine learning. However, insufficient compound or molecular graphs and feature representations might significantly impair their full potential. Despite the devastating impact of COVID-19 across the globe, no drug has yet been proven effective against it. As various stages of drug discovery and repositioning require the accurate prediction of drug-target interactions (DTI), we propose a relational graph convolution network (RGCN) using multi-features, based on the developed drug chemical compound-coronavirus target graph representation and a combination of features. During the implementation of the model, we used the feature module to capture not only the topological structure of drugs but also the structure of the proven drug target (i.e., 3CLpro) for SARS-CoV-2, which shares a genome sequence similar to that of other members of the beta-coronavirus group such as SARS-CoV, MERS-CoV, and bat coronavirus. Our features comprise topological information in molecular SMILES and local chemical context in the SMILES sequence for the drug chemical compound and drug target. Our proposed method achieved an accuracy of 97.30% and could be prioritized as a promising prediction route for the development of novel oral antiviral medicines for COVID-19.

Objective: Forecasting DTI is a pivotal aspect of drug discovery. The focus on computational methods in DTI prediction has intensified due to the considerable expense and time associated with conducting extensive in vitro and in vivo experiments. Machine learning techniques, particularly deep learning, have found broad applications in DTI prediction. We are convinced that this study could be prioritized and utilized as a promising predictive route for the development of novel oral antiviral treatments for COVID-19 and other variants of coronaviruses.

Methods: This study addressed the problem of COVID-19 drugs using the proposed RGCN with multi-features. It focused mainly on the prediction of novel antiviral drugs against coronaviruses using a graph-based methodology, namely RGCN. It further utilized the features of both drugs and common potential drug targets found in the betacoronavirus group to deepen understanding of their underlying relation.

Results: Our approach achieved an accuracy of 97.30%, which may support and advance the prediction and development of novel antiviral treatments against coronaviruses and their variants.

Conclusion: We recursively performed experiments using the proposed method on our constructed DCCCvT graph dataset from our collected dataset, and found that our model achieved the best comparable average accuracy on the T7 features, followed by the combination of T7, R6, and L8. The results of the proposed model outperform those of previous related studies.
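To make the relational graph convolution idea concrete, here is a minimal sketch of a single R-GCN propagation step in its commonly published form: each node sums a self-loop term and, for every relation type, the mean of its neighbours' features transformed by a relation-specific weight matrix, followed by a ReLU. The node names ("drug", "target"), the relation name ("binds"), and the identity weights are hypothetical values chosen for illustration; the abstract does not describe the paper's actual graph schema, feature dimensions, or layer configuration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One R-GCN propagation step.

    features:    dict mapping node -> feature vector
    edges:       list of (source, relation, destination) triples
    rel_weights: dict mapping relation -> weight matrix
    self_weight: weight matrix applied to the node's own features
    """
    # Group incoming neighbours by destination node and relation type.
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault(dst, {}).setdefault(rel, []).append(src)

    updated = {}
    for node, h in features.items():
        total = matvec(self_weight, h)  # self-loop term
        for rel, sources in incoming.get(node, {}).items():
            scale = 1.0 / len(sources)  # per-relation mean normalisation
            for src in sources:
                message = matvec(rel_weights[rel], features[src])
                total = [t + scale * m for t, m in zip(total, message)]
        updated[node] = [max(0.0, t) for t in total]  # ReLU activation
    return updated


# Hypothetical two-node graph: one drug node linked to one target node.
identity = [[1.0, 0.0], [0.0, 1.0]]
example = rgcn_layer(
    features={"drug": [1.0, 0.0], "target": [0.0, 2.0]},
    edges=[("drug", "binds", "target")],
    rel_weights={"binds": identity},
    self_weight=identity,
)
```

In a trained model the weight matrices would be learned and several such layers stacked; the sketch only shows how relation types keep separate parameters.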

Citations: 0

MCHAN: Prediction of Human Microbe-drug Associations Based on Multiview Contrastive Hypergraph Attention Network

IF 4, Zone 3 (Biology), Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-03-01 DOI: 10.2174/0115748936288616240212073805

Guanghui Li, Ziyan Cao, Cheng Liang, Qiu Xiao, Jiawei Luo

Background: Complex and diverse microbial communities play a pivotal role in human health and have become a new drug target. Exploring the connections between drugs and microbes not only provides profound insights into their mechanisms but also drives progress in drug discovery and repurposing. Using wet lab experiments to identify associations is time-consuming and laborious. Hence, precise and efficient computational methods can improve the efficiency of association identification between microorganisms and drugs.

Objective: In this study, we propose a new deep learning model, the multiview contrastive hypergraph attention network (MCHAN), for human microbe-drug association prediction.

Methods: First, we fuse multiple similarity matrices to obtain a fused microbial and drug similarity network. By combining graph convolutional networks (GCNs) with attention mechanisms, we extract key information from multiple perspectives. Then, we construct two network topologies based on the fused data. One topology incorporates the concept of hypernodes, using virtual nodes to capture implicit relationships between microbes and drugs and thereby construct a hyperheterogeneous graph. Next, we propose a cross-contrastive learning task that guides the graph embeddings from both perspectives simultaneously, without the need for any labels. This approach brings nodes with similar features and network topologies closer while pushing other nodes away. Finally, we employ attention mechanisms to merge the outputs of the GCN and predict the associations between drugs and microbes.

Results: To confirm the effectiveness of this method, we conduct experiments on three distinct datasets. The results demonstrate that the MCHAN model surpasses other methods in terms of performance. Furthermore, case studies provide additional evidence confirming the consistent predictive accuracy of the MCHAN model.

Conclusion: MCHAN is expected to become a valuable tool for predicting potential associations between microbiota and drugs in the future.
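The label-free contrastive step described in the Methods can be illustrated with an InfoNCE-style loss, which is one common way to pull the two views of the same node together while pushing other nodes away. This is an assumption made for illustration: the abstract does not give MCHAN's actual loss, similarity function, or temperature, and the function names and the `temperature` default below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def cross_view_contrastive_loss(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss: view_a[i] and view_b[i] embed the same node.

    For each node, its embedding in the other view is the positive example
    and every other node's embedding in that view is a negative example.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in view B.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        loss += log_denominator - logits[i]  # negative log-softmax of the positive
    return loss / n


# Two nodes whose embeddings agree across views versus the same embeddings mismatched.
aligned_loss = cross_view_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = cross_view_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when the two views agree on each node, which is the behaviour the abstract describes; no association labels are involved.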

Citations: 0

Network Subgraph-based Method: Alignment-free Technique for Molecular Network Analysis

IF 4, Zone 3 (Biology), Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-22 DOI: 10.2174/0115748936285057240126062220

Efendi Zaenudin, Ezra B. Wijaya, Venugopala Reddy Mekala, Ka-Lok Ng

Objective: We propose a novel method to compare directed networks by decomposing the network into small modules, the so-called network subgraph approach, which is distinct from the network motif approach because it does not depend on null model assumptions.

Method: We developed an alignment-free algorithm called the Subgraph Identification Algorithm (SIA), which can generate all subgraphs that have five connected nodes (5-node subgraphs). There are 9,364 such modules. We then applied the SIA method to examine 17 cancer networks and measured the similarity between pairs of networks using the Jensen-Shannon entropy (HJS).

Results: We identified and examined the biological meaning of 5-node regulatory modules and the pairs of cancer networks with the smallest HJS values. The two pairs of networks that show similar patterns are (i) endometrial cancer and hepatocellular carcinoma and (ii) breast cancer and pathways in cancer. Some studies in the literature provide experimental data supporting the 5-node regulatory modules.

Conclusion: Our method is an alignment-free approach that measures the topological similarity of 5-node regulatory modules and aligns two directed networks based on their topology. These modules capture complex interactions among multiple genes that cannot be detected using existing methods that only consider single-gene relations. We analyzed the biological relevance of the regulatory modules and used the subgraph method to identify the modules that shared the same topology across 2 cancer networks out of 17 cancer networks. We validated our findings using evidence from the literature.
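As a rough illustration of the similarity measure, the sketch below computes the standard Jensen-Shannon divergence (base 2) between two subgraph-frequency profiles, where each entry counts how often one subgraph type occurs in a network. Whether the paper's HJS is defined in exactly this way is not stated in the abstract, and the example counts are made up; a smaller value means the two networks use the subgraph types in more similar proportions.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two count or probability vectors.

    Returns 0.0 for identical distributions and 1.0 for distributions with
    no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("both profiles must cover the same subgraph types")

    def normalize(values):
        total = sum(values)
        if total <= 0:
            raise ValueError("a profile must contain at least one occurrence")
        return [v / total for v in values]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 because
        # b is the mixture of the two distributions.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p), normalize(q)
    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical counts of three subgraph types in two networks.
network_a = [12, 3, 0]
network_b = [10, 4, 1]
similarity_score = jensen_shannon_divergence(network_a, network_b)
```

Applied to all pairs of networks, the pairs with the smallest score would be the candidates for "similar patterns" in the sense used in the Results.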

Citations: 0

A-RFP: An Adaptive Residue Flexibility Prediction Method Improving Protein-ligand Docking Based on Homologous Proteins

IF 4, Zone 3 (Biology), Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-20 DOI: 10.2174/0115748936258790240101062642

Chuqi Lei, Senbiao Fang, Yaohang Li, Fei Guo, Min Li

Background: Computational molecular docking plays an important role in determining the precise receptor-ligand conformation, making it a powerful tool for drug discovery. Over the past 30 years, most computational docking methods have treated the receptor structure as a rigid body, although flexible docking often yields higher accuracy. The main disadvantage of flexible docking is its significantly higher computational cost. Because different protein pocket residues exhibit different degrees of flexibility, semi-flexible docking methods, which balance rigid and flexible docking, have succeeded in predicting highly accurate conformations at a relatively low computational cost.

Method: In our study, the number of flexible pocket residues was assessed by quantitative analysis, and a novel adaptive residue flexibility prediction method, named A-RFP, was proposed to improve the docking performance. Based on homologous information, a joint strategy predicts pocket residue flexibility by combining RMSD, the distance between the residue sidechain and the ligand, and the sidechain orientation. For each receptor-ligand pair, A-RFP provides a docking conformation with the optimal affinity.

Result: By analyzing the docking affinities of 3507 target-ligand pairs at 5 different numbers of flexible residues ranging from 0 to 10, we found a general trend that a larger number of flexible residues improves the docking results obtained with AutoDock Vina. However, a certain number of counterexamples still exist. To validate the effectiveness of A-RFP, we tested it in a small-scale virtual screening on 5 proteins, which confirmed that A-RFP could enhance the docking performance. Flexible-receptor virtual screening on a low-similarity dataset with 85 receptors validated the accuracy of the comprehensive residue flexibility evaluation. Moreover, we studied three receptors with FDA-approved drugs, which further showed that A-RFP can play a suitable role in ligand discovery.

Conclusion: Our analysis confirms that the screening performance obtained with different numbers of flexible residues varies widely across receptors, which suggests that a fine-grained docking method would offset this deficiency. Thus, we presented A-RFP, an adaptive pocket residue flexibility prediction method based on homologous information. Setting aside computational resources and time costs, A-RFP provides the optimal docking result.
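Two of the three quantities named in the Method, RMSD and the sidechain-ligand distance, are simple geometric measures, sketched below. The sketch assumes the two conformations are already superimposed and list the same atoms in the same order, and it takes the minimum atom-to-atom distance as "the distance between the residue sidechain and the ligand"; both are assumptions, since the abstract does not say how A-RFP computes or combines these terms. The coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D points.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def min_distance(sidechain_atoms, ligand_atoms):
    """Smallest Euclidean distance between any sidechain atom and any ligand atom."""
    return min(
        math.sqrt((xs - xl) ** 2 + (ys - yl) ** 2 + (zs - zl) ** 2)
        for (xs, ys, zs) in sidechain_atoms
        for (xl, yl, zl) in ligand_atoms
    )


# Hypothetical sidechain in two homologous structures, and a nearby ligand.
sidechain_ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
sidechain_hom = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
ligand = [(3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]

deviation = rmsd(sidechain_ref, sidechain_hom)
closest = min_distance(sidechain_ref[:1], ligand)
```

A residue whose sidechain moves substantially across homologous structures (large RMSD) and sits close to the ligand would be a natural candidate for flexible treatment, though the actual decision rule is the paper's and is not reproduced here.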

Citations: 0

Sia-m7G: Predicting m7G Sites through the Siamese Neural Network with an Attention Mechanism

IF 4, Zone 3 (Biology), Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-09 DOI: 10.2174/0115748936285540240116065719

Jia Zheng, Yetong Zhou

Background: The chemical modification of RNA plays a crucial role in many biological processes. N7-methylguanosine (m7G), one of the most important epigenetic modifications, plays an important role in gene expression, processing metabolism, and protein synthesis. Detecting the exact location of m7G sites in the transcriptome is key to understanding their mechanism in gene expression. On the basis of experimentally validated data, several machine learning and deep learning tools have been designed to identify internal m7G sites and have shown advantages over traditional experimental methods in terms of speed, cost-effectiveness, and robustness.

Aims: In this study, we aim to develop a computational model to help predict the exact location of m7G sites in humans.

Objective: Simple and advanced encoding methods and deep learning networks are designed to achieve excellent m7G prediction efficiently.

Methods: Three types of feature extraction and six classification algorithms were tested to identify m7G sites. Our final model, named Sia-m7G, adopts one-hot encoding and a delicate Siamese neural network with an attention mechanism. In addition, multiple 10-fold cross-validation tests were conducted to evaluate our predictor.

Results: Sia-m7G achieved the highest sensitivity, specificity, and accuracy on 10-fold cross-validation tests compared with the other six m7G predictors. Nucleotide preference and model visualization analyses were conducted to strengthen the interpretability of Sia-m7G and provide a further understanding of m7G site fragments in genomic sequences.

Conclusion: Sia-m7G has significant advantages over other classifiers and predictors, which demonstrates the superiority of the Siamese neural network algorithm in identifying m7G sites.
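The one-hot encoding and the three reported metrics are easy to show in isolation. The sketch below encodes an RNA fragment as one row per nucleotide over the alphabet A, C, G, U, and computes sensitivity, specificity, and accuracy from binary labels. The handling of T and of unknown bases, and the example fragment and labels, are assumptions for illustration; the abstract does not specify the fragment length or preprocessing used by Sia-m7G.

```python
def one_hot_encode(sequence):
    """Encode an RNA sequence as a list of 4-element rows in A, C, G, U order.

    DNA-style T is read as U; any other symbol (for example N) becomes all zeros.
    """
    alphabet = "ACGU"
    cleaned = sequence.upper().replace("T", "U")
    return [[1 if base == letter else 0 for letter in alphabet] for base in cleaned]


def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels.

    Both classes must be present in y_true, otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both positive and negative samples")
    sensitivity = tp / (tp + fn)      # fraction of true m7G sites recovered
    specificity = tn / (tn + fp)      # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


encoded = one_hot_encode("AUGN")
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In a 10-fold cross-validation these metrics would be computed on each held-out fold and then averaged.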

Citations: 0

Integrated Machine Learning Algorithms for Stratification of Patients with Bladder Cancer

IF 4 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-07 DOI: 10.2174/0115748936288453240124082031

Yuanyuan He, Haodong Wei, Siqing Liao, Ruiming Ou, Yuqiang Xiong, Yongchun Zuo, Lei Yang

Background: Bladder cancer is a prevalent malignancy globally, characterized by rising incidence and mortality rates. Stratifying bladder cancer patients into different subtypes is crucial for the effective treatment of this form of cancer. Therefore, there is a need to develop a stratification model specific to bladder cancer. Purpose: This study aims to establish a prognostic prediction model for bladder cancer, with the primary goal of accurately predicting prognosis and treatment outcomes. Methods: We collected datasets from 10 bladder cancer samples sourced from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases and the IMvigor210 dataset. Machine learning-based algorithms were used to generate 96 models for establishing a risk score for each patient. Based on the risk score, all patients were classified into two risk groups. Results: The two groups of bladder cancer patients exhibited significant differences in prognosis, biological functions, and drug sensitivity. A nomogram model demonstrated that the risk score had a robust predictive effect with good clinical utility. Conclusion: The risk score constructed in this study can be used to predict the prognosis, response to drug treatment, and response to immunotherapy of bladder cancer patients, supporting personalized clinical treatment of bladder cancer.



Citations: 0

Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data

IF 4 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-07 DOI: 10.2174/0115748936276861240109045208

Xinpeng Guo, Yafei Song, Dongyan Xu, Xueping Jin, Xuequn Shang

Background: When using clinical data for multi-omics analysis, problems arise such as an insufficient number of omics data types and relatively small sample sizes, owing to the protection of patients' privacy, the data management requirements of various institutions, and the relatively large number of features in each omics dataset. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data. Methods: We propose a novel approach that exploits easily accessible statistics in public databases. This approach introduces phenotypic associations that are not included in the clinical data and uses these data to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks to predict the weights of the inter-layer associations. A hyperparameter β was added to merge the weights of the two networks, and k-fold cross-validation was then used to evaluate the accuracy of the method. In calculating the weights of the two-layer networks, the random walk with restart (RWR) with fixed restart probability was combined with PBMDA and CIPHER to generate the PCRWR with biased weights and improved accuracy. Results: The area under the receiver operating characteristic curve was increased by approximately 7% in the case of the RWR with initial weights. Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, achieving an effect similar to that of clinical multi-omics analysis.
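To make the random walk with restart (RWR) step concrete, the sketch below runs a plain RWR on a toy three-node path graph and then blends two score vectors with a weight β. It is an illustration only: the function names, the toy graph, the restart probability of 0.5, and the convex-combination form of the β merge are assumptions, and the sketch does not reproduce PBMDA, CIPHER, or the PCRWR described in the abstract.

```python
# Illustrative sketch only; names, graph, and the beta merge rule are assumptions.

def column_normalize(adj):
    """Scale each column of the adjacency matrix so that it sums to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def rwr(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change drops below tol."""
    w = column_normalize(adj)
    n = len(adj)
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def merge_scores(scores_a, scores_b, beta):
    """Assumed merge rule: convex combination weighted by beta."""
    return [beta * a + (1 - beta) * b for a, b in zip(scores_a, scores_b)]


# Toy path graph 0 - 1 - 2 with the walk restarting at node 0.
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = rwr(path_graph, seed=0)
merged = merge_scores([1.0, 0.0], [0.0, 1.0], beta=0.25)
print(scores, merged)
```

With a fixed restart probability the walker keeps returning to the seed, so nodes closer to the seed accumulate higher steady-state scores; those scores can then serve as association weights.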



Citations: 0

Enhancing Drug-Target Binding Affinity Prediction through Deep Learning and Protein Secondary Structure Integration

IF 4 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-07 DOI: 10.2174/0115748936285519240110070209

Runhua Zhang, Baozhong Zhu, Tengsheng Jiang, Zhiming Cui, Hongjie Wu

Background: Conventional approaches to drug discovery are often lengthy and costly. To expedite the discovery of new drugs, the use of artificial intelligence (AI) to predict drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods concentrate primarily on the amino acid sequence of proteins. Yet the interactions between drug compounds and targets occur within distinct segments of the protein structure, whereas the primary sequence mainly captures global protein features. Consequently, the primary sequence alone falls short of fully elucidating the intricate relationship between drugs and their respective targets. Objective: This study aims to employ advanced deep learning techniques to forecast DTA while incorporating information about the secondary structure of proteins. Methods: In our research, both the primary sequence and the secondary structure of the protein were leveraged for protein representation. The primary sequence served as the global feature, and the secondary structure was employed as the local feature. Convolutional neural networks and graph neural networks were used to independently model the intricate features of target proteins and drug compounds. This approach enhanced our ability to capture drug-target interactions more effectively. Results: We have introduced a novel method for predicting DTA. In comparison to DeepDTA, our approach achieves a 3.9% increase in the Concordance Index (CI) and a 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset. Conclusion: Our results demonstrate that augmenting DTA prediction with the protein's secondary structure as a local feature yields significantly improved accuracy compared to relying solely on the primary structure.
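The two metrics reported above can be computed as in the following sketch. It uses a common pairwise definition of the Concordance Index (ties in the prediction count as 0.5, pairs with equal true affinity are skipped); whether the paper uses exactly this variant is an assumption, and the affinity values are made up for illustration.

```python
# Illustrative sketch; the CI variant and the example values are assumptions.

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity are not comparable
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                concordant += 0.5  # tied prediction counts as half
            elif (diff_true > 0) == (diff_pred > 0):
                concordant += 1.0
    return concordant / comparable if comparable else float("nan")


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up affinities: one of the six pairs (indices 1 and 2) is ranked wrongly.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.1, 6.3, 6.2, 8.4]
ci = concordance_index(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(ci, mse)
```

CI measures ranking quality (1.0 is a perfect ordering, 0.5 is random), while MSE measures how far the predicted values are from the true ones, so the two numbers are complementary.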



Citations: 0

Inferring Gene Regulatory Networks from Single-Cell Time-Course Data Based on Temporal Convolutional Networks

IF 4 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-04 DOI: 10.2174/0115748936282613231211112920

Dayu Tan, Jing Wang, Zhaolong Cheng, Yansen Su, Chunhou Zheng

Objective: This work aims to infer causal relationships between genes and construct dynamic gene regulatory networks (GRNs) using time-course scRNA-seq data. Methods: We propose scTGRN, an analytical method based on temporal convolutional networks for inferring GRNs from single-cell time-course data, which provides a supervised learning approach to infer causal relationships among genes. scTGRN constructs a 4D tensor representing gene expression features for each gene pair, then feeds this tensor into the temporal convolutional network to train the model and infer the causal relationship between genes. Results: We validate the performance of scTGRN on five real datasets and four simulated datasets, and the experimental results show that scTGRN outperforms existing models in constructing GRNs. In addition, we test scTGRN on gene function assignment, where it also outperforms other models. Conclusion: The analysis shows that scTGRN can not only accurately identify causal relationships between genes but can also be used for gene function assignment.



Citations: 0

Transformer-based Named Entity Recognition for Clinical Cancer Drug Toxicity by Positive-unlabeled Learning and KL Regularizers

IF 4 Zone 3 (Biology) Q3 BIOCHEMICAL RESEARCH METHODS

Current Bioinformatics

Pub Date : 2024-02-04 DOI: 10.2174/0115748936278299231213045441

Weixin Xie, Jiayu Xu, Chengkui Zhao, Jin Li, Shuangze Han, Tianyu Shao, Limei Wang, Weixing Feng

Background: With increasing rates of polypharmacy, the vigilant surveillance of clinical drug toxicity has emerged as an important concern. Named Entity Recognition (NER) is an essential task for extracting valuable drug safety information from the biomedical literature. In recent years, deep learning models in the biomedical domain, especially pre-trained language models, have made significant advances on NER tasks. Nonetheless, these NER techniques depend on large volumes of high-quality, manually annotated data, which are labor-intensive and inefficient to produce. Objective: To improve the performance of prediction. Methods: This study introduces a novel approach that diverges from the conventional reliance on manually annotated data. It employs a transformer-based Positive-Unlabeled Learning (PULearning) technique, which incorporates adaptive learning and is applied to the clinical cancer drug toxicity corpus. To improve the precision of prediction, we employ relative position embeddings within the transformer encoder. Additionally, we formulate a composite loss function that integrates two Kullback-Leibler (KL) regularizers to align with PULearning assumptions. Through adaptive sampling, our approach attains the targeted performance for NER tasks, relying solely on unlabeled data and named entity dictionaries. Conclusion: Our model achieves an overall NER performance with an F1 of 0.819. Specifically, it attains F1 scores of 0.841, 0.801, and 0.815 for DRUG, CANCER, and TOXI entities, respectively. A comprehensive analysis of the results validates the effectiveness of our approach in comparison to existing PULearning methods on biomedical NER tasks. Additionally, a visualization of the associations among the three identified entity types is provided, offering a valuable reference for querying their interrelationships.
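The abstract does not spell out the form of the two KL regularizers, so the following is only a sketch of how a Kullback-Leibler term is typically computed in a positive-unlabeled setting. It assumes, hypothetically, that one regularizer compares an assumed class prior with the mean predicted positive probability over a batch; the function names and the example numbers are illustrative and are not taken from the paper.

```python
import math

# Illustrative sketch; the regularizer form below is an assumption, not the paper's loss.

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def prior_regularizer(predicted_probs, class_prior):
    """Penalize a batch whose mean predicted positive rate drifts from the prior."""
    mean_pred = sum(predicted_probs) / len(predicted_probs)
    return bernoulli_kl(class_prior, mean_pred)


# Made-up token-level probabilities of being an entity, and an assumed prior of 0.3.
batch_probs = [0.9, 0.1, 0.2, 0.4]
penalty = prior_regularizer(batch_probs, class_prior=0.3)
print(penalty)
```

A term of this kind is zero when the two distributions match and grows as they diverge, which is what lets it act as a soft constraint next to the main classification loss.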



Citations: 0