
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

Calculation of the Weight of Evidence for Combined Single-Cell and Extracellular Forensic DNA.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-19 DOI: 10.1109/TCBB.2024.3416877
Desmond S Lun, Catherine M Grgicak

The weight of DNA evidence for forensic applications is typically assessed by calculating the likelihood ratio (LR). In the standard workflow, DNA is extracted from a collection of cells in which the cells of an unknown number of donors are mixed. The DNA is then genotyped, and the LR is calculated using well-established methods. Recently, a method for calculating the LR from single-cell data has been presented. Rather than extracting the DNA while the cells are still mixed, this approach first isolates each cell; extraction and fragment analysis of the relevant forensic loci follow, so that individual cells are genotyped. This workflow yields significantly stronger weights of evidence, but it does not account for extracellular DNA that may also be present in the sample. In this paper, we present a method for calculating an LR that combines single-cell and extracellular data. We demonstrate the calculation on example data and show that the combined LR can lead to stronger conclusions than calculating LRs on the single-cell and extracellular DNA separately.
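As background for the LR framing above, here is a minimal sketch of the textbook single-source LR for a profile that matches the suspect. It assumes Hardy-Weinberg genotype frequencies, no subpopulation (theta) correction, no drop-out or drop-in, and independent loci; the allele frequencies are hypothetical. It is not the combined single-cell/extracellular method of the paper, which replaces these simple genotype probabilities with a joint model of both data types.

```python
import math

def locus_lr(genotype, freqs):
    """Single-source LR at one locus when the evidence matches the suspect.

    Under Hp the suspect is the donor, so P(E | Hp) = 1. Under Hd an unrelated
    random person is the donor, so P(E | Hd) is the Hardy-Weinberg genotype
    frequency: p^2 for a homozygote, 2pq for a heterozygote.
    """
    a, b = genotype
    p, q = freqs[a], freqs[b]
    genotype_freq = p * p if a == b else 2 * p * q
    return 1.0 / genotype_freq

def profile_lr(profile, freq_table):
    """Combine loci with the product rule (assumes independence between loci)."""
    return math.prod(locus_lr(profile[locus], freq_table[locus]) for locus in profile)

# Hypothetical allele frequencies for two STR loci (illustrative numbers only).
FREQS = {
    "D8S1179": {"13": 0.30, "14": 0.20},
    "TH01": {"9.3": 0.25},
}
PROFILE = {"D8S1179": ("13", "14"), "TH01": ("9.3", "9.3")}

lr = profile_lr(PROFILE, FREQS)
print(f"LR = {lr:.1f}")  # prints LR = 133.3, i.e. 1/(2*0.30*0.20) * 1/0.25**2
```

Multiplying per-locus LRs is what makes even a handful of loci produce large LRs; mixtures break this simple form because the donor genotypes are no longer observed directly.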

Citations: 0
Intra-Inter Graph Representation Learning for Protein-Protein Binding Sites Prediction.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-19 DOI: 10.1109/TCBB.2024.3416341
Wenting Zhao, Gongping Xu, Long Wang, Zhen Cui, Tong Zhang, Jian Yang

Graph neural networks have drawn increasing attention and achieved remarkable progress recently because they can be applied to large amounts of irregular data, and a protein is naturally represented as a graph. In this work, we focus on predicting protein-protein binding sites between ligand and receptor proteins. Previous work simply applies graph convolution to learn residue representations of the ligand and receptor proteins, concatenates them, and feeds the concatenated representation into a fully connected layer to make predictions; this loses much of the information contained in the complex and yields suboptimal predictions. In this paper, we present Intra-Inter Graph Representation Learning for protein-protein binding sites prediction (IIGRL). Specifically, for intra-graph learning, we maximize the mutual information between local node representations and a global graph summary, encouraging node representations to capture the global information of the protein graph. Then we fuse the separate ligand and receptor graphs into a single graph and learn affinities between their residues (nodes) so that information propagates between the two proteins; this effectively captures inter-protein information and further improves the discrimination of residue pairs. Extensive experiments on multiple benchmarks demonstrate that the proposed IIGRL model outperforms state-of-the-art methods.
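The intra-graph objective described above (maximizing mutual information between local node representations and a global graph summary) resembles the Deep Graph Infomax (DGI) recipe. The following pure-Python sketch shows a DGI-style loss under stated assumptions: a sigmoid-of-mean readout, a bilinear discriminator with a hypothetical weight matrix `W`, and corrupted node embeddings as negatives. The abstract does not give IIGRL's encoder, estimator, or weights, so this is an illustration of the idea, not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def readout(node_embs):
    """Global summary s: element-wise sigmoid of the mean node embedding."""
    n, dim = len(node_embs), len(node_embs[0])
    return [sigmoid(sum(h[d] for h in node_embs) / n) for d in range(dim)]

def discriminator_logit(h, W, s):
    """Bilinear score h^T W s: high when node h agrees with summary s."""
    Ws = [sum(W[i][j] * s[j] for j in range(len(s))) for i in range(len(W))]
    return sum(hi * v for hi, v in zip(h, Ws))

def local_global_loss(pos_embs, neg_embs, W):
    """Binary cross-entropy over real and corrupted (node, summary) pairs.

    DGI uses this objective as a Jensen-Shannon-style proxy for the mutual
    information between node embeddings and the graph summary; lower loss
    means the discriminator separates real from corrupted nodes better.
    """
    s = readout(pos_embs)  # the summary is always built from the real graph
    loss = 0.0
    for h in pos_embs:
        loss -= math.log(sigmoid(discriminator_logit(h, W, s)))
    for h in neg_embs:
        loss -= math.log(1.0 - sigmoid(discriminator_logit(h, W, s)))
    return loss / (len(pos_embs) + len(neg_embs))

# Hypothetical residue embeddings for a real protein graph and a corrupted copy
# (in DGI the corruption is a row-shuffle of input features before encoding).
pos = [[1.0, 0.5], [0.8, 0.2], [0.9, 0.7]]
neg = [[-1.0, -0.4], [-0.7, 0.1], [-0.9, -0.6]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(local_global_loss(pos, neg, W))
```

An uninformative discriminator scores every pair at 0.5 and gives a loss of ln 2; training the encoder and `W` to push the loss below that forces each node embedding to carry information about its own graph.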

Citations: 0
SAGCN: Using graph convolutional network with subgraph-aware for circRNA-drug sensitivity identification.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-17 DOI: 10.1109/TCBB.2024.3415058
Weicheng Sun, Chengjuan Ren, Jinsheng Xu, Ping Zhang

Circular RNAs (circRNAs) play a significant role in cancer development and therapy resistance, and substantial evidence indicates that circRNA expression affects the sensitivity of cells to drugs. Identifying circRNA-drug sensitivity associations (CDAs) is therefore helpful for disease treatment and drug discovery. However, identifying CDAs through conventional biological experiments is both time-consuming and costly, so computational methods for predicting CDAs are urgently needed. In this study, we propose a new computational method for predicting CDAs, the subgraph-aware graph convolutional network (SAGCN). SAGCN first constructs a heterogeneous network composed of a circRNA similarity network, a drug similarity network, and a circRNA-drug bipartite network. A subgraph extractor then uses a graph convolutional network to learn the latent subgraph structure of the heterogeneous network. The extractor captures 1-hop and 2-hop information, which a fusion attention mechanism integrates adaptively. In parallel, a novel subgraph-aware attention mechanism detects intrinsic subgraph structure. The resulting node feature representations are used to predict CDAs. Under 10-fold cross-validation, SAGCN obtained an average AUC of 0.9120 and AUPR of 0.8693, exceeding the performance of state-of-the-art models. Case studies demonstrate the potential of SAGCN for identifying associations between circRNAs and drug sensitivity.
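To make "capturing 1-hop and 2-hop information and fusing it with attention" concrete, here is a minimal pure-Python sketch. It uses the standard GCN propagation matrix D^{-1/2}(A+I)D^{-1/2}, takes one propagation step as the 1-hop view and two steps as the 2-hop view, and fuses them with a softmax over per-view scores. The scoring vector `q`, the toy graph, and the mean-score attention are hypothetical choices; SAGCN's learned weights and its subgraph-aware attention are not specified in the abstract.

```python
import math

def normalize_adjacency(A):
    """GCN propagation matrix D^{-1/2} (A + I) D^{-1/2}."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def fuse_hops(A, X, q):
    """Build 1-hop and 2-hop views of node features and fuse them by attention.

    Each view gets a scalar score (mean over nodes of q . h); a softmax over
    the two scores gives the fusion weights.
    """
    P = normalize_adjacency(A)
    H1 = matmul(P, X)   # 1-hop neighbourhood aggregation
    H2 = matmul(P, H1)  # 2-hop neighbourhood aggregation
    scores = [sum(sum(qi * hi for qi, hi in zip(q, row)) for row in H) / len(H)
              for H in (H1, H2)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    w1, w2 = (e / sum(exps) for e in exps)
    fused = [[w1 * a + w2 * b for a, b in zip(r1, r2)] for r1, r2 in zip(H1, H2)]
    return fused, (w1, w2)

# Toy heterogeneous graph: nodes 0-1 are circRNAs, 2-3 are drugs.
# 0-1 is a circRNA similarity edge; 0-2, 1-2 and 1-3 are circRNA-drug edges.
A = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
fused, weights = fuse_hops(A, X, q=[1.0, -1.0])
print(weights)
```

Adding self-loops before normalizing keeps every node's own features in its view and guarantees nonzero degrees; in a trained model `q` (or a small network producing the scores) would be learned.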

Citations: 0
Recursive Self-Composite Approach Towards Structural Understanding of Boolean Networks.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-17 DOI: 10.1109/TCBB.2024.3415352
Jongrae Kim, Woojeong Lee, Kwang-Hyun Cho

Boolean networks have been widely used in systems biology to study dynamical characteristics of biological networks such as steady states and cycles, yet little attention has been paid to the dynamic properties of the network structures themselves. Here, we systematically reveal the core network structures using a recursive self-composite of the logic update rules. We find that all Boolean update rules exhibit repeated cyclic logic structures, in which each converged logic leads to the same states, defined as kernel states. Consequently, the period of any state cycle is upper bounded by the number of logics in the converged logic cycle. To uncover the underlying dynamical characteristics by exploiting these repeating structures, we propose leaping and filling algorithms, which avoid the explosion in string size that would otherwise occur during self-composition. Finally, we present three examples (a simple network with a long feedback structure, a T-cell receptor network, and a cancer network) to demonstrate the usefulness of the proposed algorithms.
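The claim that repeated self-composition of the update rules eventually cycles, and that this cycle bounds every state-cycle period, can be checked on toy networks by brute force. The sketch below represents the synchronous update map F as a lookup table over all 2^n states (exponential in n, so only for small examples) and finds where the sequence F, F∘F, F∘F∘F, ... first repeats. It is not the paper's string-based self-composition or its leaping and filling algorithms; the networks used are hypothetical.

```python
from itertools import product

def state_map(rules, n):
    """Tabulate the synchronous update F over all 2**n states as {state: next}."""
    return {s: tuple(int(rule(s)) for rule in rules) for s in product((0, 1), repeat=n)}

def compose(f, g):
    """Return the table of f∘g (apply g, then f)."""
    return {s: f[g[s]] for s in g}

def power_cycle(F):
    """Find k0 and period such that F^(k0 + period) == F^k0 (first repeat).

    Only finitely many maps exist on a finite state set, so the sequence
    F, F∘F, F∘F∘F, ... must eventually cycle.
    """
    seen, current, k = {}, F, 1
    while True:
        key = tuple(sorted(current.items()))
        if key in seen:
            return seen[key], k - seen[key]
        seen[key] = k
        current, k = compose(F, current), k + 1

def attractor_periods(F):
    """Set of state-cycle lengths reached from any initial state."""
    periods = set()
    for start in F:
        visited, s, t = {}, start, 0
        while s not in visited:
            visited[s] = t
            s, t = F[s], t + 1
        periods.add(t - visited[s])
    return periods

# Hypothetical negative-feedback ring: x1' = NOT x3, x2' = x1, x3' = x2.
rules = [lambda s: not s[2], lambda s: s[0], lambda s: s[1]]
F = state_map(rules, 3)
k0, period = power_cycle(F)
print(k0, period, sorted(attractor_periods(F)))  # prints 1 6 [2, 6]
```

For a finite map, the eventual period of F^k is the least common multiple of all attractor lengths, so every state-cycle period divides it and is therefore bounded by it; this mirrors, in brute-force form, the bound stated in the abstract.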

Citations: 0
SIG: Graph-Based Cancer Subtype Stratification With Gene Mutation Structural Information.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-14 DOI: 10.1109/TCBB.2024.3414498
Chengcheng Zhang, Wei Li, Ming Deng, Yizhang Jiang, Xiaohui Cui, Ping Chen

Somatic tumor data are high-dimensional and sparse, and sample sizes are small, which makes cancer subtype stratification based on somatic genomic data challenging. Current methods for improving cancer clustering performance focus on dimension reduction, integrating multi-omics data, or generating realistic samples, but ignore the associations between mutated genes within the patient-gene matrix. We refer to these associations as gene mutation structural information; it implicitly carries cancer subtype information and can enhance subtype clustering. We introduce a novel method for cancer subtype clustering called SIG (Structural Information within Graph). Because cancer is driven by combinations of genes, we link every pair of genes mutated in the same patient sample and represent these associations as a graph, in which each association between two mutated genes is an edge. We then merge these associations across all mutated genes to obtain a structural information graph, which enriches the gene network and improves its relevance to cancer clustering. We integrate the somatic tumor genome with the enriched gene network and propagate it over the network so that patients with mutations in similar network regions are clustered together. Clustering experiments on ovarian cancer and lung adenocarcinoma (LUAD) datasets show that our method achieves better clustering performance than state-of-the-art (SOTA) methods. The code is available at https://github.com/ChangSIG/SIG.git.
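The two steps described above (link every pair of genes mutated in the same patient and merge the links across patients, then propagate mutation profiles over the resulting network) can be sketched as follows. Using co-mutation counts as edge weights, the restart-style propagation rule, and `alpha = 0.5` are assumptions for illustration; the abstract does not specify SIG's weighting or propagation scheme, and the patients and genes below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def comutation_graph(patients):
    """Merge per-patient pairwise links into one weighted gene graph.

    `patients` maps patient id -> set of mutated genes. Every pair of genes
    mutated in the same patient becomes an edge; the weight counts how many
    patients share that pair.
    """
    edges = Counter()
    for genes in patients.values():
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)] += 1
    return edges

def propagate(profile, edges, alpha=0.5, iters=50):
    """Smooth one patient's mutation profile over the gene graph.

    Iterates F <- alpha * W F + (1 - alpha) * F0 with W the row-normalized
    weighted adjacency, so mutated genes lend score to their neighbours.
    """
    nbrs = defaultdict(dict)
    for (a, b), w in edges.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    genes = set(nbrs) | set(profile)
    f0 = {g: profile.get(g, 0.0) for g in genes}
    f = dict(f0)
    for _ in range(iters):
        new = {}
        for g in genes:
            total = sum(nbrs[g].values())
            spread = sum(w * f[h] for h, w in nbrs[g].items()) / total if total else 0.0
            new[g] = alpha * spread + (1 - alpha) * f0[g]
        f = new
    return f

# Hypothetical patients and their mutated genes.
patients = {
    "P1": {"TP53", "KRAS"},
    "P2": {"TP53", "KRAS", "EGFR"},
    "P3": {"EGFR", "STK11"},
}
edges = comutation_graph(patients)
scores = propagate({"TP53": 1.0}, edges)
print(edges)
print(scores)
```

After propagation, two patients whose mutations hit different but connected genes get similar smoothed profiles, which is what lets a downstream clustering step group them.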

Citations: 0
PRFold-TNN: Protein Fold Recognition With an Ensemble Feature Selection Method Using PageRank Algorithm Based on Transformer.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-14 DOI: 10.1109/TCBB.2024.3414497
Xinyi Qin, Lu Zhang, Min Liu, Guangzhong Liu

Understanding the tertiary structures of proteins benefits many aspects of human life, and protein fold recognition is a vital means of learning protein structure. Researchers have proposed a variety of methods for protein fold recognition, but as protein structure databases are continuously updated, novel and effective computational methods are still needed. In this study, we develop a new protein structure dataset named AT and propose the PRFold-TNN model for protein fold recognition. First, several feature extraction methods, including AAC, HMM, HMM-Bigram and ACC, are used to extract features from protein sequences. Then, an ensemble feature selection method based on the PageRank algorithm, which integrates various tree-based algorithms, screens the fused features. Finally, a Transformer-based classifier makes the prediction. Experiments show a prediction accuracy of 86.27% on the AT dataset and 88.91% on the independent test set, indicating strong performance and generalization ability for protein fold recognition. We also evaluate the model on the DD, EDD and TG benchmark datasets, where it achieves prediction accuracies of 88.41%, 97.91% and 95.16%, at least 3.0%, 0.8% and 2.5% higher than those of state-of-the-art methods. These results indicate that PRFold-TNN outperforms existing methods.
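Two of the ingredients named above are standard and easy to sketch: AAC (amino acid composition), the 20-dimensional vector of residue frequencies, and PageRank computed by power iteration. How PRFold-TNN builds a graph over features from its tree-based selectors is not described in the abstract, so the feature graph in the example is a hypothetical input.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence):
    """Amino acid composition: fraction of each of the 20 standard residues.

    Non-standard letters (e.g. X, U) are ignored in numerator and denominator.
    """
    seq = [c for c in sequence.upper() if c in AMINO_ACIDS]
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a dict node -> list of successors.

    Dangling nodes (no out-links) spread their rank uniformly over all nodes.
    """
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, succs in out_links.items():
            for v in succs:
                new[v] += damping * rank[u] / len(succs)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

print(aac("ACDA")[:3])  # prints [0.5, 0.25, 0.25] (A, C, D)

# Hypothetical feature graph: an edge u -> v means "u endorses feature v".
feature_graph = {"f1": ["f2"], "f2": ["f3"], "f3": ["f1"], "f4": ["f3"]}
ranks = pagerank(feature_graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In a feature-selection setting, the top-ranked nodes would be kept; here `f3` ranks highest because it receives links from both `f2` and `f4`.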

Citations: 0
Novel Antimicrobial Peptide Design Using Motif Match Score Representation.
IF 3.6 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-06-12 DOI: 10.1109/TCBB.2024.3413021
Ummu Gulsum Soylemez, Malik Yousef, Zulal Kesmen, Burcu Bakir-Gungor

Antimicrobial peptides (AMPs) have drawn researchers' interest because they offer an alternative to traditional antibiotics in the fight against antibiotic resistance and exhibit additional pharmaceutically significant properties. Recent computational approaches attempt to reveal, from a machine learning perspective, how antibacterial activity is determined, searching for the biological cues or characteristics that control antimicrobial activity by incorporating motif match scores. This study develops a machine learning framework for devising novel AMP sequences potentially effective against Gram-positive/Gram-negative bacteria. Various classification models were trained to classify newly generated sequences as AMP or non-AMP. These novel sequences were then validated using the "DBAASP:strain-specific antibacterial prediction based on machine learning approaches and data on AMP sequences" tool. The findings presented here represent a significant step in this line of computational research and can streamline the creation or modification of AMPs in wet-lab settings.
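The abstract does not define the motif match score, so the following is a hypothetical feature encoding that shows the general idea: each motif is a regular expression over amino-acid letters, and its score is the number of (possibly overlapping) matches in the peptide, optionally normalized by peptide length. The motifs and the peptide are made-up examples, not motifs from the paper.

```python
import re

def motif_match_scores(peptide, motifs, normalize=True):
    """Return one score per motif: count of overlapping regex matches.

    A zero-width lookahead (?=...) lets re.finditer report overlapping hits,
    e.g. 'KK' occurs three times in 'KKKK'.
    """
    seq = peptide.upper()
    scores = []
    for motif in motifs:
        hits = sum(1 for _ in re.finditer(f"(?={motif})", seq))
        scores.append(hits / len(seq) if normalize and seq else float(hits))
    return scores

# Hypothetical motifs: a cationic pair, a K/R-x-x-K/R pattern, a Trp-x-Trp triplet.
MOTIFS = ["KK", "[KR]..[KR]", "W.W"]
features = motif_match_scores("GLFDIIKKIAESF", MOTIFS, normalize=False)
print(features)  # prints [1.0, 0.0, 0.0]
```

The resulting fixed-length vector can be concatenated with other descriptors and passed to any classifier that labels sequences as AMP or non-AMP.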

Citations: 0
Distantly Supervised Biomedical Relation Extraction Via Negative Learning and Noisy Student Self-Training.
IF 3.6 Zone 3 (Biology) Q1 Mathematics Pub Date: 2024-06-11 DOI: 10.1109/TCBB.2024.3412174
Yuanfei Dai, Bin Zhang, Shiping Wang

Biomedical relation extraction aims to identify underlying relationships among entities, such as gene associations and drug interactions, within biomedical texts. Despite advances in relation extraction in general knowledge domains, the scarcity of labeled training data remains a significant challenge in the biomedical field. This paper presents a novel approach for biomedical relation extraction that combines a noisy student self-training strategy with negative learning. The method addresses data insufficiency by using distantly supervised data to generate high-quality labeled samples. Negative learning, as opposed to traditional positive learning, offers a more robust mechanism for identifying and relabeling noisy samples, preventing model overfitting. Combining these techniques improves noise reduction and relabeling, leading to better performance even on noisy datasets. Experimental results demonstrate that the proposed framework mitigates the impact of noisy data and outperforms existing baselines.
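Negative learning, as commonly formulated in the noisy-label literature, replaces "this sample belongs to class y" with "this sample does not belong to class ȳ", where ȳ is a complementary label drawn from the classes other than the (possibly noisy) given label; the per-sample loss is -log(1 - p_ȳ). The sketch below contrasts it with ordinary cross-entropy. The paper's exact loss, sampling scheme, and noisy-student loop are not given in the abstract, and the logits are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def positive_loss(logits, label):
    """Standard cross-entropy: -log p_label (trusts the label fully)."""
    return -math.log(softmax(logits)[label])

def negative_loss(logits, complementary):
    """Negative learning: push p_complementary toward 0 via -log(1 - p)."""
    return -math.log(1.0 - softmax(logits)[complementary])

def sample_complementary(noisy_label, num_classes, rng):
    """Draw a class other than the (possibly wrong) distant-supervision label."""
    choices = [c for c in range(num_classes) if c != noisy_label]
    return rng.choice(choices)

rng = random.Random(0)
logits = [2.0, 0.5, -1.0]   # hypothetical scores over 3 relation types
noisy_label = 1             # distant-supervision label, possibly wrong
comp = sample_complementary(noisy_label, len(logits), rng)
print(comp, positive_loss(logits, noisy_label), negative_loss(logits, comp))
```

With K relation types, the statement "not class ȳ" is true with probability at least (K-2)/(K-1) even when the distant label is wrong, which is why this signal is less likely to reinforce label noise than cross-entropy on the noisy label.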

Citations: 0
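For readers unfamiliar with negative learning, here is a minimal sketch of the general idea behind the distantly supervised relation-extraction paper above. It is not Dai et al.'s implementation. Positive learning maximizes the probability of a possibly noisy label (loss -log p_y). Negative learning instead picks a complementary label ȳ ≠ y at random and minimizes -log(1 - p_ȳ), so it only asserts which class a sample does *not* belong to. The confidence threshold and the `relabel` heuristic are hypothetical illustrations of how noisy samples might be flagged and relabeled. They are not taken from the paper.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def positive_learning_loss(logits, label):
    """Standard cross-entropy, -log p(label): fully trusts the (possibly noisy) label."""
    return -math.log(max(softmax(logits)[label], 1e-12))

def sample_complementary_label(label, num_classes, rng):
    """Pick a class uniformly at random from all classes except the given label."""
    return rng.choice([c for c in range(num_classes) if c != label])

def negative_learning_loss(logits, complementary_label):
    """Negative learning, -log(1 - p(complementary label)).
    Only asserts that the sample does NOT belong to that class."""
    p = min(softmax(logits)[complementary_label], 1.0 - 1e-12)  # avoid log(0)
    return -math.log(1.0 - p)

def relabel(logits, given_label, threshold=0.5):
    """Hypothetical heuristic: keep the label if the model is confident in it,
    switch to a confident alternative class, or return None to drop the sample."""
    probs = softmax(logits)
    if probs[given_label] >= threshold:
        return given_label
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

# Usage: a model confident in class 0 for a sample distantly labeled as class 1.
logits = [4.0, 0.0, 0.0]
print(positive_learning_loss(logits, 1))  # large: the noisy label is penalized hard
print(negative_learning_loss(logits, 2))  # small: "not class 2" agrees with the model
print(relabel(logits, 1))                 # prints 0
```

The contrast is the point of the example. When the given label is wrong, cross-entropy pushes hard toward it, while the negative-learning loss on a randomly chosen complementary class is usually mild. This is why negative learning is considered less prone to overfitting noisy labels.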
Prediction of Inter-residue Multiple Distances and Exploration of Protein Multiple Conformations by Deep Learning.
IF 3.6 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-06-10 DOI: 10.1109/TCBB.2024.3411825
Fujin Zhang, Zhangwei Li, Kailong Zhao, Pengxin Zhao, Guijun Zhang

AlphaFold2 has achieved a major breakthrough in end-to-end prediction of static protein structures. However, protein conformational change is considered a key factor in protein biological function, and predicting multiple inter-residue distances is of great significance for exploring multiple protein conformations. In this study, we propose DeepMDisPre, an inter-residue multiple-distance prediction method built on an improved network that integrates triangle updates, axial attention and ResNet to predict multiple distances for each residue pair. To train the network, we built a dataset containing both proteins with a single structure and proteins with multiple conformations. We tested DeepMDisPre on 114 proteins with multiple conformations. The results show that the inter-residue distance distributions predicted by DeepMDisPre are more likely to have multiple peaks for flexible residue pairs than for rigid residue pairs. In two case studies of proteins with multiple conformations, we modeled the multiple conformations relatively accurately using the predicted inter-residue multiple distances. We also tested DeepMDisPre on 279 proteins with a single structure. Experimental results show that the average contact accuracy of DeepMDisPre is higher than that of the comparison method, and that in static protein modeling the average TM-score of the 3D models built by DeepMDisPre is also higher than that of the comparison method. The executable program is freely available at https://github.com/iobio-zjut/DeepMDisPre.

Citations: 0
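The DeepMDisPre abstract reports that predicted inter-residue distance distributions tend to show multiple peaks for flexible residue pairs. This is a rough illustration, not the authors' code, and it assumes each distribution is discretized into equal-width distance bins, as in common distogram predictors. The sketch finds local maxima in one such binned distribution and converts them to candidate distances. The 2-20 Å default range, the uniform bin width and the `min_height` cut-off are assumptions chosen for the example.

```python
def find_peaks(probs, min_height=0.05):
    """Indices of local maxima in a binned distribution that reach min_height.
    On a flat top, only the first bin of the plateau is reported."""
    n = len(probs)
    peaks = []
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i < n - 1 else float("-inf")
        if p >= min_height and p > left and p >= right:
            peaks.append(i)
    return peaks

def bin_centers(d_min, d_max, n_bins):
    """Centers (in Å) of n_bins equal-width bins spanning [d_min, d_max)."""
    width = (d_max - d_min) / n_bins
    return [d_min + (i + 0.5) * width for i in range(n_bins)]

def candidate_distances(probs, d_min=2.0, d_max=20.0, min_height=0.05):
    """Distances at which the predicted distribution for one residue pair peaks."""
    centers = bin_centers(d_min, d_max, len(probs))
    return [centers[i] for i in find_peaks(probs, min_height)]

def is_multimodal(probs, min_height=0.05):
    """Flag a residue pair as potentially flexible if it has two or more peaks."""
    return len(find_peaks(probs, min_height)) >= 2

# Usage: a toy 8-bin distribution over 2-18 Å with two modes.
pair_probs = [0.0, 0.1, 0.3, 0.1, 0.05, 0.1, 0.25, 0.1]
print(candidate_distances(pair_probs, d_min=2.0, d_max=18.0))  # prints [7.0, 15.0]
print(is_multimodal(pair_probs))                               # prints True
```

The `min_height` floor keeps small ripples in the tail from being counted as separate conformational states. A practical version might also require a minimum separation between peaks before treating them as distinct distances.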
Transcription Factor Binding Site Prediction Using CnNet Approach.
IF 3.6 Zone 3 Biology Q1 Mathematics Pub Date: 2024-06-07 DOI: 10.1109/TCBB.2024.3411024
Mohamed Divan Masood, Manjula, Vijayan Sugumaran

Control of gene expression is one of the most important processes in a living organism, and understanding it makes it easier to identify different kinds of diseases and their causes. However, it is very difficult to determine which factors control gene expression. A transcription factor (TF) is a protein that plays an important role in gene expression. Discovering transcription factors has immense biological significance, but developing and evaluating novel techniques for studying regulatory mechanisms in biological systems is challenging. In this research, we focus mainly on 'sequence specificities' that can be ascertained from experimental data with 'deep learning' techniques, which offer a scalable, flexible and unified computational approach for predicting transcription factor binding. Specifically, an approach named CnNet, which combines the Multiple EM for Motif Elicitation (MEME) technique with a Convolutional Neural Network (CNN), is used to discover the 'sequence specificities' of a dataset of DNA gene sequences. The process involves two steps: a) using MEME to discover motifs capable of identifying useful TF binding sites, and b) using the CNN to compute a score indicating the likelihood that a given sequence is a useful binding site. The proposed CnNet approach predicts the TF binding score with much better accuracy than existing approaches. The source code and datasets used in this work are available at https://github.com/masoodbai/CnNet-Approach-for-TFBS.git.

Citations: 0
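The two CnNet steps described above can be illustrated with a deliberately simplified sketch. It is not the authors' architecture. A motif's position probability matrix (the kind of letter-probability matrix MEME reports) is converted to log-odds weights. Those weights serve as a single convolution filter over a one-hot-encoded sequence, and the result is max-pooled and passed through a sigmoid to give a score in (0, 1). In a trained CNN the filters and bias would be learned from data. The uniform 0.25 background, the pseudocount, the zero bias and the toy TATA motif are all assumptions made for illustration.

```python
import math

BASES = "ACGT"  # column order used for one-hot rows and motif matrices

def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T] indicators (N -> all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def ppm_to_log_odds(ppm, background=0.25, pseudocount=1e-3):
    """Turn a position probability matrix (one row per motif position,
    columns A, C, G, T) into log2-odds weights against a uniform background."""
    return [[math.log2((p + pseudocount) / background) for p in row] for row in ppm]

def max_conv_activation(x, kernel):
    """One 1-D convolution filter (valid padding) followed by global max-pooling."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence is shorter than the motif")
    return max(
        sum(x[start + i][j] * kernel[i][j] for i in range(k) for j in range(4))
        for start in range(len(x) - k + 1)
    )

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def binding_score(seq, kernel, bias=0.0):
    """Score in (0, 1): higher means the best-matching window fits the motif better."""
    return sigmoid(max_conv_activation(one_hot(seq), kernel) + bias)

# Usage: a toy 'TATA' motif as a hypothetical MEME-style probability matrix.
tata_ppm = [
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.0, 1.0],  # T
    [1.0, 0.0, 0.0, 0.0],  # A
]
kernel = ppm_to_log_odds(tata_ppm)
print(round(binding_score("GGTATAGG", kernel), 3))  # prints 1.0
print(round(binding_score("GGGGGGGG", kernel), 3))  # prints 0.0
```

Using the motif as a fixed filter makes the hand-off from step a) to step b) explicit. A learned model would typically use many filters, possibly initialized from discovered motifs, followed by further layers before the final score.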