Interdisciplinary Sciences: Computational Life Sciences: Latest Articles

A Multi-view Molecular Pre-training with Generative Contrastive Learning.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-01 Epub Date: 2024-05-06 DOI: 10.1007/s12539-024-00632-z
Yunwu Liu, Ruisheng Zhang, Yongna Yuan, Jun Ma, Tongfeng Li, Zhixuan Yu

Molecular representation learning encodes meaningful molecular structures as embedding vectors, a necessary prerequisite for molecular property prediction. Yet learning to represent molecules accurately remains challenging. Previous approaches that learn molecular representations end to end can suffer information loss and neglect molecular generative representations. To obtain rich molecular feature information, a pre-training model can draw on several different molecular representations, reducing the information loss caused by relying on a single one. We therefore propose MVGC, a multi-view generative contrastive learning pre-training model. Our pre-training framework learns three fundamental feature representations of molecules and integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that MVGC surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn chemically meaningful molecular representations.
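
To make the contrastive part of such a multi-view objective concrete, here is a minimal sketch; it is not the MVGC implementation. It computes an InfoNCE-style loss between two views of the same molecules. The abstract does not specify the loss, the views, or the temperature, so `info_nce`, the toy embeddings, and `temperature=0.1` are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: row i of view_a should match row i of view_b."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical 2-D embeddings of three molecules under two views
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
view_b_shuffled = [view_b_aligned[1], view_b_aligned[2], view_b_aligned[0]]

aligned_loss = info_nce(view_a, view_b_aligned)
shuffled_loss = info_nce(view_a, view_b_shuffled)
```

The loss is small when each molecule's two views agree and large when the pairing is shuffled, which is the signal a contrastive pre-training stage optimizes.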

Pages: 741-754. Publication type: Journal Article.
Citations: 0
A Novel Detection Method for High-Order SNP Epistatic Interactions Based on Explicit-Encoding-Based Multitasking Harmony Search.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-01 Epub Date: 2024-07-02 DOI: 10.1007/s12539-024-00621-2
Shouheng Tuo, Jiewei Jiang

To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) contributing to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), which exhibit small individual effects but potentially large joint effects. These interactions are difficult to detect because the search space is vast, encompassing billions of possible combinations, and evaluating them is computationally expensive. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) specifically designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm is employed, utilizing four lightweight evaluation functions, such as Bayesian network and entropy, to efficiently explore potential SNP combinations related to disease status. Second, a G-test statistical method is applied to filter out insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) and random forest (RF), are employed to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared to existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets encompassing age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in terms of both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git.
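
The second stage described above, filtering by a G-test, can be sketched as follows. This is a generic G-test of independence on a genotype-by-status contingency table, not code from the MTHS-EE-DHEI repository; the function name `g_test` and the toy counts are assumptions made for illustration. For a combination of k SNPs the table would have up to 3^k genotype rows; the toy table uses a single SNP so that the degrees of freedom equal 2, where the chi-square tail probability has the closed form exp(-G/2).

```python
import math

def g_test(table):
    """G statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a cell with zero count contributes nothing
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof

# Hypothetical counts: rows are genotypes AA, Aa, aa; columns are cases, controls
snp_table = [[30, 60], [50, 30], [20, 10]]
g_stat, dof = g_test(snp_table)

# With 2 degrees of freedom the chi-square survival function is exp(-x / 2)
p_value = math.exp(-g_stat / 2.0)

# A table whose rows share the same case/control ratio carries no signal
null_g, _ = g_test([[10, 10], [20, 20], [30, 30]])
```

A combination would be kept only if its p-value falls below a chosen threshold, typically after correction for the number of combinations tested.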

Pages: 688-711. Publication type: Journal Article.
Citations: 0
GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-03-08 DOI: 10.1007/s12539-024-00609-y
Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang

Accurately predicting compound-protein interactions (CPI) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including the human, C. elegans, Davis, and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance in effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
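
The two regression metrics named above can be sketched in a few lines. This is a plain reference implementation under common conventions (pairs with tied true values are skipped, tied predictions count as half), which may differ in detail from the evaluation code used for GraphsformerCPI; the affinity values are hypothetical and are not drawn from KIBA or Davis.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with tied true values are not comparable
            comparable += 1
            if y_pred[i] == y_pred[j]:
                concordant += 0.5  # tied predictions count as half
            elif (y_pred[i] > y_pred[j]) == (y_true[i] > y_true[j]):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binding affinities and model predictions
affinities = [5.0, 6.2, 7.1, 8.4]
predictions = [5.1, 6.0, 7.5, 8.0]
ci = concordance_index(affinities, predictions)
mse = mean_squared_error(affinities, predictions)
```

CI rewards correct ranking regardless of scale, while MSE penalizes absolute error, which is why the two are usually reported together for affinity regression.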

Pages: 361-377. Publication type: Journal Article.
Citations: 0
GEnDDn: An lncRNA-Disease Association Identification Framework Based on Dual-Net Neural Architecture and Deep Neural Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-05-11 DOI: 10.1007/s12539-024-00619-w
Lihong Peng, Mengnan Ren, Liangliang Huang, Min Chen

Accumulating studies have demonstrated close relationships between long non-coding RNAs (lncRNAs) and diseases. Identifying new lncRNA-disease associations (LDAs) enables us to better understand disease mechanisms and provides promising insights into targeted cancer therapy and anti-cancer drug design. Here, we present a deep learning-based LDA prediction framework called GEnDDn. GEnDDn mainly comprises two steps. First, features of both lncRNAs and diseases are extracted by combining similarity computation, non-negative matrix factorization, and a graph attention auto-encoder, and each lncRNA-disease pair (LDP) is represented as a vector by concatenating the extracted features. Subsequently, unknown LDPs are classified by aggregating a dual-net neural architecture and a deep neural network. Using six different evaluation metrics, we found that GEnDDn surpassed four competing LDA identification methods (SDLDA, LDNFSGB, IPCARF, LDASR) on the lncRNADisease and MNDR databases under fivefold cross-validation experiments on lncRNAs, diseases, LDPs, and independent lncRNAs and independent diseases, respectively. Ablation experiments further validated the LDA prediction performance of GEnDDn. Furthermore, we utilized GEnDDn to find underlying lncRNAs for lung cancer and breast cancer. The results suggest possible associations between IFNG-AS1 and lung cancer and between HIF1A-AS1 and breast cancer, which require further biomedical experimental verification. GEnDDn is publicly available at https://github.com/plhhnu/GEnDDn.
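
The concatenation step in the first stage can be illustrated with a short sketch. It assumes the feature vectors have already been extracted; the names `build_pair_vectors`, `lncRNA_1`, and `disease_A`, as well as the feature values, are placeholders and do not come from the GEnDDn code or its datasets.

```python
def build_pair_vectors(lnc_features, disease_features, known_pairs):
    """Represent every lncRNA-disease pair as [lncRNA features + disease features]."""
    samples = []
    for lnc_name, lnc_vec in lnc_features.items():
        for dis_name, dis_vec in disease_features.items():
            label = 1 if (lnc_name, dis_name) in known_pairs else 0
            # List concatenation joins the two feature vectors end to end
            samples.append(((lnc_name, dis_name), lnc_vec + dis_vec, label))
    return samples

# Placeholder features (in practice these would come from the extraction step)
lnc_features = {"lncRNA_1": [0.2, 0.7], "lncRNA_2": [0.9, 0.1]}
disease_features = {"disease_A": [0.5, 0.5, 0.0], "disease_B": [0.1, 0.3, 0.6]}
known_pairs = {("lncRNA_1", "disease_A")}

samples = build_pair_vectors(lnc_features, disease_features, known_pairs)
positives = [pair for pair, _, label in samples if label == 1]
```

Pairs labeled 0 are unknown rather than confirmed negatives, so a classifier trained on these vectors ranks them as candidate associations.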

Pages: 418-438. Publication type: Journal Article.
Citations: 0
Predicting miRNA-Disease Associations by Combining Graph and Hypergraph Convolutional Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-01-29 DOI: 10.1007/s12539-023-00599-3
Xujun Liang, Ming Guo, Longying Jiang, Ying Fu, Pengfei Zhang, Yongheng Chen

miRNAs are important regulators of many crucial biological processes. Many recent studies have shown that miRNAs are closely related to various human diseases and can be potential biomarkers or therapeutic targets for some diseases, such as cancers. Therefore, accurately predicting miRNA-disease associations is of great importance for understanding and curing diseases. However, how to efficiently use the characteristics of miRNAs and diseases, together with the information on known miRNA-disease associations, for prediction has not been fully explored. In this study, we propose a novel computational method for predicting miRNA-disease associations. The proposed method combines the graph convolutional network and the hypergraph convolutional network. The graph convolutional network is used to extract information from miRNA-similarity data as well as disease-similarity data. Based on the representations of miRNAs and diseases learned by the graph convolutional network, we further use the hypergraph convolutional network to capture the complex high-order interactions in the known miRNA-disease associations. We conduct comprehensive experiments with different datasets and predictive tasks. The results show that the proposed method consistently outperforms several other state-of-the-art methods. We also discuss the influence of hyper-parameters and model structures on the performance of our method. Some case studies also demonstrate that the predictive results of the method can be verified by independent experiments.
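
As a rough illustration of what a hypergraph convolution computes, the sketch below applies one parameter-free propagation step of the commonly used form Dv^(-1/2) H De^(-1) H^T Dv^(-1/2) X with unit hyperedge weights. The abstract does not give the exact operator or how hyperedges are constructed, so this formula, the function name `hypergraph_propagate`, and the toy incidence matrix are assumptions. A trainable layer would additionally multiply by a weight matrix and apply a nonlinearity.

```python
import math

def hypergraph_propagate(H, X):
    """One step of Dv^-1/2 H De^-1 H^T Dv^-1/2 X for incidence matrix H (nodes x hyperedges).

    Assumes every node belongs to at least one hyperedge and no hyperedge is empty.
    """
    n_nodes, n_edges, n_feat = len(H), len(H[0]), len(X[0])
    node_deg = [sum(H[v]) for v in range(n_nodes)]
    edge_deg = [sum(H[v][e] for v in range(n_nodes)) for e in range(n_edges)]
    # Scale node features by the inverse square root of node degree
    scaled = [[X[v][k] / math.sqrt(node_deg[v]) for k in range(n_feat)]
              for v in range(n_nodes)]
    # Average the scaled features of the nodes inside each hyperedge
    edge_feat = [[sum(H[v][e] * scaled[v][k] for v in range(n_nodes)) / edge_deg[e]
                  for k in range(n_feat)] for e in range(n_edges)]
    # Send hyperedge features back to member nodes and rescale
    return [[sum(H[v][e] * edge_feat[e][k] for e in range(n_edges)) / math.sqrt(node_deg[v])
             for k in range(n_feat)] for v in range(n_nodes)]

# Toy hypergraph: 3 nodes, 2 hyperedges; node 1 belongs to both hyperedges
H = [[1, 0],
     [1, 1],
     [0, 1]]
X = [[1.0], [2.0], [3.0]]
out = hypergraph_propagate(H, X)
```

Because a hyperedge can join any number of nodes, one step mixes information across a whole group at once, which is how such a layer can capture higher-order relations that pairwise graph edges miss.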

Pages: 289-303. Publication type: Journal Article.
Citations: 0
Machine Learning Accelerates De Novo Design of Antimicrobial Peptides.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-02-28 DOI: 10.1007/s12539-024-00612-3
Kedong Yin, Wen Xu, Shiming Ren, Qingpeng Xu, Shaojie Zhang, Ruiling Zhang, Mengwan Jiang, Yuhong Zhang, Degang Xu, Ruifang Li

Efficient and precise design of antimicrobial peptides (AMPs) is of great importance in the field of AMP development. Computational methods provide opportunities for de novo peptide design. In the present investigation, a new machine learning-based AMP prediction model, AP_Sin, was trained using 1160 AMP sequences and 1160 non-AMP sequences. The results showed that AP_Sin correctly classified 94.61% of AMPs on a comprehensive dataset, outperforming mainstream open-source models (Antimicrobial Peptide Scanner vr.2, iAMPpred, and AMPlify). In addition, a peptide sequence generator, AP_Gen, was devised based on the concept of recombining dominant amino acids and dipeptide compositions. After the parameters of the 71 tridecapeptides from the antimicrobial peptide database (APD3) were input into AP_Gen, a bank of 17,496 de novo designed tridecapeptide sequences was randomly generated, from which AP_Sin identified 2675 candidate AMP sequences. Chemical synthesis was performed on 180 randomly selected candidate AMP sequences, of which 18 showed high antimicrobial activity against a wide range of the tested pathogenic microorganisms, and 16 had a minimal inhibitory concentration of less than 10 μg/mL against at least one of the tested pathogenic microorganisms. The method established in this research accelerates the discovery of valuable candidate AMPs and provides a novel approach for de novo design of antimicrobial peptides.
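
The idea of recombining dominant amino acids can be illustrated with a small sketch. This is only one plausible reading of that idea and is not the AP_Gen algorithm, whose details (including its use of dipeptide composition) are not given in the abstract. The functions `dominant_residues` and `recombine`, the `top_k` parameter, and the four-residue toy peptides are hypothetical and are not APD3 sequences.

```python
from collections import Counter
from itertools import product

def dominant_residues(peptides, top_k=2):
    """For each position, return the top_k most frequent residues in equal-length peptides."""
    length = len(peptides[0])
    dominant = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        dominant.append([residue for residue, _ in counts.most_common(top_k)])
    return dominant

def recombine(dominant):
    """Enumerate every sequence that picks one dominant residue per position."""
    return ["".join(combo) for combo in product(*dominant)]

# Hypothetical template peptides of length 4
templates = ["KLWK", "KLFK", "RLWK", "KIWR"]
choices = dominant_residues(templates, top_k=2)
library = recombine(choices)
```

In a pipeline of the kind described, a library generated this way would then be passed to a classifier, and only the sequences predicted to be AMPs would go forward to synthesis.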

Pages: 392-403. Publication type: Journal Article.
Citations: 0
Transformative Deep Neural Network Approaches in Kidney Ultrasound Segmentation: Empirical Validation with an Annotated Dataset.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-02-27 DOI: 10.1007/s12539-024-00620-3
Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang

Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. One supporting task is renal localization and detection, which can be carried out by segmenting kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed best, with Dice, 95% Hausdorff distance, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated public kidney US dataset will serve as a valuable baseline source for future medical image analysis research.
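
Two of the reported metrics and the normalization step can be sketched on flattened binary masks. The abstract does not say which normalization was used, so min-max scaling here is an assumption, and the function names, pixel values, and masks are toy examples rather than data from the kidney US dataset.

```python
def min_max_normalize(pixels):
    """Rescale pixel intensities linearly to the range [0, 1]."""
    lowest, highest = min(pixels), max(pixels)
    if highest == lowest:
        return [0.0 for _ in pixels]  # a constant image has no contrast to stretch
    return [(p - lowest) / (highest - lowest) for p in pixels]

def dice_score(pred, truth):
    """Dice coefficient: 2 * |pred AND truth| / (|pred| + |truth|) for 0/1 masks."""
    true_positives = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * true_positives / total

def recall(pred, truth):
    """Share of true foreground pixels that the prediction recovers."""
    true_positives = sum(p * t for p, t in zip(pred, truth))
    positives = sum(truth)
    return 1.0 if positives == 0 else true_positives / positives

# Toy flattened masks: 1 marks kidney pixels, 0 marks background
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]

normalized = min_max_normalize([12, 40, 255, 128])
dice = dice_score(pred_mask, true_mask)
rec = recall(pred_mask, true_mask)
```

Dice balances missed and spurious pixels, whereas recall only measures how much of the true region is covered, so a model can score high on recall while over-segmenting.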

Pages: 439-454
Citations: 0
Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-02-11 DOI: 10.1007/s12539-024-00604-3
Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao

Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
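To make the many-to-many idea in the abstract above concrete, the sketch below applies one generic hypergraph convolution step, X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ), in NumPy. It is a simplified stand-in, not MHHGRN itself: the function names are hypothetical, the ReLU and the reading of each hyperedge as "one TF together with all of its targets" are assumptions, and the multi-view, diffusion, and attention components are not modelled.

```python
import numpy as np


def hypergraph_propagation_matrix(H, edge_weights=None):
    """Return Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} for an incidence matrix H.

    H has shape (n_nodes, n_hyperedges); H[i, e] = 1 if gene i is in hyperedge e.
    """
    H = np.asarray(H, dtype=np.float64)
    n_edges = H.shape[1]
    if edge_weights is None:
        w = np.ones(n_edges)
    else:
        w = np.asarray(edge_weights, dtype=np.float64)
    dv = H @ w          # node degree: weighted number of hyperedges per gene
    de = H.sum(axis=0)  # hyperedge degree: number of genes per hyperedge
    # Safe inverses: isolated nodes and empty hyperedges contribute zero.
    dv_inv_sqrt = np.divide(1.0, np.sqrt(dv), out=np.zeros_like(dv), where=dv > 0)
    de_inv = np.divide(1.0, de, out=np.zeros_like(de), where=de > 0)
    left = dv_inv_sqrt[:, None] * H   # Dv^{-1/2} H
    middle = w * de_inv               # diagonal of W De^{-1}
    return (left * middle[None, :]) @ left.T


def hypergraph_conv(X, H, Theta, edge_weights=None):
    """One propagation step followed by a ReLU (the ReLU is an assumption)."""
    A = hypergraph_propagation_matrix(H, edge_weights)
    return np.maximum(A @ np.asarray(X, dtype=np.float64) @ Theta, 0.0)


if __name__ == "__main__":
    # Three genes; hyperedge 0 groups genes 0 and 1, hyperedge 1 holds gene 2.
    H = np.array([[1, 0], [1, 0], [0, 1]])
    X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
    print(hypergraph_conv(X, H, np.eye(2)))
```

Because every gene in a hyperedge is averaged together in a single step, one hyperedge can express a regulatory group that a pairwise graph edge cannot.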

Pages: 318-332
Citations: 0
MetaV: A Pioneer in feature Augmented Meta-Learning Based Vision Transformer for Medical Image Classification.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-06-29 DOI: 10.1007/s12539-024-00630-1
Shaharyar Alam Ansari, Arun Prakash Agrawal, Mohd Anas Wajid, Mohammad Saif Wajid, Aasim Zafar

Image classification, a fundamental task in computer vision, faces challenges concerning limited data handling, interpretability, improved feature representation, efficiency across diverse image types, and processing noisy data. Conventional architectural approaches have made insufficient progress in addressing these challenges, necessitating architectures capable of fine-grained classification, enhanced accuracy, and superior generalization. Among these, the vision transformer emerges as a noteworthy computer vision architecture. However, its reliance on substantial data for training poses a drawback due to its complexity and high data requirements. To surmount these challenges, this paper proposes an innovative approach, MetaV, integrating meta-learning into a vision transformer for medical image classification. N-way K-shot learning is employed to train the model, drawing inspiration from human learning mechanisms utilizing past knowledge. Additionally, deformational convolution and patch merging techniques are incorporated into the vision transformer model to mitigate complexity and overfitting while enhancing feature representation. Augmentation methods such as perturbation and Grid Mask are introduced to address the scarcity and noise in medical images, particularly for rare diseases. The proposed model is evaluated using diverse datasets including Break His, ISIC 2019, SIPaKMed, and STARE. The achieved performance accuracies of 89.89%, 87.33%, 94.55%, and 80.22% for Break His, ISIC 2019, SIPaKMed, and STARE, respectively, present evidence validating the superior performance of the proposed model in comparison to conventional models, setting a new benchmark for meta-vision image classification models.
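The N-way K-shot training mentioned in the abstract above is organised around episodes: each episode draws N classes, K labelled support examples per class, and a few query examples to classify. The sketch below shows one plausible way to sample such an episode from a list of labels. The function name `sample_episode` and its parameters are hypothetical, and nothing here reflects how MetaV builds its episodes or its transformer backbone.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, seed=None):
    """Sample one N-way K-shot episode.

    Returns (support, query), each a list of (example_index, label) pairs.
    Classes with fewer than k_shot + n_query examples are skipped.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    needed = k_shot + n_query
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= needed]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with %d examples" % needed)

    support, query = [], []
    for c in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[c], needed)  # drawn without replacement
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    labels = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 2
    support, query = sample_episode(labels, n_way=3, k_shot=2, n_query=3, seed=0)
    print(len(support), "support examples,", len(query), "query examples")
```

Keeping support and query indices disjoint within an episode matters: the query set plays the role of unseen test data for that episode.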

Pages: 469-488
Citations: 0
LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-01 Epub Date: 2024-01-11 DOI: 10.1007/s12539-023-00598-4
Dian-Zheng Sun, Zhan-Li Sun, Mengya Liu, Shuang-Hao Yong

Long noncoding RNAs (lncRNAs) have significant regulatory roles in gene expression. Interactions with proteins are one of the ways lncRNAs play their roles. Since experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In LPI prediction, the distribution of positive and negative samples is commonly imbalanced. However, few existing methods give specific consideration to this problem. In this paper, we proposed a new clustering-based LPIs prediction method using segmented k-mer frequencies and multi-space clustering (LPI-SKMSC). It was dedicated to handling the imbalance of positive and negative samples. We constructed segmented k-mer frequencies to obtain global and local features of lncRNA and protein sequences. Then, the multi-space clustering was applied to LPI-SKMSC. The convolutional neural network (CNN)-based encoders were used to map different features of a sample to different spaces. It used multiple spaces to jointly constrain the classification of samples. Finally, the distances between the output features of the encoder and the cluster center in each space were calculated. The sum of distances in all spaces was compared with the cluster radius to predict the LPIs. We performed cross-validation on 3 public datasets and LPI-SKMSC showed the best performance compared to other existing methods. Experimental results showed that LPI-SKMSC could predict LPIs more effectively when faced with imbalanced positive and negative samples. In addition, we illustrated that our model was better at uncovering potential lncRNA-protein interaction pairs.
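The two mechanisms described in the abstract above, segmented k-mer frequencies and the distance-versus-radius decision, can be sketched in a few lines of plain Python. This is an interpretation under stated assumptions, not the published method: the function names are hypothetical, equal-length segments and Euclidean distance are assumed, the CNN encoders are replaced by ready-made embedding vectors, and treating "summed distance within the radius" as a predicted interaction is a guess at the direction of the rule.

```python
import math
from itertools import product


def kmer_frequencies(seq, k, alphabet="ACGU"):
    """Relative frequency of every k-mer over the alphabet, in sorted order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing unknown symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def segmented_kmer_features(seq, k, n_segments, alphabet="ACGU"):
    """Global k-mer frequencies followed by those of each segment (local)."""
    features = kmer_frequencies(seq, k, alphabet)
    bounds = [i * len(seq) // n_segments for i in range(n_segments + 1)]
    for start, end in zip(bounds, bounds[1:]):
        features += kmer_frequencies(seq[start:end], k, alphabet)
    return features


def predict_interaction(embeddings, centers, radius):
    """Sum the per-space distances to the cluster centers and compare to radius.

    embeddings[s] and centers[s] are the sample's vector and the cluster
    center in space s. Returns True when the summed distance is within radius.
    """
    total = sum(math.dist(e, c) for e, c in zip(embeddings, centers))
    return total <= radius


if __name__ == "__main__":
    print(segmented_kmer_features("AACCGGUU", k=1, n_segments=2))
    print(predict_interaction([[0, 0], [1, 1]], [[3, 4], [1, 1]], radius=5.5))
```

A one-class style rule of this kind needs only the positive cluster to be well characterised, which is one way a clustering approach can sidestep the shortage of reliable negative samples.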

Pages: 378-391
Citations: 0