Identifying drug-target interactions (DTIs) is a pivotal step in drug discovery. Traditional wet-lab experiments, although valuable, are time-consuming and costly. Recently, computational methods grounded in network learning have shown clear advantages through effective topological feature extraction and have attracted extensive research attention. However, most existing network-based learning methods consider only the low-order binary correlation between an individual drug and target, neglecting the higher-order correlation information that can be derived from multiple drugs and targets. High-order information is complementary to low-order information. Hence, incorporating higher-order associations between drugs and targets, and adequately integrating them with the existing lower-order information, could substantially improve drug-target interaction prediction. We propose CHL-DTI, a novel dual-channel network-based learning model that converges high-order information from hypergraphs and low-order information from an ordinary graph for drug-target interaction prediction. The convergence of high- and low-order information in CHL-DTI is manifested in two key aspects. First, during the feature extraction stage, the model integrates both high-level semantic information and low-level topological information by combining hypergraphs with an ordinary graph. Second, CHL-DTI fully fuses the newly introduced drug-protein pair (DPP) hypergraph network structure with ordinary topological network structure information. Extensive experiments on three public datasets show that CHL-DTI outperforms state-of-the-art (SOTA) methods in DTI prediction tasks. The source code of CHL-DTI is available at https://github.com/UPCLyy/CHL-DTI .
CHL-DTI: A Novel High-Low Order Information Convergence Framework for Effective Drug-Target Interaction Prediction. Shudong Wang, Yingye Liu, Yuanyuan Zhang, Kuijie Zhang, Xuanmo Song, Yu Zhang, Shanchen Pang. Interdisciplinary Sciences: Computational Life Sciences, 2024-09-01, pp. 568-578. DOI: 10.1007/s12539-024-00608-z
Pub Date: 2024-09-01 | Epub Date: 2024-03-01 | DOI: 10.1007/s12539-024-00606-1
Jin Deng, Kaijun Li, Wei Luo
Sarcomas are malignant tumors that arise from mesenchymal tissue and are characterized by their complexity and diversity. Their high recurrence rate makes it important to understand the mechanisms behind recurrence and to develop personalized treatments and drugs. However, previous studies on the association patterns of multi-modal data in sarcoma recurrence have overlooked the fact that genes do not act independently, but rather function within signaling pathways. Therefore, this study collected 290 whole solid images together with data on 869 genes and 1387 pathways from over 260 sarcoma samples in UCSC and TCGA to identify gene-pathway-cell association patterns related to sarcoma recurrence. Meanwhile, because most multi-modal data fusion methods based on the joint non-negative matrix factorization (NMF) model suffer from poor experimental repeatability due to random initialization of the factorization parameters, the study proposed a singular value decomposition (SVD)-driven joint NMF model, which uses SVD to compute the initial weight and coefficient matrices so that the results are reproducible. The experimental comparison indicated that the SVD algorithm enhances the performance of the joint NMF algorithm. Furthermore, the representative module indicated a significant relationship between genes in pathways and image features. Multi-level analysis provided valuable insights into the connections between biological processes, cellular features, and sarcoma recurrence. In addition, potential biomarkers were uncovered, and various mechanisms of sarcoma recurrence were identified from an imaging genetic perspective. Overall, the SVD-NMF model affords a novel perspective on combining multi-omics data to explore associations related to sarcoma recurrence.
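The reproducibility argument in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes samples are rows, each modality is a column block, all blocks share one weight matrix W, and the starting factors are taken as absolute values of a truncated SVD. The function names `svd_init` and `joint_nmf`, the absolute-value initialization, and the multiplicative update rule are all assumptions made for illustration.

```python
import numpy as np

def svd_init(X, k, eps=1e-6):
    """Deterministic non-negative starting factors from a truncated SVD of X.

    Assumption: absolute values of the leading singular vectors are used;
    the paper may use a different SVD-based scheme.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(S[:k])
    W0 = np.abs(U[:, :k]) * root             # shape (n_samples, k)
    H0 = root[:, None] * np.abs(Vt[:k, :])   # shape (k, total_features)
    # A small offset keeps entries from being locked at exactly zero.
    return W0 + eps, H0 + eps

def joint_nmf(blocks, k, n_iter=200, eps=1e-9):
    """Approximate every block X_i by W @ H_i with one shared W.

    Sharing W across blocks is equivalent to factorizing the column-wise
    concatenation of the blocks, which is what this sketch does.
    """
    X = np.hstack(blocks)
    W, H = svd_init(X, k)
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    # Cut the stacked coefficient matrix back into one H_i per block.
    splits = np.cumsum([b.shape[1] for b in blocks])[:-1]
    return W, np.split(H, splits, axis=1)
```

Because no random numbers enter the initialization, two runs on the same input start from the same point and follow the same update path, which is the property the abstract attributes to SVD-driven initialization.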
Singular Value Decomposition-Driven Non-negative Matrix Factorization with Application to Identify the Association Patterns of Sarcoma Recurrence. Jin Deng, Kaijun Li, Wei Luo. Interdisciplinary Sciences: Computational Life Sciences, 2024-09-01, pp. 554-567. DOI: 10.1007/s12539-024-00606-1
Molecular representation learning can preserve meaningful molecular structures as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet, learning how to accurately represent molecules remains challenging. Previous approaches that learn molecular representations in an end-to-end manner potentially suffer from information loss and neglect molecular generative representations. To obtain rich molecular feature information, a pre-training model can draw on several different molecular representations, reducing the information loss caused by relying on a single one. We therefore propose MVGC, a multi-view generative contrastive learning pre-training model. Our pre-training framework learns three fundamental feature representations of molecules and integrates them to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks demonstrate that the proposed MVGC model surpasses the majority of state-of-the-art approaches. Moreover, we explore the potential of the MVGC model to learn chemically meaningful molecular representations.
A Multi-view Molecular Pre-training with Generative Contrastive Learning. Yunwu Liu, Ruisheng Zhang, Yongna Yuan, Jun Ma, Tongfeng Li, Zhixuan Yu. Interdisciplinary Sciences: Computational Life Sciences, 2024-09-01, pp. 741-754. DOI: 10.1007/s12539-024-00632-z
Pub Date: 2024-09-01 | Epub Date: 2024-07-02 | DOI: 10.1007/s12539-024-00621-2
Shouheng Tuo, Jiewei Jiang
To elucidate the genetic basis of complex diseases, it is crucial to discover the single-nucleotide polymorphisms (SNPs) contributing to disease susceptibility. This is particularly challenging for high-order SNP epistatic interactions (HEIs), which exhibit small individual effects but potentially large joint effects. These interactions are difficult to detect due to the vast search space, encompassing billions of possible combinations, and the computational complexity of evaluating them. This study proposes a novel explicit-encoding-based multitasking harmony search algorithm (MTHS-EE-DHEI) specifically designed to address this challenge. The algorithm operates in three stages. First, a harmony search algorithm is employed, utilizing four lightweight evaluation functions, such as Bayesian network and entropy, to efficiently explore potential SNP combinations related to disease status. Second, a G-test statistical method is applied to filter out insignificant SNP combinations. Finally, two machine learning-based methods, multifactor dimensionality reduction (MDR) as well as random forest (RF), are employed to validate the classification performance of the remaining significant SNP combinations. This research aims to demonstrate the effectiveness of MTHS-EE-DHEI in identifying HEIs compared to existing methods, potentially providing valuable insights into the genetic architecture of complex diseases. The performance of MTHS-EE-DHEI was evaluated on twenty simulated disease datasets and three real-world datasets encompassing age-related macular degeneration (AMD), rheumatoid arthritis (RA), and breast cancer (BC). The results demonstrably indicate that MTHS-EE-DHEI outperforms four state-of-the-art algorithms in terms of both detection power and computational efficiency. The source code is available at https://github.com/shouhengtuo/MTHS-EE-DHEI.git .
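The second stage described above, a G-test filter on candidate SNP combinations, can be sketched as follows. This is a generic G-test of independence on a genotype-combination by disease-status contingency table, written in plain Python; the function name `g_test` and the input layout (one genotype tuple and one 0/1 label per sample) are assumptions for illustration, not the interface of the MTHS-EE-DHEI code.

```python
import math
from collections import Counter

def g_test(genotypes, labels):
    """Return (G, degrees_of_freedom) for a SNP-combination x status table.

    genotypes: one tuple per sample, e.g. (0, 2, 1) for a 3-SNP combination
    labels:    one 0/1 disease status per sample
    """
    observed = Counter(zip(genotypes, labels))   # cell counts O
    row_totals = Counter(genotypes)              # per genotype combination
    col_totals = Counter(labels)                 # per disease status
    n = len(labels)
    g = 0.0
    # Empty cells contribute nothing and never appear in the Counter.
    for (combo, status), o in observed.items():
        expected = row_totals[combo] * col_totals[status] / n
        g += 2.0 * o * math.log(o / expected)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return g, dof
```

Under the null hypothesis of no association, G approximately follows a chi-square distribution with the returned degrees of freedom, so a combination would be kept only if G exceeds the critical value for the chosen significance level.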
A Novel Detection Method for High-Order SNP Epistatic Interactions Based on Explicit-Encoding-Based Multitasking Harmony Search. Shouheng Tuo, Jiewei Jiang. Interdisciplinary Sciences: Computational Life Sciences, 2024-09-01, pp. 688-711. DOI: 10.1007/s12539-024-00621-2
Pub Date: 2024-06-01 | Epub Date: 2024-03-08 | DOI: 10.1007/s12539-024-00609-y
Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang
Accurately predicting compound-protein interactions (CPIs) is a critical task in computer-aided drug design. In recent years, the exponential growth of compound activity and biomedical data has highlighted the need for efficient and interpretable prediction approaches. In this study, we propose GraphsformerCPI, an end-to-end deep learning framework that improves prediction performance and interpretability. GraphsformerCPI treats compounds and proteins as sequences of nodes with spatial structures, and leverages novel structure-enhanced self-attention mechanisms to integrate semantic and graph structural features within molecules for deep molecule representations. To capture the vital association between compound atoms and protein residues, we devise a dual-attention mechanism to effectively extract relational features through cross-mapping. By extending the powerful learning capabilities of Transformers to spatial structures and extensively utilizing attention mechanisms, our model offers strong interpretability, a significant advantage over most black-box deep learning methods. To evaluate GraphsformerCPI, extensive experiments were conducted on benchmark datasets including the human, C. elegans, Davis, and KIBA datasets. We explored the impact of model depth and dropout rate on performance and compared our model against state-of-the-art baseline models. Our results demonstrate that GraphsformerCPI outperforms baseline models on classification datasets and achieves competitive performance on regression datasets. Specifically, on the human dataset, GraphsformerCPI achieves an average improvement of 1.6% in AUC, 0.5% in precision, and 5.3% in recall. On the KIBA dataset, the average improvement in concordance index (CI) and mean squared error (MSE) is 3.3% and 7.2%, respectively. Molecular docking shows that our model provides novel insights into the intrinsic interactions and binding mechanisms. Our research holds practical significance for effectively predicting CPIs and binding affinities, identifying key atoms and residues, and enhancing model interpretability.
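For readers unfamiliar with the two regression metrics quoted for KIBA, the sketch below computes them in plain Python. It uses the pairwise definition of the concordance index that is common in binding-affinity benchmarks (prediction ties count as one half); whether GraphsformerCPI uses exactly this variant is an assumption, and the function names are illustrative only.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    A pair (i, j) is comparable when y_true[i] > y_true[j].
    Tied predictions on a comparable pair score 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("all true values are equal; CI is undefined")
    return concordant / comparable

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

A higher CI and a lower MSE both indicate better affinity prediction, which is why an improvement is reported for each in the abstract.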
GraphsformerCPI: Graph Transformer for Compound-Protein Interaction Prediction. Jun Ma, Zhili Zhao, Tongfeng Li, Yunwu Liu, Jun Ma, Ruisheng Zhang. Interdisciplinary Sciences: Computational Life Sciences, 2024-06-01, pp. 361-377. DOI: 10.1007/s12539-024-00609-y
Pub Date: 2024-06-01 | Epub Date: 2024-05-11 | DOI: 10.1007/s12539-024-00619-w
Lihong Peng, Mengnan Ren, Liangliang Huang, Min Chen
Accumulating studies have demonstrated close relationships between long non-coding RNAs (lncRNAs) and diseases. Identifying new lncRNA-disease associations (LDAs) enables us to better understand disease mechanisms and provides promising insights into targeted cancer therapy and anti-cancer drug design. Here, we present a deep learning-based LDA prediction framework called GEnDDn. GEnDDn mainly comprises two steps. First, features of both lncRNAs and diseases are extracted by combining similarity computation, non-negative matrix factorization, and a graph attention auto-encoder, and each lncRNA-disease pair (LDP) is depicted as a vector by concatenating the extracted features. Subsequently, unknown LDPs are classified by aggregating a dual-net neural architecture and a deep neural network. Using six different evaluation metrics, we found that GEnDDn surpassed four competing LDA identification methods (SDLDA, LDNFSGB, IPCARF, LDASR) on the lncRNADisease and MNDR databases under fivefold cross-validation experiments on lncRNAs, diseases, LDPs, and independent lncRNAs and independent diseases, respectively. Ablation experiments further validated the LDA prediction performance of GEnDDn. Furthermore, we used GEnDDn to find potential lncRNAs associated with lung cancer and breast cancer. The results suggested possible links between IFNG-AS1 and lung cancer as well as between HIF1A-AS1 and breast cancer. These results require further biomedical experimental verification. GEnDDn is publicly available at https://github.com/plhhnu/GEnDDn.
GEnDDn: An lncRNA-Disease Association Identification Framework Based on Dual-Net Neural Architecture and Deep Neural Network. Lihong Peng, Mengnan Ren, Liangliang Huang, Min Chen. Interdisciplinary Sciences: Computational Life Sciences, 2024-06-01, pp. 418-438. DOI: 10.1007/s12539-024-00619-w
miRNAs are important regulators for many crucial biological processes. Many recent studies have shown that miRNAs are closely related to various human diseases and can be potential biomarkers or therapeutic targets for some diseases, such as cancers. Therefore, accurately predicting miRNA-disease associations is of great importance for understanding and curing diseases. However, how to efficiently utilize the characteristics of miRNAs and diseases and the information on known miRNA-disease associations for prediction is still not fully explored. In this study, we propose a novel computational method for predicting miRNA-disease associations. The proposed method combines the graph convolutional network and the hypergraph convolutional network. The graph convolutional network is utilized to extract the information from miRNA-similarity data as well as disease-similarity data. Based on the representations of miRNAs and diseases learned by the graph convolutional network, we further use the hypergraph convolutional network to capture the complex high-order interactions in the known miRNA-disease associations. We conduct comprehensive experiments with different datasets and predictive tasks. The results show that the proposed method consistently outperforms several other state-of-the-art methods. We also discuss the influence of hyper-parameters and model structures on the performance of our method. Some case studies also demonstrate that the predictive results of the method can be verified by independent experiments.
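The two propagation steps named in this abstract can be sketched with NumPy. The code shows the widely used normalized graph convolution operator and the normalized hypergraph convolution operator with unit hyperedge weights, without the learned weight matrices or nonlinearities. How the paper builds its hyperedges is not stated here; treating each disease as a hyperedge over its known miRNAs would be one possible construction and is purely an assumption, as are the function names.

```python
import numpy as np

def gcn_propagate(A, X):
    """One graph-convolution smoothing step: D^-1/2 (A + I) D^-1/2 X.

    A: (n, n) non-negative similarity or adjacency matrix
    X: (n, d) node features
    """
    A_hat = A + np.eye(A.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return (A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]) @ X

def hypergraph_propagate(H, X):
    """One hypergraph-convolution step: Dv^-1/2 H De^-1 H^T Dv^-1/2 X.

    H: (n_nodes, n_edges) incidence matrix, H[v, e] = 1 if node v is in edge e
    X: (n_nodes, d) node features
    """
    dv = H.sum(axis=1)                    # node degrees
    de = H.sum(axis=0)                    # hyperedge sizes
    # Nodes or hyperedges with zero degree get a zero scaling factor.
    dv_inv_sqrt = np.where(dv > 0, 1.0 / np.sqrt(np.maximum(dv, 1e-12)), 0.0)
    de_inv = np.where(de > 0, 1.0 / np.maximum(de, 1e-12), 0.0)
    Hn = dv_inv_sqrt[:, None] * H
    # Gather node features into hyperedges, average, scatter back to nodes.
    return Hn @ (de_inv[:, None] * (Hn.T @ X))
```

Chaining the two, for example `hypergraph_propagate(H, gcn_propagate(S, X))` with a similarity matrix `S`, mirrors the order described in the abstract: similarity-based representations first, then high-order aggregation over known associations.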
Predicting miRNA-Disease Associations by Combining Graph and Hypergraph Convolutional Network. Xujun Liang, Ming Guo, Longying Jiang, Ying Fu, Pengfei Zhang, Yongheng Chen. Interdisciplinary Sciences: Computational Life Sciences, 2024-06-01, pp. 289-303. DOI: 10.1007/s12539-023-00599-3
Efficient and precise design of antimicrobial peptides (AMPs) is of great importance in the field of AMP development. Computing provides opportunities for de novo peptide design. In the present investigation, a new machine learning-based AMP prediction model, AP_Sin, was trained using 1160 AMP sequences and 1160 non-AMP sequences. The results showed that AP_Sin correctly classified 94.61% of AMPs on a comprehensive dataset, outperforming the mainstream open-source models (Antimicrobial Peptide Scanner vr.2, iAMPpred and AMPlify) and proving effective in identifying AMPs. In addition, a peptide sequence generator, AP_Gen, was devised based on the concept of recombining dominant amino acids and dipeptide compositions. After the parameters of the 71 tridecapeptides from the antimicrobial peptide database (APD3) were input into AP_Gen, a bank of 17,496 de novo designed tridecapeptide sequences was randomly generated, from which 2675 candidate AMP sequences were identified by AP_Sin. Chemical synthesis was performed on 180 randomly selected candidate AMP sequences, of which 18 showed high antimicrobial activity against a wide range of the tested pathogenic microorganisms, and 16 of these had a minimal inhibitory concentration of less than 10 μg/mL against at least one of the tested pathogenic microorganisms. The method established in this research accelerates the discovery of valuable candidate AMPs and provides a novel approach for de novo design of antimicrobial peptides.
{"title":"Machine Learning Accelerates De Novo Design of Antimicrobial Peptides.","authors":"Kedong Yin, Wen Xu, Shiming Ren, Qingpeng Xu, Shaojie Zhang, Ruiling Zhang, Mengwan Jiang, Yuhong Zhang, Degang Xu, Ruifang Li","doi":"10.1007/s12539-024-00612-3","DOIUrl":"10.1007/s12539-024-00612-3","url":null,"abstract":"<p><p>Efficient and precise design of antimicrobial peptides (AMPs) is of great importance in the field of AMP development. Computing provides opportunities for peptide de novo design. In the present investigation, a new machine learning-based AMP prediction model, AP_Sin, was trained using 1160 AMP sequences and 1160 non-AMP sequences. The results showed that AP_Sin correctly classified 94.61% of AMPs on a comprehensive dataset, outperforming the mainstream and open-source models (Antimicrobial Peptide Scanner vr.2, iAMPpred and AMPlify) and being effective in identifying AMPs. In addition, a peptide sequence generator, AP_Gen, was devised based on the concept of recombining dominant amino acids and dipeptide compositions. After inputting the parameters of the 71 tridecapeptides from antimicrobial peptides database (APD3) into AP_Gen, a tridecapeptide bank consisting of de novo designed 17,496 tridecapeptide sequences were randomly generated, from which 2675 candidate AMP sequences were identified by AP_Sin. Chemical synthesis was performed on 180 randomly selected candidate AMP sequences, of which 18 showed high antimicrobial activities against a wide range of the tested pathogenic microorganisms, and 16 of which had a minimal inhibitory concentration of less than 10 μg/mL against at least one of the tested pathogenic microorganisms. 
The method established in this research accelerates the discovery of valuable candidate AMPs and provides a novel approach for de novo design of antimicrobial peptides.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"392-403"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139982902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01, Epub Date: 2024-02-27, DOI: 10.1007/s12539-024-00620-3
Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang
Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. One such task is renal localization and detection, which can be carried out by segmenting the kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. Among the evaluated models, DeepLabv3+ performed best, achieving Dice, Hausdorff distance 95, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated public kidney US dataset will serve as a valuable baseline source for future medical image analysis research.
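Four of the six reported metrics (Dice, accuracy, specificity, and recall) follow directly from pixel-wise confusion counts between a predicted mask and a reference mask. The sketch below shows the standard definitions on a toy example; the function names and masks are hypothetical, and it assumes both masks contain foreground and background pixels so that no denominator is zero. It is not the authors' evaluation code, and the two surface-distance metrics are omitted.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives over two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(pred, truth):
    """Overlap metrics for one predicted mask against its reference mask."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Toy 3x4 masks (1 = kidney pixel), purely illustrative.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, truth)
```

Because background pixels usually dominate an ultrasound frame, accuracy and specificity tend to be high even for mediocre masks, which is why Dice and recall are the more informative overlap scores.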
{"title":"Transformative Deep Neural Network Approaches in Kidney Ultrasound Segmentation: Empirical Validation with an Annotated Dataset.","authors":"Rashid Khan, Chuda Xiao, Yang Liu, Jinyu Tian, Zhuo Chen, Liyilei Su, Dan Li, Haseeb Hassan, Haoyu Li, Weiguo Xie, Wen Zhong, Bingding Huang","doi":"10.1007/s12539-024-00620-3","DOIUrl":"10.1007/s12539-024-00620-3","url":null,"abstract":"<p><p>Kidney ultrasound (US) images are primarily employed for diagnosing different renal diseases. Among them, one is renal localization and detection, which can be carried out by segmenting the kidney US images. However, kidney segmentation from US images is challenging due to low contrast, speckle noise, fluid, variations in kidney shape, and modality artifacts. Moreover, well-annotated US datasets for renal segmentation and detection are scarce. This study aims to build a novel, well-annotated dataset containing 44,880 US images. In addition, we propose a novel training scheme that utilizes the encoder and decoder parts of a state-of-the-art segmentation algorithm. In the pre-processing step, pixel intensity normalization improves contrast and facilitates model convergence. The modified encoder-decoder architecture improves pyramid-shaped hole pooling, cascaded multiple-hole convolutions, and batch normalization. The pre-processing step gradually reconstructs spatial information, including the capture of complete object boundaries, and the post-processing module with a concave curvature reduces the false positive rate of the results. We present benchmark findings to validate the quality of the proposed training scheme and dataset. We applied six evaluation metrics and several baseline segmentation approaches to our novel kidney US dataset. 
Among the evaluated models, DeepLabv3+ performed well and achieved the highest dice, Hausdorff distance 95, accuracy, specificity, average symmetric surface distance, and recall scores of 89.76%, 9.91, 98.14%, 98.83%, 3.03, and 90.68%, respectively. The proposed training strategy aids state-of-the-art segmentation models, resulting in better-segmented predictions. Furthermore, the large, well-annotated kidney US public dataset will serve as a valuable baseline source for future medical image analysis research.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"439-454"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139982903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-06-01, Epub Date: 2024-02-11, DOI: 10.1007/s12539-024-00604-3
Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao
Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning treat transcription factor (TF)-target gene interactions only as pairwise relationships and cannot model the many-to-many higher-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple types of heterogeneous biological information are integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, and hypergraph convolution networks are used to model higher-order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.
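To make the contrast with pairwise edges concrete, the sketch below runs one parameter-free propagation step on a hypergraph: node features are averaged into each hyperedge, and each node then averages the hyperedges it belongs to. This corresponds to the row-normalized operator D_v^{-1} H D_e^{-1} H^T X with unit hyperedge weights and no learnable parameters. It is a simplification chosen for illustration, not the MHHGRN layer; all names and values are hypothetical.

```python
def hypergraph_propagate(features, hyperedges):
    """One propagation step: nodes -> hyperedges -> nodes, by simple averaging."""
    dim = len(next(iter(features.values())))

    # Step 1: each hyperedge embedding is the mean of its member nodes' features.
    edge_embeddings = [
        [sum(features[node][d] for node in members) / len(members)
         for d in range(dim)]
        for members in hyperedges
    ]

    # Step 2: each node averages the embeddings of the hyperedges it belongs to.
    updated = {}
    for node, x in features.items():
        incident = [emb for emb, members in zip(edge_embeddings, hyperedges)
                    if node in members]
        if not incident:
            updated[node] = list(x)  # a node in no hyperedge keeps its features
        else:
            updated[node] = [sum(emb[d] for emb in incident) / len(incident)
                             for d in range(dim)]
    return updated


# Hypothetical toy data: one TF co-regulating two genes, plus a second gene group.
features = {
    "TF1": [1.0, 0.0],
    "geneA": [0.0, 1.0],
    "geneB": [0.0, 0.0],
    "geneC": [2.0, 2.0],
}
hyperedges = [{"TF1", "geneA", "geneB"}, {"geneB", "geneC"}]
updated = hypergraph_propagate(features, hyperedges)
```

A single hyperedge here links three nodes at once, so `geneB` receives information from `TF1`, `geneA`, and `geneC` in one step; an ordinary graph would need separate pairwise edges to express the same grouping.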
{"title":"Inference of Gene Regulatory Networks Based on Multi-view Hierarchical Hypergraphs.","authors":"Songyang Wu, Kui Jin, Mingjing Tang, Yuelong Xia, Wei Gao","doi":"10.1007/s12539-024-00604-3","DOIUrl":"10.1007/s12539-024-00604-3","url":null,"abstract":"<p><p>Since gene regulation is a complex process in which multiple genes act simultaneously, accurately inferring gene regulatory networks (GRNs) is a long-standing challenge in systems biology. Although graph neural networks can formally describe intricate gene expression mechanisms, current GRN inference methods based on graph learning regard only transcription factor (TF)-target gene interactions as pairwise relationships, and cannot model the many-to-many high-order regulatory patterns that prevail among genes. Moreover, these methods often rely on limited prior regulatory knowledge, ignoring the structural information of GRNs in gene expression profiles. Therefore, we propose a multi-view hierarchical hypergraphs GRN (MHHGRN) inference model. Specifically, multiple heterogeneous biological information is integrated to construct multi-view hierarchical hypergraphs of TFs and target genes, using hypergraph convolution networks to model higher order complex regulatory relationships. Meanwhile, the coupled information diffusion mechanism and the cross-domain messaging mechanism facilitate the information sharing between genes to optimise gene embedding representations. Finally, a unique channel attention mechanism is used to adaptively learn feature representations from multiple views for GRN inference. Experimental results show that MHHGRN achieves better results than the baseline methods on the E. coli and S. 
cerevisiae benchmark datasets of the DREAM5 challenge, and it has excellent cross-species generalization, achieving comparable or better performance on scRNA-seq datasets from five mouse and two human cell lines.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"318-332"},"PeriodicalIF":3.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139717494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}