Pub Date: 2024-12-01 | Epub Date: 2024-08-19 | DOI: 10.1007/s12539-024-00639-6
Dapeng Xiong, Kaicheng U, Jianfeng Sun, Adam P Cribbs
X-ray diffraction crystallography is the most widely used method for determining protein three-dimensional (3D) structures, and it requires that the protein be crystallizable. Protein crystallization, however, involves several successive steps, including protein material production, purification, and crystal production, each of which affects the outcome. Because this multi-stage process is expensive and laborious, various computational tools have been developed to predict protein crystallization propensity and thereby guide experimental determination. In this study, we present a novel deep learning framework, PLMC, that improves multi-stage protein crystallization propensity prediction by leveraging a pre-trained protein language model. To train PLMC effectively, two groups of features for each protein are integrated into a more comprehensive representation: protein language embeddings learned from a large-scale protein sequence database, and a handcrafted feature set consisting of physicochemical, sequence-based, and disorder-related information. The two feature groups are embedded separately for refinement and then concatenated for the final prediction. Extensive benchmarking tests demonstrate that PLMC greatly outperforms other state-of-the-art methods, achieving AUC scores of 0.773, 0.893, and 0.913 at the three individual stages named above, respectively, and 0.982 at the final crystallization stage. Furthermore, PLMC is shown to be superior for predicting the crystallization of both globular and membrane proteins, with an AUC score of 0.991 for the latter. These results suggest the significant potential of PLMC in assisting researchers with the experimental design of crystallizable protein variants.
Title: PLMC: Language Model of Protein Sequences Enhances Protein Crystallization Prediction.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 802-813 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 141999874)
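The AUC values quoted in the PLMC abstract can be read as the probability that a randomly chosen crystallizable protein receives a higher score than a randomly chosen non-crystallizable one. The following is a minimal sketch of that definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration (not taken from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, which is fine for a small check; a rank-based formulation would be the usual choice for large benchmark sets.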
Pub Date: 2024-12-01 | Epub Date: 2024-08-21 | DOI: 10.1007/s12539-024-00649-4
Faiqa Maqsood, Wang Zhenfei, Muhammad Mumtaz Ali, Baozhi Qiu, Naveed Ur Rehman, Fahad Sabah, Tahir Mahmood, Irfanud Din, Raheem Sarwar
The kidney is an abdominal organ in the human body that filters excess water and waste from the blood. Kidney diseases generally arise from factors such as certain supplements, medical conditions, obesity, and diet, which impair kidney function and ultimately lead to complications such as chronic kidney disease, kidney failure, and other renal disorders. Combining patient metadata with computed tomography (CT) images is essential for the accurate and timely diagnosis of such complications. Deep Neural Networks (DNNs) have transformed medical fields by providing high accuracy in complex tasks. However, the high computational cost of these models is a significant challenge, particularly in real-time applications. This paper proposes SpinalZFNet, a hybrid deep learning approach that integrates the architectural strengths of the Spinal Network (SpinalNet) with the feature extraction capabilities of the Zeiler and Fergus Network (ZFNet) to classify kidney disease accurately using CT images. This combination enhances feature analysis, significantly improving classification accuracy while reducing the computational overhead. First, the acquired CT images are pre-processed using a median filter, and each pre-processed image is segmented using the Efficient Neural Network (ENet). The images are then augmented, and different features are extracted from the augmented CT images. Finally, the extracted features are used by the proposed SpinalZFNet model to classify kidney disease into normal, tumor, cyst, and stone. SpinalZFNet outperformed other models, with 99.9% sensitivity, 99.5% specificity, 99.6% precision, 99.8% accuracy, and a 99.7% F1-score in classifying kidney disease.
Title: Artificial Intelligence-Based Classification of CT Images Using a Hybrid SpinalZFNet.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 907-925 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 142017327)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11512893/pdf/
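The SpinalZFNet abstract names a median filter as its first pre-processing step. As a rough illustration of what such a filter does to impulse noise, here is a small pure-Python sketch; the window size, the border handling (windows clipped at the image edge), and the toy 3x3 image are all assumptions, since the abstract does not give these details.

```python
from statistics import median


def median_filter(image, radius=1):
    """Apply a (2*radius+1) x (2*radius+1) median filter to a 2D list.

    Windows are clipped at the image border (an assumption; padding is
    another common choice).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = median(window)
    return out


# A flat patch with one bright outlier pixel, invented for illustration.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
smoothed = median_filter(noisy)  # the outlier is replaced by the local median
```

Unlike a mean filter, the median leaves the surrounding pixel values unchanged here, which is why it is a common choice for removing isolated noisy pixels before segmentation.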
Pub Date: 2024-12-01 | Epub Date: 2024-08-07 | DOI: 10.1007/s12539-024-00645-8
Wenzhi Liu, Pengli Lu
The exploration of the interactions between diseases and metabolites holds significant implications for the diagnosis and treatment of diseases. However, traditional experimental methods are time-consuming and costly, and current computational methods often overlook the influence of other biological entities on both diseases and metabolites. In light of these limitations, we propose a novel deep learning model based on metapath aggregation of tripartite heterogeneous networks (MAHN) to explore disease-related metabolites. Specifically, we introduce microbes to construct a tripartite heterogeneous network and employ a graph convolutional network and an enhanced GraphSAGE to learn node features along metapaths of length 3. Additionally, we use node-level and semantic-level attention mechanisms, a more granular approach, to aggregate node features along metapaths of length 2. Finally, the reconstructed association probability is obtained by fusing the features from the different metapaths and passing them to a bilinear decoder. The experiments demonstrate that the proposed MAHN model achieves superior performance in five-fold cross-validation, with Acc (91.85%), Pre (90.48%), Recall (93.53%), F1 (91.94%), AUC (97.39%), and AUPR (97.47%), outperforming four state-of-the-art algorithms. Case studies on two complex diseases, irritable bowel syndrome and obesity, further validate the predictive results and indicate that MAHN is a trustworthy prediction tool for discovering potential metabolites. Moreover, deep learning models integrating multi-omics data represent the future mainstream direction for predicting disease-related biological entities.
Title: Predicting Disease-Metabolite Associations Based on the Metapath Aggregation of Tripartite Heterogeneous Networks.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 829-843 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 141901633)
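The MAHN abstract ends its pipeline with a bilinear decoder that turns a disease embedding and a metabolite embedding into an association probability. One common form of such a decoder is sigmoid(h_d^T W h_m); the sketch below assumes that form, and the names `bilinear_score`, `h_disease`, `h_metabolite`, and the toy weight matrix `W` are hypothetical. In a real model the embeddings and `W` would be learned.

```python
import math


def bilinear_score(h_d, W, h_m):
    """Return sigmoid(h_d^T W h_m), an association probability in (0, 1).

    W is a list of rows with shape len(h_d) x len(h_m).
    """
    # First compute the matrix-vector product W h_m.
    Wh = [sum(w_ij * m_j for w_ij, m_j in zip(row, h_m)) for row in W]
    # Then take the inner product with h_d to get the logit.
    logit = sum(d_i * x_i for d_i, x_i in zip(h_d, Wh))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy embeddings and weights, invented for illustration.
h_disease = [1.0, 0.0]
h_metabolite = [0.0, 2.0]
W = [[0.0, 1.0],
     [1.0, 0.0]]
prob = bilinear_score(h_disease, W, h_metabolite)  # logit is 2.0
```

The learned matrix lets the decoder weight interactions between different embedding dimensions, which a plain dot product cannot do.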
Pub Date: 2024-12-01 | Epub Date: 2024-07-21 | DOI: 10.1007/s12539-024-00643-w
Aubrey Chiarelli, Hana Dobrovolny
The development of antiviral treatments for SARS-CoV-2 was an important turning point for the pandemic. The availability of safe and effective antivirals has allowed people to return to normal life. While SARS-CoV-2 antivirals are highly effective at preventing severe disease, there have been concerning reports of viral rebound in some patients after cessation of antiviral treatment. In this study, we use a mathematical model of viral infection to study the potential of different antivirals to prevent viral rebound. We find that antivirals that block production are most likely to result in viral rebound if the treatment time course is not sufficiently long. Since these antivirals do not prevent infection of cells, cells continue to be infected during treatment. When treatment is stopped, the infected cells will begin producing virus at the usual rate. Antivirals that prevent infection of cells are less likely to result in viral rebound since cells are not being infected during treatment. This study highlights the role of antiviral mechanism of action in increasing or reducing the probability of viral rebound.
Title: Viral Rebound After Antiviral Treatment: A Mathematical Modeling Study of the Role of Antiviral Mechanism of Action.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 844-853 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 141734033)
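The mechanism described in the viral rebound abstract can be illustrated with a simple target-cell-limited model. The abstract does not give the authors' equations, so the model below is an assumption: a standard three-compartment form in which a drug with efficacy eps scales either the infection term or the production term. The rate constants and initial values are illustrative, not fitted.

```python
def simulate(eps_infection, eps_production, t_on, t_off, t_end, dt=0.001):
    """Forward-Euler integration of a target-cell-limited infection model.

    T: target cells, I: infected cells, V: free virus (arbitrary units).
    While t_on <= t < t_off, the drug scales infection by (1 - eps_infection)
    and virus production by (1 - eps_production).
    """
    beta, delta, p, c = 5.0, 1.0, 10.0, 5.0  # illustrative rates, not fitted values
    T, I, V = 1.0, 0.0, 0.01
    history = []
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        on = t_on <= t < t_off
        e_inf = eps_infection if on else 0.0
        e_prod = eps_production if on else 0.0
        new_infections = (1.0 - e_inf) * beta * T * V
        dT = -new_infections
        dI = new_infections - delta * I
        dV = (1.0 - e_prod) * p * I - c * V
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
        history.append((t + dt, T, I, V))
    return history


# Same treatment window from t = 0 to t = 2, two mechanisms of action.
block_production = simulate(0.0, 1.0, t_on=0.0, t_off=2.0, t_end=2.0)
block_infection = simulate(1.0, 0.0, t_on=0.0, t_off=2.0, t_end=2.0)
```

In this sketch, a fully effective production blocker still lets infected cells accumulate during treatment, whereas a fully effective infection blocker keeps the infected-cell count at zero. That pool of infected cells is what resumes virus production once treatment stops.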
Pub Date: 2024-12-01 | Epub Date: 2024-08-19 | DOI: 10.1007/s12539-024-00636-9
Zhi Liu, Qinhan Zhang, Sixin Luo, Meiqiao Qin
Sleep staging is the most crucial step before diagnosing and treating sleep disorders. Traditional manual sleep staging is time-consuming and depends on the skill of experts. Automatic sleep staging based on deep learning is therefore attracting growing research interest. The salient waves in sleep signals contain the most important information for automatic sleep staging. However, existing deep learning methods do not fully exploit this key information, since most of them use only a CNN or an RNN, which cannot effectively capture multi-scale features in salient waves. To tackle this limitation, we propose a lightweight end-to-end network for sleep stage prediction based on a feature pyramid and joint attention. The feature pyramid module is designed to effectively extract multi-scale features in salient waves, and these features are then fed to the joint attention module, which attends closely to the channel and location information of the salient waves. The proposed network has far fewer parameters and achieves a significant performance improvement over state-of-the-art results. On the public datasets Sleep-EDF39, Sleep-EDF153, and SHHS, the overall accuracy and macro F1 score are 90.1% and 87.8%, 87.4% and 84.4%, and 86.9% and 83.9%, respectively. Ablation experiments confirm the effectiveness of each module.
Title: FPJA-Net: A Lightweight End-to-End Network for Sleep Stage Prediction Based on Feature Pyramid and Joint Attention.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 769-780 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 141999873)
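The FPJA-Net abstract reports two metrics per dataset: overall accuracy and macro F1. Macro F1 averages the per-class F1 scores with equal weight, so rare sleep stages count as much as common ones. The sketch below computes both metrics; the function name and the toy stage labels are assumptions for illustration and do not come from the paper's datasets.

```python
def accuracy_and_macro_f1(y_true, y_pred):
    """Return (overall accuracy, macro-averaged F1) for two label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    f1_scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / len(pairs), sum(f1_scores) / len(f1_scores)


# Toy epoch labels, invented for illustration.
true_stages = ["W", "W", "N1", "N2", "N2", "REM"]
pred_stages = ["W", "N1", "N1", "N2", "N2", "N2"]
acc, macro_f1 = accuracy_and_macro_f1(true_stages, pred_stages)
```

In the toy example the single missed REM epoch pulls macro F1 well below accuracy, which is consistent with the macro F1 figures in the abstract being lower than the accuracy figures.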
Pub Date: 2024-12-01 | Epub Date: 2024-09-04 | DOI: 10.1007/s12539-024-00638-7
Weiyu Shi, Yan Zhang, Yeqing Sun, Zhengkui Lin
Genes that have been experimentally validated for diseases (functions) can be used to develop machine learning methods that predict new disease/function-genes. However, the prediction of both function-genes and disease-genes faces the same problem: there are only certain positive examples, but no negative examples. To solve this problem, we propose VGAEMCD, a function/disease-gene prediction algorithm based on network embedding (Variational Graph Auto-Encoders, VGAE) and one-class classification (Fast Minimum Covariance Determinant, Fast-MCD). First, we construct a protein-protein interaction (PPI) network centered on experimentally validated genes; then VGAE is used to obtain the embeddings of the nodes (genes) in the network; finally, the embeddings are fed into an improved deep learning one-class classifier based on Fast-MCD to predict function/disease-genes. VGAEMCD can predict function-genes and disease-genes in a unified way and requires only the experimentally verified genes as input (no expression profile is needed). VGAEMCD outperforms classical one-class classification algorithms in Recall, Precision, F-measure, Specificity, and Accuracy. Further experiments show that seven metrics of VGAEMCD are higher than those of state-of-the-art function/disease-gene prediction algorithms. The above results indicate that VGAEMCD can learn the distribution characteristics of positive examples well and accurately identify function/disease-genes.
Title: Function-Genes and Disease-Genes Prediction Based on Network Embedding and One-Class Classification.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 781-801 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 142125655)
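One-class classification with positive examples only, as described in the VGAEMCD abstract, can be sketched with a Mahalanobis-distance rule: fit a mean and covariance to the positives, then flag points that lie too far away. The sketch below deliberately swaps in the plain sample covariance in two dimensions; it is not Fast-MCD (which searches for a robust subset of the data) and not the authors' classifier. All names, the toy points, and the chi-square threshold are assumptions.

```python
def fit_gaussian_2d(points):
    """Return the sample mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def is_inlier(point, mean, cov, threshold=5.991):
    """Accept points inside the 95% chi-square contour (2 degrees of freedom)."""
    return mahalanobis_sq(point, mean, cov) <= threshold


# Toy 2D "embeddings" of known positive genes, invented for illustration.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
mean, cov = fit_gaussian_2d(positives)
near = is_inlier((0.6, 0.4), mean, cov)   # close to the positives
far = is_inlier((5.0, 5.0), mean, cov)    # far from the positives
```

A robust estimator such as MCD matters when some of the "positive" examples are themselves mislabeled, because the plain covariance used here is easily distorted by outliers.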
Pub Date: 2024-12-01 | Epub Date: 2024-10-05 | DOI: 10.1007/s12539-024-00647-6
Fırat Hardalaç, Haad Akmal, Kubilay Ayturan, U Rajendra Acharya, Ru-San Tan
Cardiotocography (CTG) is used to assess the health of the fetus during birth or antenatally in the third trimester. It concurrently records maternal uterine contractions (UC) and fetal heart rate (FHR). Fetal distress, which may require therapeutic intervention, can be diagnosed using baseline FHR and its reaction to uterine contractions. In this study, a pragmatic machine learning strategy based on feature reduction and hyperparameter optimization is proposed to classify fetal states (Normal, Suspect, Pathological) from CTG. This strategy could serve as a decision support tool for managing pregnancies. On a public dataset of 2126 CTG recordings, the model was assessed using several standard classifiers relevant to CTG datasets. The proposed method improved the classifiers' accuracy, which rose to 97.20% with Random Forest (the best classifier). In practical terms, the model correctly predicted 100% of the pathological cases and 98.8% of the normal cases in the dataset. The proposed model was also applied to another public CTG dataset of 552 CTG signals, achieving 97.34% accuracy. If integrated with telemedicine, this proposed model could also be used for long-distance "stay at home" fetal monitoring in high-risk pregnancies.
Title: A Pragmatic Approach to Fetal Monitoring via Cardiotocography Using Feature Elimination and Hyperparameter Optimization.
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 882-906 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 142377867)
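The CTG abstract mentions feature reduction without saying which procedure was used. One generic option is greedy backward elimination: repeatedly drop the feature whose removal gives the best score, and stop when every removal hurts. The sketch below assumes that procedure and uses a made-up scoring function in place of cross-validated classifier accuracy; the feature names are hypothetical and are not the dataset's actual column names.

```python
def backward_eliminate(features, score):
    """Greedy backward elimination.

    `score` maps a list of feature names to a number (higher is better).
    In practice it would be something like cross-validated accuracy.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Build every subset obtained by dropping exactly one feature.
        candidates = [[f for f in selected if f != drop] for drop in selected]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score < best:
            break  # every removal makes the score worse
        selected, best = top, top_score
    return selected, best


# Hypothetical feature names and a toy score: reward informative features
# and charge a small penalty per feature kept.
USEFUL = {"baseline_fhr", "accelerations"}


def toy_score(subset):
    return sum(f in USEFUL for f in subset) - 0.1 * len(subset)


kept, kept_score = backward_eliminate(
    ["baseline_fhr", "accelerations", "noise_a", "noise_b"], toy_score
)
```

Hyperparameter optimization could wrap the same loop by letting `score` train the classifier over a grid of settings and return the best cross-validated result.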
Copy number variation (CNV) is an essential genetic driving factor of cancer formation and progression, making intelligent classification based on CNV feasible. However, current machine learning and deep learning methods face several challenges, such as the design of base classifier combination schemes in ensemble methods and the selection of the number of layers in neural networks, which often result in low accuracy. Therefore, an adaptive bilinear dynamic cascade model (Adap-BDCM) is developed to further enhance the accuracy and applicability of these methods for intelligent classification on CNV datasets. In this model, a feature selection module is introduced to mitigate the interference of redundant information, and a bilinear model based on the gated attention mechanism is proposed to extract more beneficial deep fusion features. Furthermore, an adaptive base classifier selection scheme is designed to overcome the difficulty of manually designing base classifier combinations and to enhance the applicability of the model. Lastly, a novel feature fusion scheme with an attribute recall submodule is constructed, which effectively avoids getting stuck in local solutions and losing valuable information. Numerous experiments have demonstrated that our Adap-BDCM model exhibits optimal performance in cancer classification, stage prediction, and recurrence prediction on CNV datasets. This study can assist physicians in making diagnoses faster and better.
Title: Adap-BDCM: Adaptive Bilinear Dynamic Cascade Model for Classification Tasks on CNV Datasets.
Authors: Liancheng Jiang, Liye Jia, Yizhen Wang, Yongfei Wu, Junhong Yue | DOI: 10.1007/s12539-024-00635-w
Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 1019-1037 | Publication date: 2024-12-01 | Type: Journal Article | Periodical IF: 3.9 | Category: Biology | Platform: Semanticscholar (paper ID 140956486)
Pub Date: 2024-12-01; Epub Date: 2024-05-23; DOI: 10.1007/s12539-024-00633-y
Wei Liu, Zhijie Teng, Zejun Li, Jing Chen
Gene regulatory network (GRN) inference based on single-cell RNA sequencing data (scRNAseq) plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization remain unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and observed topology into a low-dimensional vector space. The trained vectors are then used to calculate the distance between genes and, from that, to predict interactions between genes. In the overall framework, FastICA is used to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multiple stacked GraphSAGE layers as an encoder, together with an improved decoder, to overcome network sparsity. CVGAE is evaluated on several single-cell datasets with four related ground-truth networks, and the results show that CVGAE achieves better performance than the compared methods. To validate its learning and generalization capabilities, CVGAE is applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
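The step in which learned gene vectors are turned into candidate interactions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure or the decoder, so Euclidean distance, the function name `rank_gene_pairs`, and the toy embeddings below are all assumptions chosen for illustration.

```python
import math
from itertools import combinations


def rank_gene_pairs(embeddings):
    """Rank gene pairs by Euclidean distance between their embedding vectors.

    `embeddings` maps a gene name to a list of floats (hypothetical input;
    in CVGAE these vectors would come from the trained encoder).
    Pairs with the smallest distance are returned first.
    """
    scored = []
    for gene_a, gene_b in combinations(sorted(embeddings), 2):
        distance = math.dist(embeddings[gene_a], embeddings[gene_b])
        scored.append((gene_a, gene_b, distance))
    # Smaller distance is treated here as stronger evidence of interaction.
    return sorted(scored, key=lambda item: item[2])


# Made-up two-dimensional embeddings, for illustration only.
toy_embeddings = {
    "geneA": [0.0, 0.0],
    "geneB": [0.1, 0.0],
    "geneC": [3.0, 4.0],
}

ranked = rank_gene_pairs(toy_embeddings)
for gene_a, gene_b, distance in ranked:
    print(f"{gene_a} - {gene_b}: {distance:.3f}")
```

In this toy input, geneA and geneB are ranked first because their vectors nearly coincide. A real pipeline would apply a threshold or a learned decoder to such scores rather than a plain ranking.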
Title: CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 990-1004. Publication date: 2024-12-01. DOI: 10.1007/s12539-024-00633-y.
Pub Date: 2024-12-01; Epub Date: 2024-09-27; DOI: 10.1007/s12539-024-00642-x
Safia Firdous, Zubair Nawaz, Rizwan Abid, Leo L Cheng, Syed Ghulam Musharraf, Saima Sadaf
Diagnosing and classifying central nervous system tumors such as gliomas or glioblastomas pose a significant challenge due to their aggressive and infiltrative nature. However, recent advancements in metabolomics and magnetic resonance spectroscopy (MRS) offer promising avenues for differentiating tumor grades both in vivo and ex vivo. This study aimed to explore tissue-based metabolic signatures that distinguish low-grade from high-grade gliomas. Forty-six histologically confirmed, intact solid tumor samples from glioma patients were analyzed using high-resolution magic angle spinning nuclear magnetic resonance (HRMAS-NMR) spectroscopy. By integrating machine learning (ML) algorithms, spectral regions with the most discriminative potential were identified. Validation was performed through univariate and multivariate statistical analyses, along with HRMAS-NMR analyses of 46 paired plasma samples. Among the ML models applied, logistic regression identified 46 spectral regions capable of sub-classifying gliomas with an accuracy of 87% (F1-measure 0.87, Precision 0.82, Recall 0.93), whereas the extra-tree classifier identified three spectral regions with a predictive accuracy of 91% (F1-measure 0.91, Precision 0.85, Recall 0.97). The Wilcoxon test identified 51 spectral regions that significantly differentiated the low- and high-grade glioma groups (p < 0.05). Based on sensitivity and area under the curve values, 40 spectral regions corresponding to 18 metabolites were considered potential biomarkers for tissue-based glioma classification; among these, N-acetyl aspartate, glutamate, and glutamine emerged as the most important markers. These markers were validated in paired plasma samples, and their absolute concentrations were computed. 
Our results demonstrate that the metabolic markers identified through the HRMAS-NMR-ML analysis framework, and their associated metabolic networks, hold promise for targeted treatment planning and clinical interventions in the future.
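As a consistency check on the reported metrics, the F1-measure can be recomputed from the stated precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below uses only the figures quoted in the abstract; the function and variable names are illustrative, not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
logistic_f1 = f1_score(0.82, 0.93)      # logistic regression, 46 spectral regions
extra_tree_f1 = f1_score(0.85, 0.97)    # extra-tree classifier, 3 spectral regions

print(round(logistic_f1, 2))    # 0.87
print(round(extra_tree_f1, 2))  # 0.91
```

Both recomputed values round to the F1-measures given in the abstract (0.87 and 0.91), so the three reported metrics are mutually consistent for each model.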
Title: Integrating HRMAS-NMR Data and Machine Learning-Assisted Profiling of Metabolite Fluxes to Classify Low- and High-Grade Gliomas. Journal: Interdisciplinary Sciences: Computational Life Sciences, pp. 854-871. Publication date: 2024-12-01. DOI: 10.1007/s12539-024-00642-x.