Pub Date: 2024-12-18 DOI: 10.1186/s13040-024-00411-y
Nina Kastendiek, Roberta Coletti, Thilo Gross, Marta B Lopes
Gliomas are primary malignant brain tumors with a typically poor prognosis, exhibiting significant heterogeneity across different cancer types. Each glioma type possesses distinct molecular characteristics determining patient prognosis and therapeutic options. This study aims to explore the molecular complexity of gliomas at the transcriptome level, employing a comprehensive approach grounded in network discovery. The graphical lasso method was used to estimate a gene co-expression network for each glioma type from a transcriptomics dataset. Causality was subsequently inferred from correlation networks by estimating the Jacobian matrix. The networks were then analyzed for gene importance using centrality measures and modularity detection, leading to the selection of genes that might play an important role in the disease. To explore the pathways and biological functions these genes are involved in, KEGG and Gene Ontology (GO) enrichment analyses on the disclosed gene sets were performed, highlighting the significance of the genes selected across several relevant pathways and GO terms. Spectral clustering based on patient similarity networks was applied to stratify patients into groups with similar molecular characteristics and to assess whether the resulting clusters align with the diagnosed glioma type. The results presented highlight the ability of the proposed methodology to uncover relevant genes associated with glioma intertumoral heterogeneity. Further investigation might encompass biological validation of the putative biomarkers disclosed.
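As a rough illustration of the network-analysis step described above, the sketch below takes an already-estimated sparse precision matrix (the kind of output the graphical lasso produces), converts it to partial correlations, and ranks genes by degree centrality. The matrix values and gene labels are invented for illustration, and the graphical lasso fit itself is not shown; this is not the authors' pipeline.

```python
import math

def partial_correlations(precision):
    """Convert a precision matrix into partial correlations.

    rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j.
    """
    n = len(precision)
    rho = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
    return rho

def degree_centrality(rho, tol=1e-12):
    """Fraction of the other nodes each node is linked to (nonzero partial correlation)."""
    n = len(rho)
    return [
        sum(1 for j in range(n) if j != i and abs(rho[i][j]) > tol) / (n - 1)
        for i in range(n)
    ]

# Hypothetical 4-gene precision matrix (symmetric and sparse); values are illustrative only.
genes = ["G1", "G2", "G3", "G4"]
theta = [
    [2.0, -0.5, -0.4, 0.0],
    [-0.5, 2.0, 0.0, 0.0],
    [-0.4, 0.0, 2.0, -0.6],
    [0.0, 0.0, -0.6, 2.0],
]

rho = partial_correlations(theta)
centrality = degree_centrality(rho)
# Genes sorted from most to least connected.
ranked = sorted(zip(genes, centrality), key=lambda gc: gc[1], reverse=True)
```

In practice the precision matrix would come from a graphical lasso estimator fitted on expression data, and other centrality measures could replace the simple degree count used here.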
Title: Exploring glioma heterogeneity through omics networks: from gene network discovery to causal insights and patient stratification. Journal: Biodata Mining, vol. 17, no. 1, p. 56. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657291/pdf/
Objectives: Bladder cancer (BLCA) is a tumor that affects men more than women. The biological function and prognostic value of androgen-responsive genes (ARGs) in BLCA are currently unknown. To address this, we established an androgen signature to determine the prognosis of BLCA.
Methods: Sequencing data for BLCA from the TCGA and GEO datasets were used for research. The tumor microenvironment (TME) was measured using CIBERSORT and ssGSEA. Prognosis-related genes were identified and a risk score model was constructed using univariate Cox regression, LASSO regression, and multivariate Cox regression. Drug sensitivity analysis was performed using Genomics of Drug Sensitivity in Cancer (GDSC). Real-time quantitative PCR was performed to assess the expression of representative genes in clinical samples.
Results: ARGs (especially CDK6, FADS1, PGM3, SCD, PTK2B, and TPD52) might regulate the progression of BLCA. The different expression patterns of ARGs may lead to different immune cell infiltration. The risk model indicates that patients with higher risk scores have a poorer prognosis, more stromal infiltration, and an enrichment of biological functions. Single-cell RNA analysis, bulk RNA data, and PCR analysis support the reliability of this risk model, and a nomogram was also established for clinical use. Drug prediction analysis showed that high-risk patients had a better response to fludarabine, AZD8186, and carmustine.
Conclusion: ARGs played an important role in the progression, immune infiltration, and prognosis of BLCA. The ARGs model has high accuracy in predicting the prognosis of BLCA patients and provides more effective medication guidelines.
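The risk-score step in the Methods can be sketched as a weighted sum of gene expression values followed by a split into high- and low-risk groups. The coefficients and expression values below are hypothetical placeholders, and the median cutoff is an assumption (the abstract does not state the threshold); only the gene symbols come from the abstract.

```python
from statistics import median

# Hypothetical Cox coefficients for illustration; the fitted values are not given in the abstract.
COEFS = {"CDK6": 0.30, "FADS1": 0.20, "PGM3": -0.10, "SCD": 0.25, "PTK2B": -0.15, "TPD52": 0.10}

def risk_score(expression, coefs=COEFS):
    """Linear predictor: sum of coefficient * expression over the signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)

def stratify(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low' (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Made-up expression profiles for four patients.
patients = {
    "P1": {"CDK6": 2.0, "FADS1": 1.0, "PGM3": 0.5, "SCD": 1.5, "PTK2B": 0.2, "TPD52": 1.0},
    "P2": dict.fromkeys(COEFS, 1.0),
    "P3": dict.fromkeys(COEFS, 0.0),
    "P4": {**dict.fromkeys(COEFS, 0.0), "CDK6": 3.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

With real data, the coefficients would come from the multivariate Cox model fitted on the genes retained by LASSO.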
Title: Prognostic feature based on androgen-responsive genes in bladder cancer and screening for potential targeted drugs. Authors: Jiang Zhao, Qian Zhang, Cunle Zhu, Wu Yuqi, Guohui Zhang, Qianliang Wang, Xingyou Dong, Benyi Li, Xiangwei Wang. Journal: Biodata Mining, vol. 17, no. 1, p. 59. Pub Date: 2024-12-18. DOI: 10.1186/s13040-024-00377-x. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657289/pdf/
Pub Date: 2024-12-18 DOI: 10.1186/s13040-024-00408-7
Shilpa R Thandla, Grace Q Armstrong, Adil Menon, Aashna Shah, David L Gueye, Clara Harb, Estefania Hernandez, Yasaswini Iyer, Abigail R Hotchner, Riddhi Modi, Anusha Mudigonda, Maria A Prokos, Tharun M Rao, Olivia R Thomas, Camilo A Beltran, Taylor Guerrieri, Sydney LeBlanc, Skanda Moorthy, Sara G Yacoub, Jacob E Gardner, Benjamin M Greenberg, Alyssa Hubal, Yuliana P Lapina, Jacqueline Moran, Joseph P O'Brien, Anna C Winnicki, Christina Yoka, Junwei Zhang, Peter A Zimmerman
Introduction: The transformative feature of Artificial Intelligence (AI) is the massive capacity for interpreting and transforming unstructured data into a coherent and meaningful context. In general, the potential that AI will alter traditional approaches to student research and its evaluation appears to be significant. With regard to research in global health, it is important for students and research experts to assess strengths and limitations of GenAI within this space. Thus, the goal of our research was to evaluate the information literacy of GenAI compared to expectations that graduate students meet in writing research papers.
Methods: After completing the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU), graduate students who successfully completed their required research paper were recruited to compare their original papers with a paper they generated with ChatGPT-4o using the original assignment prompt. Students also completed a Google Forms survey to evaluate different sections of the AI-generated paper (e.g., Adherence to Introduction guidelines, Presentation of three perspectives, Conclusion) and their original papers and their overall satisfaction with the AI work. The original student to ChatGPT-4o comparison also enabled evaluation of narrative elements and references.
Results: Of the 54 students who completed the required research paper, 28 (51.8%) agreed to collaborate in the comparison project. A summary of the survey responses suggested that students evaluated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale: 1 to 5 with lower scores indicating inferiority). Evaluating the average individual student responses for 5 Likert item queries showed that 17 scores were < 2.9; 7 scores were between 3.0 and 3.9; 4 scores were ≥ 4.0, consistent with inferiority of the AI-generated paper. Evaluation of reference selection by ChatGPT-4o (n = 729 total references) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative; 14.4% of the 729 total references.
Discussion: Our findings reveal strengths and limitations on the potential of AI tools to assist in understanding the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics that they had not considered in the 3-perspective sections of their papers. Consistently presenting up-to-date facts and references, as well as further examining or summarizing the complexities of global health topics, appears to be a current limitation of ChatGPT-4o. Because ChatGPT-4o generated references from highly credible biomedical research journals that did not exist, our findings conclude that ChatGPT-4o failed a
Title: Comparing new tools of artificial intelligence to the authentic intelligence of our global health students. Journal: Biodata Mining, vol. 17, no. 1, p. 58. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656723/pdf/
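The reference-audit percentages reported in the Results of this entry can be re-derived from the stated counts with simple arithmetic. The short check below uses only numbers given in the abstract; the helper name `pct` is ours.

```python
def pct(part, whole, digits=1):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)

# Counts taken from the abstract.
total_refs = 729
authentic = 396
nonexistent = 333
relevant = 105

# The two categories should account for every reference.
counts_add_up = (authentic + nonexistent == total_refs)

share_authentic = pct(authentic, total_refs)           # share of references that exist
share_nonexistent = pct(nonexistent, total_refs)       # share of fabricated references
relevant_of_authentic = pct(relevant, authentic)       # relevant among authentic references
relevant_of_total = pct(relevant, total_refs)          # relevant among all references
```

The computed shares agree with the rounded figures in the abstract (54%, 46%, 26.5%, and 14.4%).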
Pub Date: 2024-12-18 DOI: 10.1186/s13040-024-00410-z
Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa
Background: Long terminal repeats (LTRs) represent important parts of LTR retrotransposons and retroviruses found in high copy numbers in a majority of eukaryotic genomes. LTRs contain regulatory sequences essential for the life cycle of the retrotransposon. Previous experimental and sequence studies have provided only limited information about LTR structure and composition, mostly from model systems. To enhance our understanding of these key sequence modules, we focused on the contrasts between LTRs of various retrotransposon families and other genomic regions. Furthermore, this approach can be utilized for the classification and prediction of LTRs.
Results: We used machine learning methods suitable for DNA sequence classification and applied them to a large dataset of plant LTR retrotransposon sequences. We trained three machine learning models using (i) traditional model ensembles (Gradient Boosting), (ii) hybrid convolutional/long short-term memory (LSTM) network models, and (iii) a DNA pre-trained transformer-based model using k-mer sequence representation. All three approaches were successful in classifying and isolating LTRs in this data, as well as providing valuable insights into LTR sequence composition. The best classification (expressed as F1 score) achieved for LTR detection was 0.85 using the hybrid network model. The most accurate classification task was superfamily classification (F1=0.89) while the least accurate was family classification (F1=0.74). The trained models were subjected to explainability analysis. Positional analysis identified a mixture of interesting features, many of which had a preferred absolute position within the LTR and/or were biologically relevant, such as a centrally positioned TATA-box regulatory sequence, and TG..CA nucleotide patterns around both LTR edges.
Conclusions: Our results show that the models used here recognized biologically relevant motifs, such as core promoter elements in the LTR detection task, and a development and stress-related subclass of transcription factor binding sites in the family classification task. Explainability analysis also highlighted the importance of 5'- and 3'- edges in LTR identity and revealed the need to analyze more than just dinucleotides at these ends. Our work shows the applicability of machine learning models to regulatory sequence analysis and classification, and demonstrates the important role of the identified motifs in LTR detection.
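To make the k-mer representation and the TG..CA edge pattern mentioned above more concrete, here is a minimal sketch. The toy sequence, the choice of k = 3, and the function names are assumptions for illustration; the abstract does not state the k used by the transformer model.

```python
from collections import Counter

def kmer_tokens(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def has_tg_ca_edges(seq):
    """Check the LTR edge pattern: sequence starts with TG and ends with CA."""
    seq = seq.upper()
    return seq.startswith("TG") and seq.endswith("CA")

# Toy sequence (not a real LTR): TG...CA edges with a TATA-like motif inside.
toy_ltr = "TGTTATATAAGGCA"

tokens = kmer_tokens(toy_ltr, k=3)
counts = Counter(tokens)                 # bag-of-k-mers features, e.g. for gradient boosting
edges_ok = has_tg_ca_edges(toy_ltr)
contains_tata = "TATA" in toy_ltr
```

A count vector like `counts` could feed a tree-based model, while the ordered `tokens` list is closer to what a sequence model would consume.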
Title: Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning. Journal: Biodata Mining, vol. 17, no. 1, p. 57. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656987/pdf/
Pub Date: 2024-12-06 DOI: 10.1186/s13040-024-00409-6
Zhaoming Kong, Rong Zhou, Xinwei Luo, Songlin Zhao, Ann B Ragin, Alex D Leow, Lifang He
Multimodal brain network analysis enables a comprehensive understanding of neurological disorders by integrating information from multiple neuroimaging modalities. However, existing methods often struggle to effectively model the complex structures of multimodal brain networks. In this paper, we propose a novel tensor-based graph convolutional network (TGNet) framework that combines tensor decomposition with multi-layer GCNs to capture both the homogeneity and intricate graph structures of multimodal brain networks. We evaluate TGNet on four datasets (HIV, Bipolar Disorder (BP), Parkinson's Disease (PPMI), and Alzheimer's Disease (ADNI)), demonstrating that it significantly outperforms existing methods for disease classification tasks, particularly in scenarios with limited sample sizes. The robustness and effectiveness of TGNet highlight its potential for advancing multimodal brain network analysis. The code is available at https://github.com/rongzhou7/TGNet.
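For readers unfamiliar with graph convolutions, the sketch below shows a single generic GCN propagation step on one small graph, using the widely used symmetric normalization. It is not TGNet: the tensor decomposition and multimodal fusion are omitted, and the graph, features, and weights are toy values chosen for illustration.

```python
import math

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in z]

# Toy 3-region path graph 0 - 1 - 2 (stand-in for one modality's brain network).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot node features
weights = [[1.0], [1.0], [1.0]]                                  # hypothetical 3 -> 1 projection

out = gcn_layer(adj, features, weights)
```

Each output row aggregates a node's own feature with those of its neighbors, weighted by node degree; stacking several such layers gives the multi-layer GCN the abstract refers to.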
Title: TGNet: tensor-based graph convolutional networks for multimodal brain network analysis. Journal: Biodata Mining, vol. 17, no. 1, p. 55. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622555/pdf/
Pub Date: 2024-12-02 DOI: 10.1186/s13040-024-00399-5
Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman
This research presents a predictive model aimed at estimating the progression of Amyotrophic Lateral Sclerosis (ALS) based on clinical features collected from a dataset of 50 patients. Important features included evaluations of speech, mobility, and respiratory function. We utilized an XGBoost regression model to forecast scores on the ALS Functional Rating Scale (ALSFRS-R), achieving a training mean squared error (MSE) of 0.1651 and a testing MSE of 0.0073, with R² values of 0.9800 for training and 0.9993 for testing. The model demonstrates high accuracy, providing a useful tool for clinicians to track disease progression and enhance patient management and treatment strategies.
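The two metrics quoted above, mean squared error and R², can be computed as in the following sketch. The score values are invented for illustration and are unrelated to the study's data; the XGBoost model itself is not reproduced here.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up ALSFRS-R style scores (the scale runs from 0 to 48).
observed = [40, 35, 30, 25]
predicted = [39, 36, 30, 24]

error = mse(observed, predicted)
fit = r2(observed, predicted)
```

Because R² depends on the variance of the observed scores, it is usually read alongside the MSE and the size of the test split rather than on its own.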
Title: Predictive modeling of ALS progression: an XGBoost approach using clinical features. Journal: Biodata Mining, vol. 17, no. 1, p. 54. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610297/pdf/
Background: Timely identification of deteriorating patients is crucial to prevent the progression to cardiac arrest. However, current methods for predicting emergency department cardiac arrest are primarily static and rule-based, with limited precision, and cannot accommodate time-series data. Deep learning has the potential to continuously update data and provide more precise predictions throughout the emergency department stay.
Methods: We developed and internally validated a deep learning-based scoring system, Deep EDICAS, for early prediction of cardiac arrest and a subset of arrest, cardiopulmonary resuscitation (CPR), in the emergency department. Our proposed model effectively integrates tabular and time series data to enhance predictive accuracy. To address data imbalance and bolster early prediction capabilities, we implemented data augmentation techniques.
Results: Our system achieved an AUPRC of 0.5178 and an AUROC of 0.9388 on data from the National Taiwan University Hospital. For early prediction, our system achieved an AUPRC of 0.2798 and an AUROC of 0.9046, demonstrating superiority over other early warning scores. Moreover, Deep EDICAS offers interpretability through feature importance analysis.
Conclusion: Our study demonstrates the effectiveness of deep learning in predicting cardiac arrest in the emergency department. Despite the higher clinical value associated with detecting patients requiring CPR, there is a scarcity of literature utilizing deep learning in CPR detection tasks. Therefore, this study embarks on an initial exploration into the task of CPR detection.
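The Results above report both AUROC and AUPRC, which can diverge sharply when positives are rare. The sketch below computes both on a small made-up, imbalanced sample; the labels and scores are illustrative assumptions, and the average-precision routine is a simple step-wise estimate without tie handling.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / n_pos

# Made-up example: 2 arrests among 8 visits.
labels = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.3, 0.05, 0.6]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```

A useful reference point is that a random classifier has an AUROC of about 0.5 but an AUPRC close to the event prevalence, so AUPRC values should be judged against how rare the outcome is.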
Title: Deep learning-based Emergency Department In-hospital Cardiac Arrest Score (Deep EDICAS) for early prediction of cardiac arrest and cardiopulmonary resuscitation in the emergency department. Authors: Yuan-Xiang Deng, Jyun-Yi Wang, Chia-Hsin Ko, Chien-Hua Huang, Chu-Lin Tsai, Li-Chen Fu. Journal: Biodata Mining, vol. 17, no. 1, p. 52. Pub Date: 2024-11-23. DOI: 10.1186/s13040-024-00407-8. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585162/pdf/
Pub Date : 2024-11-23 DOI: 10.1186/s13040-024-00406-9
Mitja Briscik, Gabriele Tazza, László Vidács, Marie-Agnès Dillies, Sébastien Déjean
Background: Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.
Results: We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches.
Conclusion: Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
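The kernel-fusion idea summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-view toy data, the unweighted-mean fusion rule, the RBF kernels and every function name below are assumptions made for illustration, and a kernel perceptron is swapped in for the support vector machine so that the example needs only the Python standard library.

```python
import math
import random

def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one data view."""
    n = len(X)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]

def fuse_kernels(kernels, weights=None):
    """Meta-kernel as a convex combination of the input Gram matrices.
    Equal weights are an assumption; MKL methods differ in how they pick them."""
    m = len(kernels)
    weights = weights or [1.0 / m] * m
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def train_kernel_perceptron(K, y, train_idx, epochs=20):
    """Kernel perceptron on a precomputed kernel (stand-in for an SVM)."""
    alpha = {i: 0 for i in train_idx}
    for _ in range(epochs):
        for i in train_idx:
            score = sum(alpha[j] * y[j] * K[j][i] for j in train_idx)
            if y[i] * score <= 0:  # misclassified: strengthen this sample
                alpha[i] += 1
    return alpha

def predict(K, y, alpha, i):
    """Predicted label (+1 or -1) of sample i from the training coefficients."""
    score = sum(a * y[j] * K[j][i] for j, a in alpha.items())
    return 1 if score > 0 else -1

# Toy "multi-omics" data: two views of the same 40 samples (hypothetical).
random.seed(0)
labels = [1 if i % 2 == 0 else -1 for i in range(40)]
view_a = [[2 * y + random.gauss(0, 0.3), random.gauss(0, 0.3)] for y in labels]
view_b = [[random.gauss(0, 0.3), 2 * y + random.gauss(0, 0.3)] for y in labels]

# One kernel per view, fused into a single meta-kernel.
K_fused = fuse_kernels([rbf_kernel(view_a, 0.5), rbf_kernel(view_b, 0.5)])

train_idx, test_idx = list(range(30)), list(range(30, 40))
alpha = train_kernel_perceptron(K_fused, labels, train_idx)
accuracy = sum(predict(K_fused, labels, alpha, i) == labels[i]
               for i in test_idx) / len(test_idx)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the classifier only ever sees the fused Gram matrix, any learner that accepts a precomputed kernel could take the perceptron's place, which is what makes fusion at the kernel level convenient for heterogeneous inputs.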
{"title":"Supervised multiple kernel learning approaches for multi-omics data integration.","authors":"Mitja Briscik, Gabriele Tazza, László Vidács, Marie-Agnès Dillies, Sébastien Déjean","doi":"10.1186/s13040-024-00406-9","DOIUrl":"10.1186/s13040-024-00406-9","url":null,"abstract":"<p><strong>Background: </strong>Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.</p><p><strong>Results: </strong>We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches.</p><p><strong>Conclusion: </strong>Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. 
Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"53"},"PeriodicalIF":4.0,"publicationDate":"2024-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585117/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142695995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-14 DOI: 10.1186/s13040-024-00404-x
Yang Qixin, Huang Jing, He Jiang, Liu Xueyang, Yu Lu, Li Yuehua
Background: Regulatory T cells (Tregs) play a critical role in shaping the immunosuppressive microenvironment within tumors. Investigating the role of Tregs in clear cell renal cell carcinoma (ccRCC) is crucial for identifying prognostic markers and therapeutic targets for ccRCC.
Methods: Weighted gene co-expression network analysis (WGCNA) was utilized to pinpoint modules related to Treg infiltration in TCGA-KIRC samples. Following this, consensus clustering was employed to derive two clusters associated with Treg infiltration in ccRCC. A prognostic model was then developed using the gene module associated with Treg infiltration. We then evaluated the ability of the prognostic model to predict ccRCC overall survival and demonstrated that RCN1 can be used as a target to predict ccRCC prognosis.
Results: We deduce that the two clusters associated with Treg infiltration exhibit distinct immune microenvironment compositions, pathway activations, prognoses, and sensitivities to drugs commonly used in ccRCC treatment. Furthermore, a 7-gene model risk score, developed based on ccRCC Treg infiltration, proved to be a reliable prognostic marker in both training and validation cohorts. Additionally, survival analysis indicated that RCN1 serves as a reliable prognostic factor for ccRCC. Single-cell sequencing analysis revealed that RCN1 is predominantly expressed in tumor cells. A pan-cancer analysis highlighted that RCN1 is linked with poor prognosis and the activation of inflammatory response pathways across various cancers.
Conclusion: We developed a prognostic model associated with Treg infiltration, which facilitates the clinical categorization of ccRCC progression. Moreover, our findings underscore the significant potential of RCN1 as a ccRCC biomarker.
{"title":"Transcriptome-based network analysis related to regulatory T cells infiltration identified RCN1 as a potential biomarker for prognosis in clear cell renal cell carcinoma.","authors":"Yang Qixin, Huang Jing, He Jiang, Liu Xueyang, Yu Lu, Li Yuehua","doi":"10.1186/s13040-024-00404-x","DOIUrl":"10.1186/s13040-024-00404-x","url":null,"abstract":"<p><strong>Background: </strong>Regulatory T cells (Tregs) play a critical role in shaping the immunosuppressive microenvironment within tumors. Investigating the role of Tregs in clear cell renal cell carcinoma (ccRCC) is crucial for identifying prognostic markers and therapeutic targets for ccRCC.</p><p><strong>Methods: </strong>Weighted gene co-expression network analysis (WGCNA) was utilized to pinpoint modules related to Treg infiltration in TCGA-KIRC samples. Following this, consensus clustering was employed to derive two clusters associated with Treg infiltration in ccRCC. A prognostic model was then developed using the gene module associated with Treg infiltration. We then evaluated the ability of the prognostic model to predict ccRCC overall survival and demonstrated that RCN1 can be used as a target to predict ccRCC prognosis.</p><p><strong>Results: </strong>We deduce that the two clusters associated with Treg infiltration exhibit distinct immune microenvironment compositions, pathway activations, prognoses, and sensitivities to drugs commonly used in ccRCC treatment. Furthermore, a 7-gene model risk score, developed based on ccRCC Treg infiltration, proved to be a reliable prognostic marker in both training and validation cohorts. Additionally, survival analysis indicated that RCN1 serves as a reliable prognostic factor for ccRCC. Single-cell sequencing analysis revealed that RCN1 is predominantly expressed in tumor cells. 
A pan-cancer analysis highlighted that RCN1 is linked with poor prognosis and the activation of inflammatory response pathways across various cancers.</p><p><strong>Conclusion: </strong>We developed a prognostic model associated with Treg infiltration, which facilitates the clinical categorization of ccRCC progression. Moreover, our findings underscore the significant potential of RCN1 as a ccRCC biomarker.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"51"},"PeriodicalIF":4.0,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566375/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-13 DOI: 10.1186/s13040-024-00400-1
Pradeep Varathan Pugalenthi, Bing He, Linhui Xie, Kwangsik Nho, Andrew J Saykin, Jingwen Yan
Alzheimer's disease (AD) is a highly heritable form of dementia accompanied by substantial failure of cognitive function. Large-scale genome-wide association studies (GWASs) have identified a set of SNPs significantly associated with AD and related traits. GWAS hits usually emerge as clusters where a lead SNP with the highest significance is surrounded by other less significant neighboring SNPs. Although functionality is not guaranteed even with the strongest associations in GWASs, lead SNPs have historically been the focus of the field, with the remaining associations inferred to be redundant. Recent deep genome annotation tools enable the prediction of function from a segment of a DNA sequence with significantly improved precision, which allows in-silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits around the APOE region on chromatin functions and whether that impact is altered by the genetic context (i.e., alleles of neighboring SNPs). Our results showed that highly correlated SNPs in the same LD block could have distinct impacts on downstream functions. Although some GWAS lead SNPs showed dominant functional effects regardless of the neighborhood SNP alleles, several other SNPs did exhibit enhanced loss or gain of function under certain genetic contexts, suggesting potential additional information hidden in the LD blocks.
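The context-dependent in-silico mutagenesis described in this abstract can be sketched as follows. This is an illustration only, under stated assumptions: `toy_chromatin_score` is a hypothetical stand-in for a deep genome annotation model (it merely counts a motif), and the sequence, SNP positions and alleles are invented for the example rather than taken from the APOE region.

```python
from itertools import product

def mutate(seq, pos, allele):
    """Return seq with the base at 0-based position pos replaced by allele."""
    return seq[:pos] + allele + seq[pos + 1:]

def toy_chromatin_score(seq, motif="GATA"):
    """Hypothetical stand-in for a sequence-to-function model:
    counts (possibly overlapping) occurrences of a motif."""
    return sum(seq.startswith(motif, i)
               for i in range(len(seq) - len(motif) + 1))

def context_dependent_effects(ref_seq, lead, neighbours,
                              score=toy_chromatin_score):
    """Effect (alt score minus ref score) of the lead SNP under every
    combination of neighbouring SNP alleles.

    lead is (pos, ref_allele, alt_allele); neighbours is a list of such tuples.
    Returns a dict keyed by the tuple of neighbour alleles used as context.
    """
    pos, ref, alt = lead
    effects = {}
    allele_choices = [(n_ref, n_alt) for _, n_ref, n_alt in neighbours]
    for context in product(*allele_choices):
        # Build the genetic background for this combination of neighbours.
        background = ref_seq
        for (n_pos, _, _), allele in zip(neighbours, context):
            background = mutate(background, n_pos, allele)
        effects[context] = (score(mutate(background, pos, alt))
                            - score(mutate(background, pos, ref)))
    return effects

# Invented example: the lead SNP creates a GATA motif only when the
# neighbouring SNP carries its reference allele G.
effects = context_dependent_effects(
    ref_seq="CCGATCCC",
    lead=(5, "C", "A"),
    neighbours=[(2, "G", "T")],
)
for context, effect in effects.items():
    print(context, effect)
```

Scoring the lead SNP once per neighbour-allele combination is what exposes effects that appear or disappear with the genetic context; with a real model the scoring function would return predicted chromatin features instead of a motif count.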
{"title":"Deciphering the tissue-specific functional effect of Alzheimer risk SNPs with deep genome annotation.","authors":"Pradeep Varathan Pugalenthi, Bing He, Linhui Xie, Kwangsik Nho, Andrew J Saykin, Jingwen Yan","doi":"10.1186/s13040-024-00400-1","DOIUrl":"10.1186/s13040-024-00400-1","url":null,"abstract":"<p><p>Alzheimer's disease (AD) is a highly heritable form of dementia accompanied by substantial failure of cognitive function. Large-scale genome-wide association studies (GWASs) have identified a set of SNPs significantly associated with AD and related traits. GWAS hits usually emerge as clusters where a lead SNP with the highest significance is surrounded by other less significant neighboring SNPs. Although functionality is not guaranteed even with the strongest associations in GWASs, lead SNPs have historically been the focus of the field, with the remaining associations inferred to be redundant. Recent deep genome annotation tools enable the prediction of function from a segment of a DNA sequence with significantly improved precision, which allows in-silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits around the APOE region on chromatin functions and whether that impact is altered by the genetic context (i.e., alleles of neighboring SNPs). Our results showed that highly correlated SNPs in the same LD block could have distinct impacts on downstream functions. 
Although some GWAS lead SNPs showed dominant functional effects regardless of the neighborhood SNP alleles, several other SNPs did exhibit enhanced loss or gain of function under certain genetic contexts, suggesting potential additional information hidden in the LD blocks.</p>","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"17 1","pages":"50"},"PeriodicalIF":4.0,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558841/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}