
Biodata Mining: Latest Articles

Exploring glioma heterogeneity through omics networks: from gene network discovery to causal insights and patient stratification.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00411-y
Nina Kastendiek, Roberta Coletti, Thilo Gross, Marta B Lopes

Gliomas are primary malignant brain tumors with a typically poor prognosis and significant heterogeneity across tumor types. Each glioma type has distinct molecular characteristics that determine patient prognosis and therapeutic options. This study explores the molecular complexity of gliomas at the transcriptome level using a comprehensive approach grounded in network discovery. The graphical lasso method was used to estimate a gene co-expression network for each glioma type from a transcriptomics dataset. Causal relationships were then inferred from these correlation networks by estimating the Jacobian matrix. Gene importance within the networks was analyzed using centrality measures and modularity detection, leading to the selection of genes that might play an important role in the disease. To explore the pathways and biological functions in which these genes are involved, KEGG and Gene Ontology (GO) enrichment analyses were performed on the selected gene sets, highlighting the significance of the selected genes across several relevant pathways and GO terms. Spectral clustering based on patient similarity networks was applied to stratify patients into groups with similar molecular characteristics and to assess whether the resulting clusters align with the diagnosed glioma type. The results highlight the ability of the proposed methodology to uncover relevant genes associated with glioma intertumoral heterogeneity. Further investigation might include biological validation of the putative biomarkers identified.
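The patient-stratification step can be illustrated with generic normalized spectral clustering (the Ng-Jordan-Weiss variant). This is a minimal NumPy sketch under stated assumptions, not the authors' pipeline: the Gaussian (RBF) kernel used to build the patient similarity network, the bandwidth `sigma`, and the farthest-point k-means initialization are illustrative choices, and the rows of `X` stand in for patients' expression profiles.

```python
import numpy as np


def rbf_similarity(X: np.ndarray, sigma: float) -> np.ndarray:
    """Patient similarity network from profiles in X (rows = patients, columns = genes)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian kernel (assumed choice)
    np.fill_diagonal(W, 0.0)                    # no self-loops
    return W


def spectral_clusters(W: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Normalized spectral clustering of a weighted similarity graph W into k groups."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes every node has positive degree
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)             # eigenvalues in ascending order
    U = vecs[:, :k]                             # eigenvectors of the k smallest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize the embedding

    # Deterministic farthest-point initialization, then Lloyd's k-means iterations.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        dists = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


if __name__ == "__main__":
    # Two synthetic, well-separated "patient groups" with 5 features each.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.5, (10, 5)), rng.normal(10.0, 0.5, (10, 5))])
    print(spectral_clusters(rbf_similarity(X, sigma=1.0), k=2))
```

Cluster labels obtained this way would then be cross-tabulated against the diagnosed glioma type to check agreement, as the abstract describes.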

Biodata Mining 17(1):56. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657291/pdf/
Citations: 0
Prognostic feature based on androgen-responsive genes in bladder cancer and screening for potential targeted drugs.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00377-x
Jiang Zhao, Qian Zhang, Cunle Zhu, Wu Yuqi, Guohui Zhang, Qianliang Wang, Xingyou Dong, Benyi Li, Xiangwei Wang

Objectives: Bladder cancer (BLCA) affects men more often than women. The biological function and prognostic value of androgen-responsive genes (ARGs) in BLCA are currently unknown. To address this, we established an androgen-related signature to predict the prognosis of BLCA.

Methods: BLCA sequencing data from the TCGA and GEO datasets were analyzed. The tumor microenvironment (TME) was assessed using CIBERSORT and ssGSEA. Prognosis-related genes were identified, and a risk score model was constructed using univariate Cox regression, LASSO regression, and multivariate Cox regression. Drug sensitivity analysis was performed using the Genomics of Drug Sensitivity in Cancer (GDSC) database. Real-time quantitative PCR was performed to assess the expression of representative genes in clinical samples.

Results: ARGs (especially CDK6, FADS1, PGM3, SCD, PTK2B, and TPD52) might regulate the progression of BLCA. Different ARG expression patterns may lead to different immune cell infiltration. The risk model indicates that patients with higher risk scores have a poorer prognosis, more stromal infiltration, and enrichment of biological functions. Single-cell RNA analysis, bulk RNA data, and PCR analysis support the reliability of this risk model, and a nomogram was also established for clinical use. Drug prediction analysis showed that high-risk patients had a better predicted response to fludarabine, AZD8186, and carmustine.

Conclusion: ARGs play an important role in the progression, immune infiltration, and prognosis of BLCA. The ARG model predicts the prognosis of BLCA patients with high accuracy and provides guidance for more effective medication.
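A fitted Cox-based signature is commonly turned into a per-patient risk score (the linear predictor, a weighted sum of gene expression values) and split into high- and low-risk groups whose survival is then compared, for example with Kaplan-Meier curves. The sketch below uses hypothetical coefficients, hypothetical follow-up data, and an assumed median cutoff; it does not reproduce the authors' LASSO/Cox fit, which would normally come from a survival-analysis library.

```python
import numpy as np


def risk_scores(expr: np.ndarray, coefs: np.ndarray) -> np.ndarray:
    """Cox linear predictor per patient: sum over signature genes of coef * expression."""
    return expr @ coefs


def split_by_median(scores: np.ndarray) -> np.ndarray:
    """True for patients above the median risk score (high-risk group)."""
    return scores > np.median(scores)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events: 1 = death observed, 0 = censored.

    Returns (time, survival probability) pairs at each distinct event time.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    surv, curve = 1.0, []
    for t in np.unique(times[events == 1]):
        at_risk = np.sum(times >= t)                   # still under observation at t
        deaths = np.sum((times == t) & (events == 1))  # events exactly at t
        surv *= 1.0 - deaths / at_risk
        curve.append((float(t), float(surv)))
    return curve


if __name__ == "__main__":
    coefs = np.array([0.8, -0.5, 0.3])  # hypothetical signature coefficients
    expr = np.array([[1.2, 0.4, 2.0], [0.3, 1.5, 0.1], [2.1, 0.2, 1.7], [0.5, 0.9, 0.4]])
    high = split_by_median(risk_scores(expr, coefs))
    times = np.array([12.0, 40.0, 8.0, 35.0])  # hypothetical follow-up (months)
    events = np.array([1, 0, 1, 1])
    print("high risk:", kaplan_meier(times[high], events[high]))
    print("low risk:", kaplan_meier(times[~high], events[~high]))
```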

Biodata Mining 17(1):59. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11657289/pdf/
Citations: 0
Comparing new tools of artificial intelligence to the authentic intelligence of our global health students.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00408-7
Shilpa R Thandla, Grace Q Armstrong, Adil Menon, Aashna Shah, David L Gueye, Clara Harb, Estefania Hernandez, Yasaswini Iyer, Abigail R Hotchner, Riddhi Modi, Anusha Mudigonda, Maria A Prokos, Tharun M Rao, Olivia R Thomas, Camilo A Beltran, Taylor Guerrieri, Sydney LeBlanc, Skanda Moorthy, Sara G Yacoub, Jacob E Gardner, Benjamin M Greenberg, Alyssa Hubal, Yuliana P Lapina, Jacqueline Moran, Joseph P O'Brien, Anna C Winnicki, Christina Yoka, Junwei Zhang, Peter A Zimmerman

Introduction: The transformative feature of artificial intelligence (AI) is its massive capacity to interpret unstructured data and transform it into a coherent and meaningful context. In general, the potential for AI to alter traditional approaches to student research and its evaluation appears to be significant. For research in global health, it is important for students and research experts to assess the strengths and limitations of generative AI (GenAI) in this space. Thus, the goal of our research was to evaluate the information literacy of GenAI against the expectations that graduate students meet when writing research papers.

Methods: After completing the course Fundamentals of Global Health (INTH 401) at Case Western Reserve University (CWRU), graduate students who had successfully completed the required research paper were recruited to compare their original papers with a paper they generated with ChatGPT-4o using the original assignment prompt. Students also completed a Google Forms survey evaluating different sections of the AI-generated paper and of their original papers (e.g., adherence to Introduction guidelines, presentation of three perspectives, Conclusion), as well as their overall satisfaction with the AI work. Comparing the original student papers with the ChatGPT-4o papers also enabled evaluation of narrative elements and references.

Results: Of the 54 students who completed the required research paper, 28 (51.8%) agreed to collaborate in the comparison project. Survey responses suggested that students rated the AI-generated paper as inferior or similar to their own paper (overall satisfaction average = 2.39 (1.61-3.17); Likert scale from 1 to 5, with lower scores indicating inferiority). Each student's average response across the 5 Likert items was < 2.9 for 17 students, between 3.0 and 3.9 for 7 students, and ≥ 4.0 for 4 students, consistent with inferiority of the AI-generated paper. Evaluation of reference selection by ChatGPT-4o (n = 729 total references) showed that 54% (n = 396) were authentic and 46% (n = 333) did not exist. Of the authentic references, 26.5% (105/396) were relevant to the paper narrative, or 14.4% of the 729 total references.

Discussion: Our findings reveal strengths and limitations of AI tools in helping to understand the complexities of global health topics. Strengths mentioned by students included the ability of ChatGPT-4o to produce content very quickly and to suggest topics they had not considered in the three-perspective sections of their papers. Consistently presenting up-to-date facts and references, and further examining or summarizing the complexities of global health topics, appear to be current limitations of ChatGPT-4o. Because ChatGPT-4o generated nonexistent references attributed to highly credible biomedical research journals, we conclude that ChatGPT-4o failed an important component of effective information use. Furthermore, the misrepresentation of trustworthy public health information sources is highly concerning, particularly in light of the recent COVID-19 pandemic and recent experience in reporting the impacts of and responses to natural disasters. This is a significant limitation in the ability of GenAI to meet the information literacy standards expected of graduate students.
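The reference-audit percentages follow directly from the reported counts; a short check (the abstract rounds the first two values to 54% and 46%):

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_refs = 729                      # references generated by ChatGPT-4o
authentic = 396                       # references that actually exist
fabricated = total_refs - authentic   # 333 references that did not exist
relevant = 105                        # authentic and relevant to the narrative

summary = {
    "authentic": pct(authentic, total_refs),
    "fabricated": pct(fabricated, total_refs),
    "relevant_of_authentic": pct(relevant, authentic),
    "relevant_of_total": pct(relevant, total_refs),
}
print(summary)
# → {'authentic': 54.3, 'fabricated': 45.7, 'relevant_of_authentic': 26.5, 'relevant_of_total': 14.4}
```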
Biodata Mining 17(1):58. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656723/pdf/
Citations: 0
Detection and classification of long terminal repeat sequences in plant LTR-retrotransposons and their analysis using explainable machine learning.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-18 | DOI: 10.1186/s13040-024-00410-z
Jakub Horvath, Pavel Jedlicka, Marie Kratka, Zdenek Kubat, Eduard Kejnovsky, Matej Lexa

Background: Long terminal repeats (LTRs) are important parts of LTR retrotransposons and retroviruses, which are present in high copy numbers in most eukaryotic genomes. LTRs contain regulatory sequences essential for the life cycle of the retrotransposon. Previous experimental and sequence studies have provided only limited information about LTR structure and composition, mostly from model systems. To improve our understanding of these key sequence modules, we focused on the contrasts between LTRs of various retrotransposon families and other genomic regions. This approach can also be used to classify and predict LTRs.

Results: We applied machine learning methods suitable for DNA sequence classification to a large dataset of plant LTR retrotransposon sequences. We trained three types of machine learning models: (i) traditional model ensembles (Gradient Boosting), (ii) hybrid convolutional/long short-term memory (LSTM) network models, and (iii) a DNA pre-trained transformer-based model using a k-mer sequence representation. All three approaches succeeded in classifying and isolating LTRs in these data and provided valuable insights into LTR sequence composition. The best LTR detection performance (F1 score of 0.85) was achieved with the hybrid network model. Superfamily classification was the most accurate task (F1 = 0.89) and family classification the least accurate (F1 = 0.74). The trained models were subjected to explainability analysis. Positional analysis identified a mixture of interesting features, many of which had a preferred absolute position within the LTR and/or were biologically relevant, such as a centrally positioned TATA-box regulatory sequence and TG..CA nucleotide patterns around both LTR edges.

Conclusions: Our results show that the models recognized biologically relevant motifs, such as core promoter elements in the LTR detection task and a development- and stress-related subclass of transcription factor binding sites in the family classification task. Explainability analysis also highlighted the importance of the 5' and 3' edges for LTR identity and revealed the need to analyze more than just the dinucleotides at these ends. Our work shows the applicability of machine learning models to regulatory sequence analysis and classification, and demonstrates the important role of the identified motifs in LTR detection.
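Two ideas from the abstract, the k-mer sequence representation and the TG..CA pattern at LTR edges, can be sketched in plain Python. This is an illustration under assumptions, not the authors' feature pipeline: the choice of `k`, the frequency-vector encoding (the kind of input a gradient-boosting model might take), and the strict start/end dinucleotide check are illustrative.

```python
from collections import Counter
from itertools import product


def kmers(seq: str, k: int) -> list[str]:
    """Overlapping k-mers (stride 1), the tokenization used by k-mer-based DNA models."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_profile(seq: str, k: int) -> list[float]:
    """Relative frequency of every A/C/G/T k-mer, giving a fixed-length 4**k feature vector."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmers(seq, k))
    total = sum(counts[v] for v in vocab) or 1  # k-mers containing N etc. are ignored
    return [counts[v] / total for v in vocab]


def has_tg_ca_edges(ltr: str) -> bool:
    """True if the sequence starts with TG and ends with CA, the canonical LTR termini."""
    ltr = ltr.upper()
    return ltr.startswith("TG") and ltr.endswith("CA")


if __name__ == "__main__":
    ltr = "TGTTAGGCATATAAAGGCCCA"  # hypothetical toy sequence
    print(kmers(ltr, 3)[:4], len(kmer_profile(ltr, 3)), has_tg_ca_edges(ltr))
```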

Biodata Mining 17(1):57. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11656987/pdf/
Citations: 0
TGNet: tensor-based graph convolutional networks for multimodal brain network analysis.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-06 | DOI: 10.1186/s13040-024-00409-6
Zhaoming Kong, Rong Zhou, Xinwei Luo, Songlin Zhao, Ann B Ragin, Alex D Leow, Lifang He

Multimodal brain network analysis enables a comprehensive understanding of neurological disorders by integrating information from multiple neuroimaging modalities. However, existing methods often struggle to model the complex structures of multimodal brain networks effectively. In this paper, we propose a novel tensor-based graph convolutional network (TGNet) framework that combines tensor decomposition with multi-layer graph convolutional networks (GCNs) to capture both the homogeneity and the intricate graph structures of multimodal brain networks. We evaluate TGNet on four datasets (HIV, Bipolar Disorder (BP), Parkinson's Disease (PPMI), and Alzheimer's Disease (ADNI)), demonstrating that it significantly outperforms existing methods on disease classification tasks, particularly in scenarios with limited sample sizes. The robustness and effectiveness of TGNet highlight its potential for advancing multimodal brain network analysis. The code is available at https://github.com/rongzhou7/TGNet .
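The graph-convolution part of such a framework can be sketched as the standard GCN propagation rule (Kipf and Welling), applied to each modality's adjacency matrix stacked into an (M, N, N) tensor. This is a minimal NumPy sketch, not TGNet itself: the tensor-decomposition component is omitted, and the shared weight matrix, non-negative edge weights, and mean readout are assumptions.

```python
import numpy as np


def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Renormalized adjacency D^{-1/2} (A + I) D^{-1/2}; assumes non-negative edge weights."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]


def multimodal_gcn_layer(A_tensor: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One GCN layer applied to every modality slice of an (M, N, N) adjacency tensor.

    H: (N, F_in) node features (brain regions), W: (F_in, F_out) weights shared by modalities.
    Returns an (M, N, F_out) tensor of per-modality node embeddings.
    """
    out = np.stack([normalize_adjacency(A) @ H @ W for A in A_tensor])
    return np.maximum(out, 0.0)  # ReLU


def graph_embedding(Z: np.ndarray) -> np.ndarray:
    """Mean over modalities and nodes: one vector per subject (a simple assumed readout)."""
    return Z.mean(axis=(0, 1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_regions, n_modalities = 6, 2  # hypothetical sizes
    A = rng.random((n_modalities, n_regions, n_regions))
    A = (A + A.transpose(0, 2, 1)) / 2  # symmetric, non-negative connectivity
    Z = multimodal_gcn_layer(A, np.eye(n_regions), rng.normal(size=(n_regions, 4)))
    print(Z.shape, graph_embedding(Z))
```

The subject-level vector from `graph_embedding` would then feed a classifier for the disease label.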

Biodata Mining 17(1):55. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622555/pdf/
Citations: 0
Predictive modeling of ALS progression: an XGBoost approach using clinical features.
IF 4 | Zone 3 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2024-12-02 | DOI: 10.1186/s13040-024-00399-5
Richa Gupta, Mansi Bhandari, Anhad Grover, Taher Al-Shehari, Mohammed Kadrie, Taha Alfakih, Hussain Alsalman

This research presents a predictive model that estimates the progression of amyotrophic lateral sclerosis (ALS) from clinical features collected from a dataset of 50 patients. Important features included assessments of speech, mobility, and respiratory function. We used an XGBoost regression model to forecast scores on the revised ALS Functional Rating Scale (ALSFRS-R), achieving a training mean squared error (MSE) of 0.1651 and a testing MSE of 0.0073, with R² values of 0.9800 for training and 0.9993 for testing. The model demonstrates high accuracy on this dataset, providing a potentially useful tool for clinicians to track disease progression and improve patient management and treatment strategies.
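The two metrics reported above have standard definitions, sketched below in NumPy: MSE is the mean of squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. Here `y_true` would hold observed ALSFRS-R scores and `y_pred` a model's forecasts; the regression model itself (for example `xgboost.XGBRegressor`) is not shown, and the numbers in the demo are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error: average of squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    observed = [40, 36, 31, 28, 22]   # hypothetical ALSFRS-R totals
    predicted = [39, 37, 30, 28, 24]  # hypothetical model forecasts
    print(mse(observed, predicted), r2(observed, predicted))
```

An R² of 0 corresponds to always predicting the mean of the observed scores, so values near 1 mean the residuals are small relative to the spread of the scores.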

Biodata Mining 17(1):54. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610297/pdf/
Citations: 0
Deep learning-based Emergency Department In-hospital Cardiac Arrest Score (Deep EDICAS) for early prediction of cardiac arrest and cardiopulmonary resuscitation in the emergency department.
IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-11-23, DOI: 10.1186/s13040-024-00407-8
Yuan-Xiang Deng, Jyun-Yi Wang, Chia-Hsin Ko, Chien-Hua Huang, Chu-Lin Tsai, Li-Chen Fu

Background: Timely identification of deteriorating patients is crucial to prevent progression to cardiac arrest. However, current methods for predicting cardiac arrest in the emergency department are primarily static and rule-based, have limited precision, and cannot accommodate time-series data. Deep learning models can be updated continuously as new data arrive and may provide more precise predictions throughout the emergency department stay.

Methods: We developed and internally validated a deep learning-based scoring system, Deep EDICAS, for early prediction in the emergency department of cardiac arrest and of its subset requiring cardiopulmonary resuscitation (CPR). The proposed model integrates tabular and time-series data to improve predictive accuracy. To address data imbalance and strengthen early prediction, we applied data augmentation techniques.

Results: Our system achieved an AUPRC of 0.5178 and an AUROC of 0.9388 on data from National Taiwan University Hospital. For early prediction, it achieved an AUPRC of 0.2798 and an AUROC of 0.9046, outperforming other early warning scores. Moreover, Deep EDICAS offers interpretability through feature importance analysis.
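
Because cardiac arrest is rare among emergency department visits, AUPRC is the more demanding of the two reported metrics: a random classifier scores about 0.5 on AUROC but only about the positive-class prevalence on AUPRC. The following Python sketch is an illustration, not the Deep EDICAS evaluation code. It computes AUROC as the Mann-Whitney probability that a positive case outranks a negative one, and estimates AUPRC as average precision (the estimator used by scikit-learn's `average_precision_score`); the abstract does not say which AUPRC estimator the authors used, and the labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as average precision: precision at each distinct
    threshold, weighted by the gain in recall at that threshold."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every case tied at this threshold before measuring.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


labels = [1, 0, 1, 0, 0, 0]  # 1 = arrest, 0 = no arrest (hypothetical)
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(auroc(labels, scores))  # 0.875
print(round(average_precision(labels, scores), 4))  # 0.8333
```

On these six toy cases AUROC is 0.875 while average precision is 5/6, and the gap between the two widens as positives become rarer.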

Conclusion: Our study demonstrates the effectiveness of deep learning for predicting cardiac arrest in the emergency department. Although detecting patients who require CPR has greater clinical value, few studies have applied deep learning to CPR detection. This study therefore provides an initial exploration of the CPR detection task.

Biodata Mining. 2024;17(1):52. doi:10.1186/s13040-024-00407-8. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585162/pdf/
Citations: 0
Supervised multiple kernel learning approaches for multi-omics data integration.
IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-11-23, DOI: 10.1186/s13040-024-00406-9
Mitja Briscik, Gabriele Tazza, László Vidács, Marie-Agnès Dillies, Sébastien Déjean

Background: Advances in high-throughput technologies have made omics datasets increasingly available. Integrating multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, although it remains an underused tool in genomic data mining.

Results: We propose novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel obtained by combining the input kernels, we adapted unsupervised integration algorithms to supervised tasks using support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art supervised multi-omics integration approaches.
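
A common way to build such a meta-kernel is a weighted sum of one kernel matrix per omics layer; because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the result is still a valid kernel for a support vector machine. The plain-Python sketch below assumes an RBF kernel per view and fixed (by default equal) weights. The paper's actual fusion strategies may differ, and `rbf_kernel`, `fuse_kernels`, the `gamma` values and the toy data are illustrative assumptions.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) for one omics view.

    X holds one feature vector per sample; every view must list samples in
    the same order so that the kernels can be combined entry by entry.
    """
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def fuse_kernels(kernels, weights=None):
    """Meta-kernel as a convex combination of per-view kernel matrices.

    weights=None gives every view the same weight (a plain average).
    """
    m = len(kernels)
    if weights is None:
        weights = [1.0 / m] * m
    if len(weights) != m or min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one non-negative weight per kernel, summing to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: three samples measured in two hypothetical omics layers.
X_rna = [[0.0, 1.0], [0.5, 1.0], [3.0, 0.0]]
X_meth = [[0.2, 0.1, 0.9], [0.2, 0.1, 0.8], [0.9, 0.7, 0.1]]
meta = fuse_kernels([rbf_kernel(X_rna, gamma=0.5), rbf_kernel(X_meth, gamma=1.0)])
for row in meta:
    print([round(v, 3) for v in row])
```

The fused matrix could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, with the test-versus-training kernel rows built the same way at prediction time.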

Conclusion: Multiple kernel learning offers a natural framework for predictive modeling of multi-omics data. It provided a fast and reliable solution that can compete with, and outperform, more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery, and further development of methods for heterogeneous data integration.

Biodata Mining. 2024;17(1):53. doi:10.1186/s13040-024-00406-9. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11585117/pdf/
Citations: 0
Transcriptome-based network analysis related to regulatory T cells infiltration identified RCN1 as a potential biomarker for prognosis in clear cell renal cell carcinoma.
IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-11-14, DOI: 10.1186/s13040-024-00404-x
Yang Qixin, Huang Jing, He Jiang, Liu Xueyang, Yu Lu, Li Yuehua

Background: Regulatory T cells (Tregs) play a critical role in shaping the immunosuppressive tumor microenvironment. Investigating the role of Tregs in clear cell renal cell carcinoma (ccRCC) is crucial for identifying prognostic markers and therapeutic targets for this disease.

Methods: Weighted gene co-expression network analysis (WGCNA) was used to identify gene modules related to Treg infiltration in TCGA-KIRC samples. Consensus clustering was then used to derive two clusters associated with Treg infiltration in ccRCC. A prognostic model was developed from the gene module associated with Treg infiltration. Finally, we evaluated the model's ability to predict overall survival in ccRCC and showed that RCN1 can serve as a marker for predicting ccRCC prognosis.
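
The central step of WGCNA is turning gene-gene correlations into a weighted network by soft thresholding, a_ij = |cor(x_i, x_j)|^β, which keeps strong correlations while shrinking weak ones toward zero. The following minimal Python sketch shows that step and each gene's connectivity (its adjacency sum to all other genes) for an unsigned network. It is not the authors' pipeline: β = 6 is a commonly used default for unsigned networks rather than a value reported in the abstract, and the later steps (topological overlap, hierarchical clustering into modules, and correlating module eigengenes with Treg infiltration estimates) are omitted. The expression values are toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(gene_i, gene_j)| ** beta.

    expr holds one expression profile (values across samples) per gene.
    """
    g = len(expr)
    A = [[1.0] * g for _ in range(g)]  # diagonal stays 1, since |cor(x, x)| = 1
    for i in range(g):
        for j in range(i + 1, g):
            A[i][j] = A[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return A


def connectivity(A):
    """Whole-network connectivity: sum of each gene's adjacencies to other genes."""
    return [sum(a for j, a in enumerate(row) if j != i) for i, row in enumerate(A)]


# Toy expression matrix: four hypothetical genes measured in four samples.
expr = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],  # perfectly correlated with gene 0
    [4, 3, 2, 1],  # perfectly anti-correlated with gene 0 (still linked when unsigned)
    [1, 3, 2, 4],  # correlation 0.8 with gene 0
]
A = soft_threshold_adjacency(expr)
k = connectivity(A)
print([round(v, 3) for v in k])
```

Raising correlations to the sixth power turns a correlation of 0.8 into an adjacency of about 0.26, which is how soft thresholding emphasizes the strongest co-expression links.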

Results: The two clusters associated with Treg infiltration differed in immune microenvironment composition, pathway activation, prognosis, and sensitivity to drugs commonly used in ccRCC treatment. Furthermore, a 7-gene risk score derived from ccRCC Treg infiltration proved to be a reliable prognostic marker in both the training and validation cohorts. Survival analysis also indicated that RCN1 is a reliable prognostic factor for ccRCC. Single-cell sequencing analysis revealed that RCN1 is predominantly expressed in tumor cells. A pan-cancer analysis showed that RCN1 is associated with poor prognosis and with activation of inflammatory response pathways across various cancers.
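
Multi-gene risk scores of this kind are usually a weighted sum of expression values, with weights typically taken from a penalized Cox regression, after which patients are split into high- and low-risk groups, often at the median score. The abstract does not give the seven genes, their coefficients, or the cutoff, so the gene names (`GENE_A` to `GENE_G`), coefficients and expression values below are hypothetical, and the median split is an assumption. The sketch only shows the mechanics of scoring and stratifying a cohort.

```python
from statistics import median

# Hypothetical 7-gene signature: names and Cox coefficients are illustrative only.
COEFFICIENTS = {
    "GENE_A": 0.30, "GENE_B": -0.20, "GENE_C": 0.15, "GENE_D": 0.10,
    "GENE_E": -0.05, "GENE_F": 0.25, "GENE_G": 0.12,
}


def risk_score(expression, coefficients):
    """Linear risk score: sum of coefficient * expression over the signature genes."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"missing expression for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients):
    """Label each patient 'high' or 'low' risk relative to the cohort median score.

    Patients whose score equals the median are labeled 'low'.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(list(scores.values()))
    labels = {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
    return labels, cutoff


# Toy cohort: each patient expresses every signature gene at one common level.
patients = {
    pid: {gene: level for gene in COEFFICIENTS}
    for pid, level in [("P1", 1.0), ("P2", 2.0), ("P3", 0.0), ("P4", 3.0)]
}
labels, cutoff = stratify(patients, COEFFICIENTS)
print(labels, round(cutoff, 3))
```

Here patients P2 and P4 score above the median (about 1.005) and are labeled high risk; in a survival analysis the two groups would then typically be compared with Kaplan-Meier curves and a log-rank test.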

Conclusion: We developed a prognostic model associated with Treg infiltration, which facilitates the clinical categorization of ccRCC progression. Moreover, our findings underscore the significant potential of RCN1 as a ccRCC biomarker.

Biodata Mining. 2024;17(1):51. doi:10.1186/s13040-024-00404-x. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11566375/pdf/
Citations: 0
Deciphering the tissue-specific functional effect of Alzheimer risk SNPs with deep genome annotation.
IF 4, Zone 3 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY, Pub Date: 2024-11-13, DOI: 10.1186/s13040-024-00400-1
Pradeep Varathan Pugalenthi, Bing He, Linhui Xie, Kwangsik Nho, Andrew J Saykin, Jingwen Yan

Alzheimer's disease (AD) is a highly heritable form of dementia characterized by substantial decline in cognitive function. Large-scale genome-wide association studies (GWASs) have identified a set of SNPs significantly associated with AD and related traits. GWAS hits usually emerge as clusters in which a lead SNP with the highest significance is surrounded by less significant neighboring SNPs. Although even the strongest GWAS associations do not guarantee functionality, lead SNPs have historically been the focus of the field, and the remaining associations have been inferred to be redundant. Recent deep genome annotation tools predict function from a DNA sequence segment with significantly improved precision, which allows in-silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits around the APOE region on chromatin function and whether this impact is altered by genetic context (i.e., the alleles of neighboring SNPs). Our results showed that highly correlated SNPs in the same linkage disequilibrium (LD) block can have distinct impacts on downstream functions. Although some GWAS lead SNPs showed dominant functional effects regardless of neighboring SNP alleles, several other SNPs exhibited enhanced loss or gain of function in certain genetic contexts, suggesting that LD blocks may hold additional, hidden information.
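
The context-dependent in-silico mutagenesis described above can be sketched as follows: score a reference sequence window with the lead SNP's reference allele and again with its alternate allele, take the difference as the lead SNP's predicted effect, and repeat this for every combination of neighboring SNP alleles to see whether the effect depends on genetic context. In this Python sketch, `toy_score` is a hypothetical stand-in (a count of a `GATA` motif) for a deep sequence-to-function model, which in practice would predict many tissue-specific chromatin features; the sequence, positions and alleles are also made up.

```python
from itertools import product


def apply_alleles(seq, alleles):
    """Return seq with single-nucleotide substitutions {position: base} applied."""
    chars = list(seq)
    for pos, base in alleles.items():
        chars[pos] = base
    return "".join(chars)


def toy_score(seq):
    # Hypothetical stand-in for a deep model's chromatin-feature prediction:
    # here simply the number of "GATA" motif occurrences in the window.
    return seq.count("GATA")


def lead_effect_by_context(ref_seq, lead, neighbors, score=toy_score):
    """Predicted effect of the lead SNP (alt score minus ref score) under every
    combination of neighboring SNP alleles.

    lead: (position, ref_base, alt_base)
    neighbors: list of (position, ref_base, alt_base)
    """
    lead_pos, lead_ref, lead_alt = lead
    effects = {}
    for context in product(*[(r, a) for _, r, a in neighbors]):
        background = {pos: base for (pos, _, _), base in zip(neighbors, context)}
        ref_score = score(apply_alleles(ref_seq, {**background, lead_pos: lead_ref}))
        alt_score = score(apply_alleles(ref_seq, {**background, lead_pos: lead_alt}))
        effects[context] = alt_score - ref_score
    return effects


ref_window = "CCGATCCC"  # hypothetical reference sequence window
lead_snp = (5, "C", "A")  # (position, ref allele, alt allele)
neighbor_snps = [(0, "C", "G"), (3, "A", "T")]
effects = lead_effect_by_context(ref_window, lead_snp, neighbor_snps)
for context, effect in effects.items():
    print(context, effect)
```

The lead SNP creates the motif only when the neighbor at position 3 carries its reference allele `A`, so its predicted effect is 1 in those contexts and 0 otherwise, a toy version of the context dependence the abstract reports.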

Biodata Mining. 2024;17(1):50. doi:10.1186/s13040-024-00400-1. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11558841/pdf/
Citations: 0
Journal: Biodata Mining