Briefings in Bioinformatics: latest articles

Bioinformatics approaches for studying molecular sex differences in complex diseases.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae499
Rebecca Ting Jiin Loo, Mohamed Soudy, Francesco Nasta, Mirco Macchi, Enrico Glaab

Many complex diseases exhibit pronounced sex differences that can affect both the initial risk of developing the disease, as well as clinical disease symptoms, molecular manifestations, disease progression, and the risk of developing comorbidities. Despite this, computational studies of molecular data for complex diseases often treat sex as a confounding variable, aiming to filter out sex-specific effects rather than attempting to interpret them. A more systematic, in-depth exploration of sex-specific disease mechanisms could significantly improve our understanding of pathological and protective processes with sex-dependent profiles. This survey discusses dedicated bioinformatics approaches for the study of molecular sex differences in complex diseases. It highlights that, beyond classical statistical methods, approaches are needed that integrate prior knowledge of relevant hormone signaling interactions, gene regulatory networks, and sex linkage of genes to provide a mechanistic interpretation of sex-dependent alterations in disease. The review examines and compares the advantages, pitfalls and limitations of various conventional statistical and systems-level mechanistic analyses for this purpose, including tailored pathway and network analysis techniques. Overall, this survey highlights the potential of specialized bioinformatics techniques to systematically investigate molecular sex differences in complex diseases, to inform biomarker signature modeling, and to guide more personalized treatment approaches.
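
The contrast drawn in this abstract, between averaging over sex and interpreting sex-specific effects, can be made concrete with a toy calculation. The sketch below is only an illustration under invented numbers: the `expr` table, the gene it stands for, and the helper names are hypothetical and do not come from the review.

```python
from statistics import mean

# (sex, condition) -> expression values of one hypothetical gene (toy numbers).
expr = {
    ("F", "control"): [5.0, 5.2, 4.8],
    ("F", "disease"): [6.0, 6.1, 5.9],
    ("M", "control"): [5.1, 4.9, 5.0],
    ("M", "disease"): [4.0, 4.1, 3.9],
}


def disease_effect(sex: str) -> float:
    """Disease-minus-control difference in mean expression within one sex."""
    return mean(expr[(sex, "disease")]) - mean(expr[(sex, "control")])


def pooled_effect() -> float:
    """The same difference computed while ignoring sex altogether."""
    disease = [v for (_, cond), vals in expr.items() if cond == "disease" for v in vals]
    control = [v for (_, cond), vals in expr.items() if cond == "control" for v in vals]
    return mean(disease) - mean(control)


female_effect = disease_effect("F")
male_effect = disease_effect("M")
# Sex-by-disease interaction contrast: how much the disease effect differs by sex.
interaction = female_effect - male_effect
```

In this toy table the gene goes up in females and down in males, so the sex-agnostic comparison is close to zero while the interaction contrast is large. A real analysis would fit an interaction term in a statistical model with proper variance estimates rather than compare raw means.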

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471957/pdf/
Citations: 0
Predictability of antigen binding based on short motifs in the antibody CDRH3.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae537
Lonneke Scheffer, Eric Emanuel Reber, Brij Bhushan Mehta, Milena Pavlović, Maria Chernigovskaya, Eve Richardson, Rahmad Akbar, Fridtjof Lund-Johansen, Victor Greiff, Ingrid Hobæk Haff, Geir Kjetil Sandve

Adaptive immune receptors, such as antibodies and T-cell receptors, recognize foreign threats with exquisite specificity. A major challenge in adaptive immunology is discovering the rules governing immune receptor-antigen binding in order to predict the antigen binding status of previously unseen immune receptors. Many studies assume that the antigen binding status of an immune receptor may be determined by the presence of a short motif in the complementarity determining region 3 (CDR3), disregarding other amino acids. To test this assumption, we present a method to discover short motifs which show high precision in predicting antigen binding and generalize well to unseen simulated and experimental data. Our analysis of a mutagenesis-based antibody dataset reveals 11 336 position-specific, mostly gapped motifs of 3-5 amino acids that retain high precision on independently generated experimental data. Using a subset of only 178 motifs, a simple classifier was made that on the independently generated dataset outperformed a deep learning model proposed specifically for such datasets. In conclusion, our findings support the notion that for some antibodies, antigen binding may be largely determined by a short CDR3 motif. As more experimental data emerge, our methodology could serve as a foundation for in-depth investigations into antigen binding signals.
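
As a rough illustration of the idea tested in this abstract, the sketch below represents a gapped, position-specific motif as a set of (position, amino acid) requirements, measures its precision on labelled sequences, and calls a sequence a binder if any retained motif matches. The sequences, labels, helper names, and the 0.9 precision cut-off are invented for the example; this is not the authors' implementation or data.

```python
from typing import Dict, List, Tuple

# A motif maps 0-based positions to required amino acids. Positions need not be
# adjacent, which is what makes a motif "gapped". (Hypothetical representation.)
Motif = Dict[int, str]


def matches(motif: Motif, cdr3: str) -> bool:
    """True if the sequence carries every required residue at its position."""
    return all(pos < len(cdr3) and cdr3[pos] == aa for pos, aa in motif.items())


def precision(motif: Motif, data: List[Tuple[str, bool]]) -> float:
    """Fraction of motif-carrying sequences that are labelled as binders."""
    hits = [is_binder for seq, is_binder in data if matches(motif, seq)]
    return sum(hits) / len(hits) if hits else 0.0


def predict(motifs: List[Motif], cdr3: str) -> bool:
    """Simple classifier: binder if at least one motif is present."""
    return any(matches(m, cdr3) for m in motifs)


# Toy labelled CDRH3 sequences (invented).
train = [
    ("CARWGGDGFYAMDY", True),
    ("CARWAGDGFYAMDY", True),
    ("CARWSGDAFYAMDY", False),
    ("CARYGGSGFYAMDY", False),
]
candidates: List[Motif] = [{3: "W", 6: "D", 7: "G"}, {3: "W"}]

# Keep only motifs whose precision on the labelled data clears the cut-off.
kept = [m for m in candidates if precision(m, train) >= 0.9]
```

The single-residue motif `{3: "W"}` also matches a non-binder, so only the three-residue gapped motif survives the cut-off here. Whether precision holds up on independent data is the question the paper evaluates; a toy set like this cannot show that.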

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495870/pdf/
Citations: 0
Predicting bacterial transcription factor binding sites through machine learning and structural characterization based on DNA duplex stability.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae581
André Borges Farias, Gustavo Sganzerla Martinez, Edgardo Galán-Vásquez, Marisa Fabiana Nicolás, Ernesto Pérez-Rueda

Transcriptional factors (TFs) in bacteria play a crucial role in gene regulation by binding to specific DNA sequences, thereby assisting in the activation or repression of genes. Despite their central role, deciphering shape recognition of bacterial TFs-DNA interactions remains an intricate challenge. A deeper understanding of DNA secondary structures could greatly enhance our knowledge of how TFs recognize and interact with DNA, thereby elucidating their biological function. In this study, we employed machine learning algorithms to predict transcription factor binding sites (TFBS) and classify them as directed-repeat (DR) or inverted-repeat (IR). To accomplish this, we divided the set of TFBS nucleotide sequences by size, ranging from 8 to 20 base pairs, and converted them into thermodynamic data known as DNA duplex stability (DDS). Our results demonstrate that the Random Forest algorithm accurately predicts TFBS with an average accuracy of over 82% and effectively distinguishes between IR and DR with an accuracy of 89%. Interestingly, upon converting the base pairs of several TFBS-IR into DDS values, we observed a symmetric profile typical of the palindromic structure associated with these architectures. This study presents a novel TFBS prediction model based on a DDS characteristic that may indicate how respective proteins interact with base pairs, thus providing insights into molecular mechanisms underlying bacterial TFs-DNA interaction.
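
The conversion of a binding site into a stability profile, and the symmetric profile reported for inverted repeats, can be sketched as follows. The abstract does not say which thermodynamic parameter table was used, so the nearest-neighbour free energies below are an assumption (a commonly cited unified set, in kcal/mol at 37 °C) and should be checked against a primary source before reuse. The example site is also invented.

```python
# Assumed nearest-neighbour free energies (kcal/mol, 37 °C) for the ten unique
# dinucleotide steps, read 5'->3'. Verify against a primary table before reuse.
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def step_energy(step: str) -> float:
    # A step and its reverse complement describe the same stacked base pairs,
    # so the six steps missing from the table are looked up via their partner.
    return NN_DG[step] if step in NN_DG else NN_DG[reverse_complement(step)]


def dds_profile(seq: str) -> list:
    """Per-step stability profile of a site (length is len(seq) - 1)."""
    seq = seq.upper()
    return [step_energy(seq[i:i + 2]) for i in range(len(seq) - 1)]


site = "TGTGATCACA"  # invented 10-bp site that equals its reverse complement
profile = dds_profile(site)
is_symmetric = profile == profile[::-1]
```

Because a palindromic site reads the same on both strands, step i from the left and step i from the right are reverse complements of each other and receive the same energy, which is why the profile mirrors around its centre. Profiles like this could then serve as fixed-length numeric features for a classifier such as a Random Forest.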

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562833/pdf/
Citations: 0
Multi-view learning framework for predicting unknown types of cancer markers via directed graph neural networks fitting regulatory networks.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae546
Xin-Fei Wang, Lan Huang, Yan Wang, Ren-Chu Guan, Zhu-Hong You, Nan Sheng, Xu-Ping Xie, Wen-Ju Hou

The discovery of diagnostic and therapeutic biomarkers for complex diseases, especially cancer, has always been a central and long-term challenge in molecular association prediction research, offering promising avenues for advancing the understanding of complex diseases. To this end, researchers have developed various network-based prediction techniques targeting specific molecular associations. However, limitations imposed by reductionism and network representation learning have led existing studies to narrowly focus on high prediction efficiency within single association type, thereby glossing over the discovery of unknown types of associations. Additionally, effectively utilizing network structure to fit the interaction properties of regulatory networks and combining specific case biomarker validations remains an unresolved issue in cancer biomarker prediction methods. To overcome these limitations, we propose a multi-view learning framework, CeRVE, based on directed graph neural networks (DGNN) for predicting unknown type cancer biomarkers. CeRVE effectively extracts and integrates subgraph information through multi-view feature learning. Subsequently, CeRVE utilizes DGNN to simulate the entire regulatory network, propagating node attribute features and extracting various interaction relationships between molecules. Furthermore, CeRVE constructed a comparative analysis matrix of three cancers and adjacent normal tissues through The Cancer Genome Atlas and identified multiple types of potential cancer biomarkers through differential expression analysis of mRNA, microRNA, and long noncoding RNA. Computational testing of multiple types of biomarkers for 72 cancers demonstrates that CeRVE exhibits superior performance in cancer biomarker prediction, providing a powerful tool and insightful approach for AI-assisted disease biomarker discovery.
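
To show what propagating node attribute features along a directed regulatory network can mean in its simplest form, the sketch below performs one round of mean aggregation that follows edges only from regulator to target. It has no learned weights and is not CeRVE's architecture, which the abstract does not specify. The node names and feature vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target); direction matters


def propagate(features: Dict[str, List[float]], edges: List[Edge]) -> Dict[str, List[float]]:
    """One round of directed propagation: each node averages its own features
    with those of the nodes pointing at it. Edges are never followed backwards."""
    incoming: Dict[str, List[str]] = {node: [] for node in features}
    for source, target in edges:
        incoming[target].append(source)
    updated = {}
    for node, own in features.items():
        stack = [own] + [features[src] for src in incoming[node]]
        updated[node] = [sum(col) / len(stack) for col in zip(*stack)]
    return updated


# Hypothetical three-node regulatory chain: lncRNA -> microRNA -> mRNA.
features = {"lncRNA_X": [1.0, 0.0], "miR_Y": [0.0, 1.0], "mRNA_Z": [0.0, 0.0]}
edges = [("lncRNA_X", "miR_Y"), ("miR_Y", "mRNA_Z")]
after = propagate(features, edges)
```

After one round the upstream lncRNA is unchanged because nothing regulates it, while the downstream nodes pick up information from their regulators. An undirected scheme would also push the mRNA's features back upstream, which is the behaviour a directed model avoids.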

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11514060/pdf/
Citations: 0
A consensus-based classification workflow to determine genetically inferred ancestry from comprehensive genomic profiling of patients with solid tumors.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae557
Zachary D Wallen, Mary K Nesline, Sarabjot Pabla, Shuang Gao, Erik Vanroey, Stephanie B Hastings, Heidi Ko, Kyle C Strickland, Rebecca A Previs, Shengle Zhang, Jeffrey M Conroy, Taylor J Jensen, Elizabeth George, Marcia Eisenberg, Brian Caveney, Pratheesh Sathyan, Shakti Ramkissoon, Eric A Severson

Disparities in cancer diagnosis, treatment, and outcomes based on self-identified race and ethnicity (SIRE) are well documented, yet these variables have historically been excluded from clinical research. Without SIRE, genetic ancestry can be inferred using single-nucleotide polymorphisms (SNPs) detected from tumor DNA using comprehensive genomic profiling (CGP). However, factors inherent to CGP of tumor DNA increase the difficulty of identifying ancestry-informative SNPs, and current workflows for inferring genetic ancestry from CGP need improvements in key areas of the ancestry inference process. This study used genomic data from 4274 diverse reference subjects and CGP data from 491 patients with solid tumors and SIRE to develop and validate a workflow to obtain accurate genetically inferred ancestry (GIA) from CGP sequencing results. We use consensus-based classification to derive confident ancestral inferences from an expanded reference dataset covering eight world populations (African, Admixed American, Central Asian/Siberian, European, East Asian, Middle Eastern, Oceania, South Asian). Our GIA calls were highly concordant with SIRE (95%) and aligned well with reference populations of inferred ancestries. Further, our workflow could expand on SIRE by (i) detecting the ancestry of patients that usually lack appropriate racial categories, (ii) determining what patients have mixed ancestry, and (iii) resolving ancestries of patients in heterogeneous racial categories and who had missing SIRE. Accurate GIA provides needed information to enable ancestry-aware biomarker research, ensure the inclusion of underrepresented groups in clinical research, and increase the diverse representation of patient populations eligible for precision medicine therapies and trials.
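
The abstract does not describe how the consensus is formed, so the following is only one plausible reading: several independent classifiers each propose an ancestry label, and a call is accepted when enough of them agree. The function name, the 0.75 agreement threshold, and the "unresolved" fallback are assumptions made for this sketch.

```python
from collections import Counter
from typing import List


def consensus_call(calls: List[str], min_agreement: float = 0.75) -> str:
    """Return the label shared by at least `min_agreement` of the classifiers;
    otherwise flag the sample instead of forcing a call."""
    if not calls:
        raise ValueError("need at least one classifier call")
    label, votes = Counter(calls).most_common(1)[0]
    return label if votes / len(calls) >= min_agreement else "unresolved"


# Labels follow the eight reference populations named in the abstract.
unanimous = consensus_call(["European"] * 4)
majority = consensus_call(["East Asian", "East Asian", "East Asian", "South Asian"])
split = consensus_call(["African", "European", "African", "European"])
```

Samples on which the classifiers disagree are flagged rather than forced into a category. In a workflow like the one described, such cases might be examined further for mixed ancestry, but how the paper handles them is not stated in the abstract.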

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11521331/pdf/
Citations: 0
Graph contrastive learning as a versatile foundation for advanced scRNA-seq data analysis.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae558
Zhenhao Zhang, Yuxi Liu, Meichen Xiao, Kun Wang, Yu Huang, Jiang Bian, Ruolin Yang, Fuyi Li

Single-cell RNA sequencing (scRNA-seq) offers unprecedented insights into transcriptome-wide gene expression at the single-cell level. Cell clustering has been long established in the analysis of scRNA-seq data to identify the groups of cells with similar expression profiles. However, cell clustering is technically challenging, as raw scRNA-seq data have various analytical issues, including high dimensionality and dropout values. Existing research has developed deep learning models, such as graph machine learning models and contrastive learning-based models, for cell clustering using scRNA-seq data and has summarized the unsupervised learning of cell clustering into a human-interpretable format. While advances in cell clustering have been profound, we are no closer to finding a simple yet effective framework for learning high-quality representations necessary for robust clustering. In this study, we propose scSimGCL, a novel framework based on the graph contrastive learning paradigm for self-supervised pretraining of graph neural networks. This framework facilitates the generation of high-quality representations crucial for cell clustering. Our scSimGCL incorporates cell-cell graph structure and contrastive learning to enhance the performance of cell clustering. Extensive experimental results on simulated and real scRNA-seq datasets suggest the superiority of the proposed scSimGCL. Moreover, clustering assignment analysis confirms the general applicability of scSimGCL, including state-of-the-art clustering algorithms. Further, ablation study and hyperparameter analysis suggest the efficacy of our network architecture with the robustness of decisions in the self-supervised learning setting. The proposed scSimGCL can serve as a robust framework for practitioners developing tools for cell clustering. The source code of scSimGCL is publicly available at https://github.com/zhangzh1328/scSimGCL.
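
Contrastive learning of the kind named in this abstract is often trained with an InfoNCE-style objective, in which two augmented views of the same cell should have more similar embeddings than views of different cells. The sketch below computes that generic loss on toy embeddings. It is not the scSimGCL loss, whose exact form and augmentations are not given in the abstract; the temperature `tau` and the vectors are invented.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(view1: List[Vector], view2: List[Vector], tau: float = 0.5) -> float:
    """Mean InfoNCE loss: cell i in view1 should be closest to cell i in view2,
    with every other cell in view2 acting as a negative."""
    losses = []
    for i, anchor in enumerate(view1):
        logits = [cosine(anchor, other) / tau for other in view2]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        losses.append(log_denominator - logits[i])  # -log softmax of the positive
    return sum(losses) / len(losses)


# Toy 2-D embeddings of three cells under two views.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
aligned = info_nce(view_a, view_b)
shuffled = info_nce(view_a, view_b[::-1])  # positives mismatched for two cells
```

The loss is lower when matching cells line up across views than when the pairing is scrambled, which is the signal a graph encoder would be trained to exploit. In a graph contrastive setting the two views would typically come from perturbing the cell-cell graph or the expression features.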

Volume 25, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11530284/pdf/
Citations: 0
BFAST: joint dimension reduction and spatial clustering with Bayesian factor analysis for zero-inflated spatial transcriptomics data.
IF 6.8, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-09-23, DOI: 10.1093/bib/bbae594
Yang Xu, Dian Lv, Xuanxuan Zou, Liang Wu, Xun Xu, Xin Zhao

The development of spatially resolved transcriptomics (ST) technologies has made it possible to measure gene expression profiles together with their cellular spatial context, helping biologists comprehensively characterize cellular phenotype heterogeneity and the tissue microenvironment. Spatial clustering is vital for downstream biological analysis. However, high noise and dropout events make clustering spatial transcriptomics data challenging, and effective algorithms are lacking. Here we develop a novel method that jointly performs dimension reduction and spatial clustering with Bayesian Factor Analysis for zero-inflated Spatial Transcriptomics data (BFAST). Benchmarking against currently available methods shows that BFAST performs strongly on simulated data and real spatial transcriptomics datasets. It extracts more biologically informative low-dimensional features than traditional dimensionality reduction approaches, thereby improving the accuracy and precision of clustering.

Citations: 0
RiceSNP-BST: a deep learning framework for predicting biotic stress-associated SNPs in rice.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae599
Jiajun Xu, Yujia Gao, Quan Lu, Renyi Zhang, Jianfeng Gui, Xiaoshuang Liu, Zhenyu Yue

Rice consistently faces significant threats from biotic stresses, such as fungi, bacteria, pests, and viruses. Consequently, accurately and rapidly identifying previously unknown single-nucleotide polymorphisms (SNPs) in the rice genome is a critical challenge for rice research and the development of resistant varieties. However, the limited availability of high-quality rice genotype data has hindered this research. Deep learning has transformed biological research by facilitating the prediction and analysis of SNPs in biological sequence data. Convolutional neural networks are especially effective in extracting structural and local features from DNA sequences, leading to significant advancements in genomics. Meanwhile, the expanding catalog of genome-wide association studies provides valuable biological insights for rice research. Expanding on this idea, we introduce RiceSNP-BST, an automatic architecture search framework designed to predict SNPs associated with rice biotic stress traits (BST-associated SNPs) by integrating multidimensional features. Notably, the model successfully innovates the datasets, offering higher precision than state-of-the-art methods while demonstrating good performance on an independent test set and cross-species datasets. Additionally, we extracted features from the original DNA sequences and employed causal inference to enhance the biological interpretability of the model. This study highlights the potential of RiceSNP-BST in advancing genome prediction in rice. Furthermore, a user-friendly web server for RiceSNP-BST (http://rice-snp-bst.aielab.cc) has been developed to support broader genome research.

Citations: 0
scMGATGRN: a multiview graph attention network-based method for inferring gene regulatory networks from single-cell transcriptomic data.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae526
Lin Yuan, Ling Zhao, Yufeng Jiang, Zhen Shen, Qinhu Zhang, Ming Zhang, Chun-Hou Zheng, De-Shuang Huang

The gene regulatory network (GRN) plays a vital role in understanding the structure and dynamics of cellular systems, revealing complex regulatory relationships, and exploring disease mechanisms. Recently, deep learning (DL)-based methods have been proposed to infer GRNs from single-cell transcriptomic data and have achieved impressive performance. However, these methods do not fully utilize graph topological information and high-order neighbor information from multiple receptive fields. To overcome these limitations, we propose a novel model based on a multiview graph attention network, namely scMGATGRN, to infer GRNs. scMGATGRN mainly consists of a graph attention network (GAT), a multiview model, and a view-level attention mechanism. The GAT can extract essential features of the gene regulatory network. The multiview model can simultaneously utilize local feature information and high-order neighbor feature information of nodes in the gene regulatory network. The view-level attention mechanism dynamically adjusts the relative importance of node embedding representations and efficiently aggregates node embedding representations from two views. To verify the effectiveness of scMGATGRN, we compared its performance with 10 methods (five shallow learning algorithms and five state-of-the-art DL-based methods) on seven benchmark single-cell RNA sequencing (scRNA-seq) datasets from five cell lines (two in human and three in mouse) with four different kinds of ground-truth networks. The experimental results not only show that scMGATGRN outperforms competing methods but also demonstrate the potential of this model in inferring GRNs. The code and data of scMGATGRN are made freely available on GitHub (https://github.com/nathanyl/scMGATGRN).
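
The idea of aggregating node embeddings from two views with view-level attention can be illustrated with a small sketch. This is a generic attention-weighted fusion written in plain Python, not the scMGATGRN implementation; the function name `fuse_views`, the scoring by dot product, and the attention vector `q` (which would normally be learned) are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def fuse_views(
    z_local: List[float], z_high: List[float], q: List[float]
) -> Tuple[List[float], List[float]]:
    """Fuse two embeddings of one node with softmax attention over the views.

    z_local: embedding from the local-neighbor view (hypothetical input).
    z_high:  embedding from the high-order-neighbor view (hypothetical input).
    q:       attention vector; assumed learned elsewhere, fixed here.
    Returns (view_weights, fused_embedding).
    """
    # Score each view by its dot product with the attention vector.
    scores = [sum(a * b for a, b in zip(q, z)) for z in (z_local, z_high)]
    # Numerically stable softmax over the two view scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the two view embeddings.
    fused = [weights[0] * a + weights[1] * b for a, b in zip(z_local, z_high)]
    return weights, fused


# With a zero attention vector both views score equally, so the fused
# embedding is the plain average of the two views.
weights, fused = fuse_views([1.0, 3.0], [3.0, 5.0], [0.0, 0.0])
```

In a trained model the weights would differ per node, letting the network lean on whichever view is more informative for that gene.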

Citations: 0
Statistical analysis of multiple regions-of-interest in multiplexed spatial proteomics data.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-09-23 DOI: 10.1093/bib/bbae522
Sarah Samorodnitsky, Michael C Wu

Multiplexed spatial proteomics reveals the spatial organization of cells in tumors, which is associated with important clinical outcomes such as survival and treatment response. This spatial organization is often summarized using spatial summary statistics, including Ripley's K and Besag's L. However, if multiple regions of the same tumor are imaged, it is unclear how to synthesize the relationship with a single patient-level endpoint. We evaluate extant approaches for accommodating multiple images within the context of associating summary statistics with outcomes. First, we consider averaging-based approaches wherein multiple summaries for a single sample are combined in a weighted mean. We then propose a novel class of ensemble testing approaches in which we simulate random weights used to aggregate summaries, test for an association with outcomes, and combine the $P$-values. We systematically evaluate the performance of these approaches via simulation and application to data from non-small cell lung cancer, colorectal cancer, and triple negative breast cancer. We find that the optimal strategy varies, but a simple weighted average of the summary statistics based on the number of cells in each image often offers the highest power and controls type I error effectively. When the size of the imaged regions varies, incorporating this variation into the weighted aggregation may yield additional power in cases where the varying size is informative. Ensemble testing (but not resampling) offered high power and type I error control across conditions in our simulated data sets.
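
The cell-count-weighted averaging described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name `weighted_patient_summary` and the example values are hypothetical, and the per-image summaries stand in for any spatial statistic (for example Ripley's K evaluated at a fixed radius).

```python
from typing import Sequence


def weighted_patient_summary(
    summaries: Sequence[float], cell_counts: Sequence[int]
) -> float:
    """Combine per-image spatial summaries into one patient-level value.

    Each image's summary is weighted by the number of cells in that image,
    so sparsely populated images contribute less to the patient-level value.
    """
    if len(summaries) != len(cell_counts):
        raise ValueError("need one cell count per image summary")
    total_cells = sum(cell_counts)
    if total_cells <= 0:
        raise ValueError("total cell count must be positive")
    return sum(s * n for s, n in zip(summaries, cell_counts)) / total_cells


# Hypothetical patient with three imaged regions of the same tumor.
k_values = [1.0, 2.0, 4.0]   # illustrative per-image summary statistics
n_cells = [100, 300, 600]    # illustrative number of cells in each image
patient_k = weighted_patient_summary(k_values, n_cells)
```

The resulting single value per patient can then be used as a covariate when testing for association with a patient-level outcome such as survival.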

Citations: 0