Pub Date: 2024-04-03 | DOI: 10.3389/fbinf.2024.1380928
Shamini Hemandhar Kumar, Ines Tapken, Daniela Kuhn, Peter Claus, Klaus Jung
Introduction: Gene set enrichment analysis (GSEA) following differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools are available for this step, the results are often difficult to reproduce because set annotations change in the databases: new features can be added or existing features removed. Such changes in set composition can, in turn, affect biological interpretation. Methods: We present bootGSEA, a novel computational pipeline for studying the robustness of GSEA. By repeating GSEA on bootstrap samples, the variability and robustness of the results can be studied; as a consequence of resampling, not all genes or proteins enter every bootstrap replicate of the analysis. We then aggregate the ranks from the bootstrap replicates to obtain a score per gene set that indicates whether it gains or loses evidence compared with the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels, or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at each omics level, the two ranking lists were combined to aggregate the transcriptomics and proteomics findings. Furthermore, we constructed the new R package "bootGSEA," which implements the proposed methods and provides graphical views of the findings. In the example datasets, bootstrap-based GSEA identified gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings at the single-omics level, and for combining findings from multiple omics levels.
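The bootstrap-and-aggregate idea described in the abstract can be sketched in a few lines. The sketch below is not the bootGSEA R package's API — it is a hypothetical Python illustration in which `score_fn` stands in for a full GSEA run on one bootstrap resample, and the mean rank across replicates is used as the aggregation score.

```python
import random
from statistics import mean

def bootstrap_rankings(samples, gene_sets, score_fn, n_boot=100, seed=0):
    """Repeat a GSEA-like scoring on bootstrap resamples of the samples
    and record, for each gene set, its rank in every replicate.
    score_fn(resample, gene_set) is a stand-in for a full enrichment run."""
    rng = random.Random(seed)
    ranks = {gs: [] for gs in gene_sets}
    for _ in range(n_boot):
        boot = [rng.choice(samples) for _ in samples]  # resample with replacement
        scores = {gs: score_fn(boot, gs) for gs in gene_sets}
        # rank 1 = strongest enrichment (highest score)
        ordered = sorted(gene_sets, key=lambda g: -scores[g])
        for r, gs in enumerate(ordered, start=1):
            ranks[gs].append(r)
    return ranks

def aggregate_ranks(ranks):
    """Aggregate bootstrap ranks into one robustness score per gene set
    (here: mean rank; lower = more consistently enriched)."""
    return sorted((mean(r_list), gs) for gs, r_list in ranks.items())
```

Comparing this aggregated ranking against the ranking from a single standard GSEA run is what reveals sets that gain or lose evidence under resampling.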
Title: bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses (Frontiers in Bioinformatics)
Pub Date: 2024-03-27 | DOI: 10.3389/fbinf.2024.1352594
Alban Obel Slabowska, Charles Pyke, Henning Hvid, Leon Eyrich Jessen, Simon Baumgart, Vivek Das
A major challenge in sequencing-based spatial transcriptomics (ST) is its limited resolution. Tissue sections are divided into hundreds of thousands of spots, and each spot invariably contains a mixture of cell types. Methods have been developed to deconvolute the mixed transcriptional signal into its constituents. Although ST is becoming essential for drug discovery, especially in cardiometabolic diseases, no deconvolution benchmark has to date been performed on these types of tissues and diseases. Three methods, Cell2location, RCTD, and spatialDWLS, have, however, previously been shown to perform well on brain tissue and simulated data. Here, we compare these methods on human data from cardiovascular disease (CVD) and chronic kidney disease (CKD) patients in different pathological states, evaluated using expert annotation, to assess which performs best. We found that all three methods performed comparably well in deconvoluting verifiable cell types, including smooth muscle cells and macrophages in vascular samples and podocytes in kidney samples. RCTD achieved the best accuracy scores in CVD samples, while Cell2location, on average, achieved the highest performance across all test experiments. Although all three methods had similar accuracies, Cell2location needed less reference data to converge, at the expense of higher computational intensity. Finally, RCTD has the fastest runtime and the simplest workflow, requiring fewer computational dependencies. In conclusion, we find that each method has particular advantages, and the optimal choice depends on the use case.
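To make the notion of an "accuracy score" in such a benchmark concrete: one common way to score a deconvolution method is to compare its predicted per-spot cell-type proportions against annotated ground truth. The helper below is a hypothetical illustration of that idea using root-mean-square error — the abstract does not state which metric the authors used, and the function name and inputs are assumptions for the sketch.

```python
def deconvolution_rmse(predicted, annotated):
    """Toy benchmark score: root-mean-square error between predicted
    per-spot cell-type proportions and annotated ground truth.
    Both arguments map cell type -> proportion over the same cell types."""
    diffs = [(predicted[ct] - annotated[ct]) ** 2 for ct in annotated]
    return (sum(diffs) / len(diffs)) ** 0.5
```

Averaging such a score over all annotated spots yields a single number per method, which is what allows rankings like "RCTD best in CVD samples" to be made.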
Title: A systematic evaluation of state-of-the-art deconvolution methods in spatial transcriptomics: insights from cardiovascular disease and chronic kidney disease (Frontiers in Bioinformatics)
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the lack of accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to work through contact map overlap (CMO) problems for better template selection. Unlike existing tools, which cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the problem interactively. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets, PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse contact map qualities and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, making it accessible to the general public and promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
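For readers unfamiliar with the objects involved: a contact map records which residue pairs of a protein lie within some distance cutoff, and the CMO problem seeks an alignment of two proteins that maximizes the number of shared contacts. The sketch below is a minimal, hypothetical illustration of both ideas — it is not GoFold or map_align code, and the 8 Å cutoff and sequence-separation parameter are conventional choices, not values taken from the paper.

```python
def contact_map(coords, threshold=8.0, min_sep=3):
    """Binary contact map from residue coordinates (e.g. C-alpha atoms):
    residues i and j are in contact if their distance is below
    `threshold` and they are at least `min_sep` positions apart."""
    n = len(coords)
    contacts = set()
    for i in range(n):
        for j in range(i + min_sep, n):
            d = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
            if d < threshold:
                contacts.add((i, j))
    return contacts

def shared_contacts(contacts_a, contacts_b):
    """Number of contacts shared under the identity alignment — the
    quantity a CMO solver maximizes over all possible alignments."""
    return len(contacts_a & contacts_b)
```

A full CMO solver searches over residue alignments rather than assuming the identity mapping, which is what makes the problem computationally hard.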
Title: An interactive visualization tool for educational outreach in protein contact map overlap analysis (Frontiers in Bioinformatics)
Authors: Kevan Baker, Nathaniel Hughes, Sutanu Bhattacharya
Pub Date: 2024-03-15 | DOI: 10.3389/fbinf.2024.1358550
Pub Date: 2024-03-15 | DOI: 10.3389/fbinf.2024.1278228
Samuel D. Chorlton
Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.
Title: Ten common issues with reference sequence databases and how to mitigate them (Frontiers in Bioinformatics)
Pub Date: 2024-02-09 | DOI: 10.3389/fbinf.2024.1329062
Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric
Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, accurate, reliable and reproducible analysis is essential. Furthermore, microarray formats vary widely, not only between different sample types but also because of differences in the hardware used to produce the arrays and the personal preferences of individual users. There is therefore a need for transparent, broadly applicable and user-friendly image quantification techniques that extract meaningful information from these complex datasets while also addressing the challenges posed by specific microarray and imager formats, which can compromise analysis and interpretation. Results: Here we introduce the MicroArray Rastering Tool (MARTin), a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations, including high-density formats and printed arrays with significant x and y offsets. This is made possible by allowing the user to freely customize parts of the application to their specific microarray format. Thanks to built-in features such as adaptive filtering and autofit, measurements can be performed efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity-check features, providing a straightforward quality control method along with a ready-to-use interface for in-depth data analysis. This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data. Conclusion: MARTin empowers its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project, freely available on GitHub under the GNU Affero General Public License.
Title: MARTin—an open-source platform for microarray analysis (Frontiers in Bioinformatics)
Pub Date: 2024-02-09 | DOI: 10.3389/fbinf.2024.1336257
Wouter‐Michiel A.M. Vierdag, Sinem K. Saka
Multiplexed imaging approaches are increasingly adopted for imaging large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of the image data per sample. Processing and analyzing these datasets is complex owing to frequent technical artifacts and the heterogeneous profiles of a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step typically depends on the output of the previous step, and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each step of the image processing pipeline is of paramount importance, both for the proper analysis and interpretation of results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of imaging datasets and the analysis process. Yet limitations of the currently available frameworks make integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the challenges of integrating QC into image analysis pipelines and suggest possible solutions that build on recent advances in bioimage analysis.
Title: A perspective on FAIR quality control in multiplexed imaging data processing (Frontiers in Bioinformatics)
Pub Date : 2024-02-09DOI: 10.3389/fbinf.2024.1329062
Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric
Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, an accurate, reliable and reproducible analysis is essential. Furthermore, there is a high level of variation in the format of microarrays. This not only holds true between different sample types but is also due to differences in the hardware used during the production of the arrays, as well as the personal preferences of the individual users. Therefore, there is a need for transparent, broadly applicable and user-friendly image quantification techniques to extract meaningful information from these complex datasets, while also addressing the challenges posed by specific microarray and imager formats, which can flaw analysis and interpretation.Results: Here we introduce MicroArray Rastering Tool (MARTin), as a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations including high-density formats and printed arrays with significant x and y offsets. This is made possible by granting the user the ability to freely customize parts of the application to their specific microarray format. Thanks to built-in features like adaptive filtering and autofit, measurements can be done very efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity check features, providing a straightforward quality control method, along with a ready-to-use interface for in-depth data analysis. 
This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data.Conclusion: MARTin has been developed to empower its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project freely available via the GNU Affero General Public License licence on GitHub.
背景:微阵列技术为高通量分析带来了重大进步,尤其是在涉及蛋白质、肽和抗体的生物分子相互作用的综合研究以及基因表达和基因分型领域。随着微阵列数据的数量和复杂性不断增加,准确、可靠和可重复的分析至关重要。此外,微阵列的格式差异很大。这不仅是不同样本类型之间的差异,也是由于芯片生产过程中使用的硬件不同以及用户的个人偏好造成的。因此,我们需要透明、广泛适用且用户友好的图像量化技术,以便从这些复杂的数据集中提取有意义的信息,同时解决特定微阵列和成像仪格式带来的挑战,这些挑战可能会影响分析和解释:我们在此介绍微阵列栅格化工具(MARTin),它是一种多功能工具,主要用于分析蛋白质和肽微阵列。我们的软件提供了最先进的方法,为研究人员提供了微阵列图像量化的综合工具。MARTin 与所使用的微阵列平台无关,支持各种配置,包括高密度格式和具有显著 x 和 y 偏移的印刷阵列。用户可以根据自己特定的微阵列格式自由定制应用程序的各个部分,从而使上述功能成为可能。得益于自适应滤波和自动拟合等内置功能,测量工作可以非常高效地完成,并具有很高的可重复性。此外,我们的工具还集成了元数据管理和完整性检查功能,提供了一种直接的质量控制方法,以及一个用于深入数据分析的即用型界面。这不仅促进了微阵列分析领域的良好科学实践,还增强了探索和检查生成数据的能力:开发 MARTin 的目的是为用户提供可靠、高效、直观的肽组和蛋白质组阵列分析工具,从而促进跨学科的数据驱动发现。我们的软件是一个开源项目,可通过 GitHub 上的 GNU Affero 通用公共许可证免费获取。
{"title":"MARTin—an open-source platform for microarray analysis","authors":"Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric","doi":"10.3389/fbinf.2024.1329062","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1329062","url":null,"abstract":"Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, an accurate, reliable and reproducible analysis is essential. Furthermore, there is a high level of variation in the format of microarrays. This not only holds true between different sample types but is also due to differences in the hardware used during the production of the arrays, as well as the personal preferences of the individual users. Therefore, there is a need for transparent, broadly applicable and user-friendly image quantification techniques to extract meaningful information from these complex datasets, while also addressing the challenges posed by specific microarray and imager formats, which can flaw analysis and interpretation.Results: Here we introduce MicroArray Rastering Tool (MARTin), as a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations including high-density formats and printed arrays with significant x and y offsets. This is made possible by granting the user the ability to freely customize parts of the application to their specific microarray format. Thanks to built-in features like adaptive filtering and autofit, measurements can be done very efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity check features, providing a straightforward quality control method, along with a ready-to-use interface for in-depth data analysis. This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data.Conclusion: MARTin has been developed to empower its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project freely available via the GNU Affero General Public License on GitHub.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139850376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
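The grid-based spot quantification that tools like MARTin perform can be illustrated with a minimal sketch. This is not MARTin's actual implementation; the regular-grid parameters, the annulus-based background estimate, and the synthetic test image below are assumptions made purely for illustration. The idea is to place a circular mask at each grid position, estimate the local background from a surrounding annulus, and report the background-corrected mean spot intensity:

```python
import numpy as np

def quantify_spots(image, grid_origin, pitch, n_rows, n_cols, radius):
    """Measure background-corrected mean intensity for each spot on a regular grid.

    A circular mask of the given radius is placed at each grid position; the
    local background is estimated as the median of pixels in an annulus just
    outside the spot and subtracted from the spot's mean intensity.
    """
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    results = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            cy = grid_origin[0] + r * pitch
            cx = grid_origin[1] + c * pitch
            dist = np.hypot(ys - cy, xs - cx)
            spot = dist <= radius                      # foreground pixels
            annulus = (dist > radius) & (dist <= 2 * radius)  # local background
            results[r, c] = image[spot].mean() - np.median(image[annulus])
    return results

# Synthetic 2x2 array: bright spots (500) on a uniform background (100).
img = np.full((40, 40), 100.0)
yy, xx = np.mgrid[0:40, 0:40]
for cy, cx in [(10, 10), (10, 25), (25, 10), (25, 25)]:
    img[np.hypot(yy - cy, xx - cx) <= 3] = 500.0

signal = quantify_spots(img, grid_origin=(10, 10), pitch=15,
                        n_rows=2, n_cols=2, radius=3)
```

Real tools add the refinements the abstract mentions, such as adaptive filtering and automatic grid fitting to handle x/y offsets; this sketch assumes the grid is already aligned.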
Pub Date : 2024-02-09DOI: 10.3389/fbinf.2024.1336257
Wouter‐Michiel A.M. Vierdag, Sinem K. Saka
Multiplexed imaging approaches are being increasingly adopted for imaging of large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of image data per sample. The processing and analysis of these datasets is complex owing to frequent technical artifacts and heterogeneous profiles from a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step typically depends on the output of the previous step, and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each step of the image processing pipeline is of paramount importance, both for the proper analysis and interpretation of the results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of imaging datasets and of the analysis process. Yet, limitations of the currently available frameworks make the integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the challenges of integrating QC into image analysis pipelines and suggest possible solutions that build on recent advances in bioimage analysis.
{"title":"A perspective on FAIR quality control in multiplexed imaging data processing","authors":"Wouter‐Michiel A.M. Vierdag, Sinem K. Saka","doi":"10.3389/fbinf.2024.1336257","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1336257","url":null,"abstract":"Multiplexed imaging approaches are getting increasingly adopted for imaging of large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of image data per sample. The processing and analysis of these datasets is complex owing to frequent technical artifacts and heterogeneous profiles from a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step is typically dependent on the output of the previous step and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each of these different steps of the image processing pipeline is of paramount importance both for the proper analysis and interpretation of the analysis results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of the imaging datasets and the analysis process. Yet, limitations of the currently available frameworks make integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the different challenges for integrating QC in image analysis pipelines as well as suggest possible solutions that build on top of recent advances in bioimage analysis.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139790460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
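The abstract's central idea of QC as an integral, retrievable part of the analysis can be sketched as a pipeline wrapper that records a QC entry after every processing step, so an artifact can be traced back to the step that introduced it. This is a hypothetical minimal design, not the API of any specific framework; the step functions and QC metrics below are invented for illustration:

```python
import numpy as np

def run_pipeline(image, steps, qc_checks):
    """Apply processing steps in order; after each, record QC metrics.

    `steps` is a list of (name, function) pairs; each function returns the
    processed image. `qc_checks` maps a metric name to a function of the
    image. The QC log travels with the output, so problems can be traced
    back to the step that introduced them.
    """
    qc_log = []
    for name, fn in steps:
        image = fn(image)
        qc_log.append({"step": name,
                       **{metric: float(check(image))
                          for metric, check in qc_checks.items()}})
    return image, qc_log

# Hypothetical two-step pipeline: background subtraction, then clipping.
steps = [
    ("subtract_background", lambda im: im - np.median(im)),
    ("clip_negative", lambda im: np.clip(im, 0, None)),
]
qc_checks = {
    "mean": np.mean,
    "negative_fraction": lambda im: (im < 0).mean(),
}
img = np.array([[0.0, 2.0], [4.0, 10.0]])
out, log = run_pipeline(img, steps, qc_checks)
```

Serializing such a log alongside the images (rather than discarding it after an interactive session) is one way to make QC findable and reusable in the FAIR sense the title refers to.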
Pub Date : 2024-02-08DOI: 10.3389/fbinf.2024.1305969
Michael S. Rosenberg
The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple and accessible GUI interfaces that are intuitive for novice analysts and users. Development of many of these packages has been abandoned over time for a variety of reasons, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin 3 is written in Python and developed from scratch relative to earlier versions. The codebase is available on GitHub, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations and exploratory and publication bias analyses, and allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.
{"title":"MetaWin 3: open-source software for meta-analysis","authors":"Michael S. Rosenberg","doi":"10.3389/fbinf.2024.1305969","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1305969","url":null,"abstract":"The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple and accessible GUI interfaces which are intuitively easy to use by novice analysts and users. Development of many of these packages has been abandoned over time due to a variety of factors, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin3 is written in Python and developed from scratch relative to earlier versions. The codebase is available on Github, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations, exploratory and publication bias analyses, and allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139791053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
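The standardized effect sizes and inverse-variance pooling that meta-analysis software such as MetaWin implements follow standard formulas. As a minimal sketch (not MetaWin's code; the two-study dataset below is invented for illustration), here is Hedges' g with its small-sample correction and sampling variance, plus a fixed-effect pooled estimate:

```python
import numpy as np

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with the Hedges small-sample correction."""
    df = n1 + n2 - 2
    s_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)   # small-sample correction factor
    g = j * d
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2.0 * (n1 + n2)))
    return g, var_g

def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    w = 1.0 / np.asarray(variances)
    pooled = np.sum(w * np.asarray(effects)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return pooled, se

# Two hypothetical studies comparing treatment and control group means.
g1, v1 = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
g2, v2 = hedges_g(9.5, 3.0, 30, 8.0, 3.0, 30)
pooled, se = fixed_effect_mean([g1, g2], [v1, v2])
```

The inverse-variance weighting gives larger, more precise studies more influence on the pooled estimate; random-effects models and the meta-regression mentioned in the abstract extend this same weighting scheme with additional variance components and covariates.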