Pub Date: 2024-04-03 | DOI: 10.3389/fbinf.2024.1380928
Shamini Hemandhar Kumar, Ines Tapken, Daniela Kuhn, Peter Claus, Klaus Jung
Introduction: Gene set enrichment analysis (GSEA) following differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools are available for this step, the results are often difficult to reproduce because set annotations change in the databases: new features can be added or existing features removed. Such changes in set composition can, in turn, affect biological interpretation. Methods: We present bootGSEA, a novel computational pipeline for studying the robustness of GSEA. By repeating GSEA on bootstrap samples, the variability and robustness of the results can be studied; as a consequence of resampling, not all genes or proteins enter every bootstrap replicate of the analysis. We then aggregate the ranks from the bootstrap replicates to obtain a score per gene set that indicates whether it gains or loses evidence compared with the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels, or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at each omics level, the two ranking lists were combined to aggregate the transcriptomics and proteomics findings. Furthermore, we constructed the new R package "bootGSEA," which implements the proposed methods and provides graphical views of the findings. In the example datasets, bootstrap-based GSEA identified gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings at the single-omics level, and for combining findings from multiple omics levels.
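The bootstrap-and-aggregate idea described in the abstract can be sketched in a few lines. The sketch below is not the bootGSEA R package's API — it is a hypothetical Python illustration in which `score_fn` stands in for a full GSEA run on one bootstrap resample, and the mean rank across replicates is used as the aggregation score.

```python
import random
from statistics import mean

def bootstrap_rankings(samples, gene_sets, score_fn, n_boot=100, seed=0):
    """Repeat a GSEA-like scoring on bootstrap resamples of the samples
    and record, for each gene set, its rank in every replicate.
    score_fn(resample, gene_set) is a stand-in for a full enrichment run."""
    rng = random.Random(seed)
    ranks = {gs: [] for gs in gene_sets}
    for _ in range(n_boot):
        boot = [rng.choice(samples) for _ in samples]  # resample with replacement
        scores = {gs: score_fn(boot, gs) for gs in gene_sets}
        # rank 1 = strongest enrichment (highest score)
        ordered = sorted(gene_sets, key=lambda g: -scores[g])
        for r, gs in enumerate(ordered, start=1):
            ranks[gs].append(r)
    return ranks

def aggregate_ranks(ranks):
    """Aggregate bootstrap ranks into one robustness score per gene set
    (here: mean rank; lower = more consistently enriched)."""
    return sorted((mean(r_list), gs) for gs, r_list in ranks.items())
```

Comparing this aggregated ranking against the ranking from a single standard GSEA run is what reveals sets that gain or lose evidence under resampling.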
Title: bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses (Frontiers in Bioinformatics)
Pub Date: 2024-03-27 | DOI: 10.3389/fbinf.2024.1352594
Alban Obel Slabowska, Charles Pyke, Henning Hvid, Leon Eyrich Jessen, Simon Baumgart, Vivek Das
A major challenge in sequencing-based spatial transcriptomics (ST) is its limited resolution. Tissue sections are divided into hundreds of thousands of spots, and each spot invariably contains a mixture of cell types. Methods have been developed to deconvolute the mixed transcriptional signal into its constituents. Although ST is becoming essential for drug discovery, especially in cardiometabolic diseases, no deconvolution benchmark has to date been performed on these types of tissues and diseases. Three methods, Cell2location, RCTD, and spatialDWLS, have, however, previously been shown to perform well on brain tissue and simulated data. Here, we compare these methods on human data from cardiovascular disease (CVD) and chronic kidney disease (CKD) patients in different pathological states, evaluated using expert annotation, to assess which performs best. We found that all three methods performed comparably well in deconvoluting verifiable cell types, including smooth muscle cells and macrophages in vascular samples and podocytes in kidney samples. RCTD achieved the best accuracy scores in CVD samples, while Cell2location, on average, achieved the highest performance across all test experiments. Although all three methods had similar accuracies, Cell2location needed less reference data to converge, at the expense of higher computational intensity. Finally, RCTD has the fastest runtime and the simplest workflow, requiring fewer computational dependencies. In conclusion, we find that each method has particular advantages, and the optimal choice depends on the use case.
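To make the notion of an "accuracy score" in such a benchmark concrete: one common way to score a deconvolution method is to compare its predicted per-spot cell-type proportions against annotated ground truth. The helper below is a hypothetical illustration of that idea using root-mean-square error — the abstract does not state which metric the authors used, and the function name and inputs are assumptions for the sketch.

```python
def deconvolution_rmse(predicted, annotated):
    """Toy benchmark score: root-mean-square error between predicted
    per-spot cell-type proportions and annotated ground truth.
    Both arguments map cell type -> proportion over the same cell types."""
    diffs = [(predicted[ct] - annotated[ct]) ** 2 for ct in annotated]
    return (sum(diffs) / len(diffs)) ** 0.5
```

Averaging such a score over all annotated spots yields a single number per method, which is what allows rankings like "RCTD best in CVD samples" to be made.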
Title: A systematic evaluation of state-of-the-art deconvolution methods in spatial transcriptomics: insights from cardiovascular disease and chronic kidney disease (Frontiers in Bioinformatics)
Recent advancements in contact map-based protein three-dimensional (3D) structure prediction have been driven by the evolution of deep learning algorithms. However, the lack of accessible software tools for novices in this domain remains a significant challenge. This study introduces GoFold, a novel, standalone graphical user interface (GUI) designed for beginners to work through contact map overlap (CMO) problems for better template selection. Unlike existing tools, which cater more to research needs or assume foundational knowledge, GoFold offers an intuitive, user-friendly platform with comprehensive tutorials. It stands out in its ability to visually represent the CMO problem, allowing users to input proteins in various formats and explore the problem interactively. The educational value of GoFold is demonstrated through benchmarking against the state-of-the-art contact map overlap method, map_align, using two datasets, PSICOV and CAMEO. GoFold exhibits superior performance in terms of TM-score and Z-score metrics across diverse contact map qualities and target difficulties. Notably, GoFold runs efficiently on personal computers without any third-party dependencies, making it accessible to the general public and promoting citizen science. The tool is freely available for download for macOS, Linux, and Windows.
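For readers unfamiliar with the objects involved: a contact map records which residue pairs of a protein lie within some distance cutoff, and the CMO problem seeks an alignment of two proteins that maximizes the number of shared contacts. The sketch below is a minimal, hypothetical illustration of both ideas — it is not GoFold or map_align code, and the 8 Å cutoff and sequence-separation parameter are conventional choices, not values taken from the paper.

```python
def contact_map(coords, threshold=8.0, min_sep=3):
    """Binary contact map from residue coordinates (e.g. C-alpha atoms):
    residues i and j are in contact if their distance is below
    `threshold` and they are at least `min_sep` positions apart."""
    n = len(coords)
    contacts = set()
    for i in range(n):
        for j in range(i + min_sep, n):
            d = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
            if d < threshold:
                contacts.add((i, j))
    return contacts

def shared_contacts(contacts_a, contacts_b):
    """Number of contacts shared under the identity alignment — the
    quantity a CMO solver maximizes over all possible alignments."""
    return len(contacts_a & contacts_b)
```

A full CMO solver searches over residue alignments rather than assuming the identity mapping, which is what makes the problem computationally hard.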
Title: An interactive visualization tool for educational outreach in protein contact map overlap analysis (Frontiers in Bioinformatics)
Authors: Kevan Baker, Nathaniel Hughes, Sutanu Bhattacharya
Pub Date: 2024-03-15 | DOI: 10.3389/fbinf.2024.1358550
Pub Date: 2024-03-15 | DOI: 10.3389/fbinf.2024.1278228
Samuel D. Chorlton
Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.
Title: Ten common issues with reference sequence databases and how to mitigate them (Frontiers in Bioinformatics)
Pub Date: 2024-02-09 | DOI: 10.3389/fbinf.2024.1329062
Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric
Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, accurate, reliable and reproducible analysis is essential. Furthermore, microarray formats vary widely, not only between different sample types but also because of differences in the hardware used to produce the arrays and the personal preferences of individual users. There is therefore a need for transparent, broadly applicable and user-friendly image quantification techniques that extract meaningful information from these complex datasets while also addressing the challenges posed by specific microarray and imager formats, which can compromise analysis and interpretation. Results: Here we introduce the MicroArray Rastering Tool (MARTin), a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations, including high-density formats and printed arrays with significant x and y offsets. This is made possible by allowing the user to freely customize parts of the application to their specific microarray format. Thanks to built-in features such as adaptive filtering and autofit, measurements can be performed efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity-check features, providing a straightforward quality control method along with a ready-to-use interface for in-depth data analysis. This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data. Conclusion: MARTin empowers its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project, freely available on GitHub under the GNU Affero General Public License.
Title: MARTin—an open-source platform for microarray analysis (Frontiers in Bioinformatics)
Pub Date: 2024-02-09 | DOI: 10.3389/fbinf.2024.1336257
Wouter‐Michiel A.M. Vierdag, Sinem K. Saka
Multiplexed imaging approaches are increasingly adopted for imaging large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of the image data per sample. Processing and analyzing these datasets is complex owing to frequent technical artifacts and the heterogeneous profiles of a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step typically depends on the output of the previous step, and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each step of the image processing pipeline is of paramount importance, both for the proper analysis and interpretation of results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of imaging datasets and the analysis process. Yet limitations of the currently available frameworks make integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the challenges of integrating QC into image analysis pipelines and suggest possible solutions that build on recent advances in bioimage analysis.
Title: A perspective on FAIR quality control in multiplexed imaging data processing (Frontiers in Bioinformatics)
Pub Date : 2024-02-09DOI: 10.3389/fbinf.2024.1329062
Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric
Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, an accurate, reliable and reproducible analysis is essential. Furthermore, there is a high level of variation in the format of microarrays. This not only holds true between different sample types but is also due to differences in the hardware used during the production of the arrays, as well as the personal preferences of the individual users. Therefore, there is a need for transparent, broadly applicable and user-friendly image quantification techniques to extract meaningful information from these complex datasets, while also addressing the challenges posed by specific microarray and imager formats, which can flaw analysis and interpretation.Results: Here we introduce MicroArray Rastering Tool (MARTin), as a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations including high-density formats and printed arrays with significant x and y offsets. This is made possible by granting the user the ability to freely customize parts of the application to their specific microarray format. Thanks to built-in features like adaptive filtering and autofit, measurements can be done very efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity check features, providing a straightforward quality control method, along with a ready-to-use interface for in-depth data analysis. 
This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data.Conclusion: MARTin has been developed to empower its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project freely available via the GNU Affero General Public License licence on GitHub.
背景:微阵列技术为高通量分析带来了重大进步,尤其是在涉及蛋白质、肽和抗体的生物分子相互作用的综合研究以及基因表达和基因分型领域。随着微阵列数据的数量和复杂性不断增加,准确、可靠和可重复的分析至关重要。此外,微阵列的格式差异很大。这不仅是不同样本类型之间的差异,也是由于芯片生产过程中使用的硬件不同以及用户的个人偏好造成的。因此,我们需要透明、广泛适用且用户友好的图像量化技术,以便从这些复杂的数据集中提取有意义的信息,同时解决特定微阵列和成像仪格式带来的挑战,这些挑战可能会影响分析和解释:我们在此介绍微阵列栅格化工具(MARTin),它是一种多功能工具,主要用于分析蛋白质和肽微阵列。我们的软件提供了最先进的方法,为研究人员提供了微阵列图像量化的综合工具。MARTin 与所使用的微阵列平台无关,支持各种配置,包括高密度格式和具有显著 x 和 y 偏移的印刷阵列。用户可以根据自己特定的微阵列格式自由定制应用程序的各个部分,从而使上述功能成为可能。得益于自适应滤波和自动拟合等内置功能,测量工作可以非常高效地完成,并具有很高的可重复性。此外,我们的工具还集成了元数据管理和完整性检查功能,提供了一种直接的质量控制方法,以及一个用于深入数据分析的即用型界面。这不仅促进了微阵列分析领域的良好科学实践,还增强了探索和检查生成数据的能力:开发 MARTin 的目的是为用户提供可靠、高效、直观的肽组和蛋白质组阵列分析工具,从而促进跨学科的数据驱动发现。我们的软件是一个开源项目,可通过 GitHub 上的 GNU Affero 通用公共许可证免费获取。
{"title":"MARTin—an open-source platform for microarray analysis","authors":"Kai O. Kreissner, Benjamin Faller, Ivan Talucci, Hans M. Maric","doi":"10.3389/fbinf.2024.1329062","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1329062","url":null,"abstract":"Background: Microarray technology has brought significant advancements to high-throughput analysis, particularly in the comprehensive study of biomolecular interactions involving proteins, peptides, and antibodies, as well as in the fields of gene expression and genotyping. With the ever-increasing volume and intricacy of microarray data, an accurate, reliable and reproducible analysis is essential. Furthermore, there is a high level of variation in the format of microarrays. This not only holds true between different sample types but is also due to differences in the hardware used during the production of the arrays, as well as the personal preferences of the individual users. Therefore, there is a need for transparent, broadly applicable and user-friendly image quantification techniques to extract meaningful information from these complex datasets, while also addressing the challenges posed by specific microarray and imager formats, which can flaw analysis and interpretation.Results: Here we introduce MicroArray Rastering Tool (MARTin), as a versatile tool developed primarily for the analysis of protein and peptide microarrays. Our software provides state-of-the-art methodologies, offering researchers a comprehensive tool for microarray image quantification. MARTin is independent of the microarray platform used and supports various configurations including high-density formats and printed arrays with significant x and y offsets. This is made possible by granting the user the ability to freely customize parts of the application to their specific microarray format. Thanks to built-in features like adaptive filtering and autofit, measurements can be done very efficiently and are highly reproducible. Furthermore, our tool integrates metadata management and integrity check features, providing a straightforward quality control method, along with a ready-to-use interface for in-depth data analysis. This not only promotes good scientific practice in the field of microarray analysis but also enhances the ability to explore and examine the generated data.Conclusion: MARTin has been developed to empower its users with a reliable, efficient, and intuitive tool for peptidomic and proteomic array analysis, thereby facilitating data-driven discovery across disciplines. Our software is an open-source project freely available via the GNU Affero General Public License on GitHub.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139850376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
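The grid-based spot quantification that tools like MARTin perform can be illustrated with a minimal sketch. This is not MARTin's actual implementation; the regular-grid parameters, the annulus-based background estimate, and the synthetic test image below are assumptions made purely for illustration. The idea is to place a circular mask at each grid position, estimate the local background from a surrounding annulus, and report the background-corrected mean spot intensity:

```python
import numpy as np

def quantify_spots(image, grid_origin, pitch, n_rows, n_cols, radius):
    """Measure background-corrected mean intensity for each spot on a regular grid.

    A circular mask of the given radius is placed at each grid position; the
    local background is estimated as the median of pixels in an annulus just
    outside the spot and subtracted from the spot's mean intensity.
    """
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    results = np.zeros((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            cy = grid_origin[0] + r * pitch
            cx = grid_origin[1] + c * pitch
            dist = np.hypot(ys - cy, xs - cx)
            spot = dist <= radius                      # foreground pixels
            annulus = (dist > radius) & (dist <= 2 * radius)  # local background
            results[r, c] = image[spot].mean() - np.median(image[annulus])
    return results

# Synthetic 2x2 array: bright spots (500) on a uniform background (100).
img = np.full((40, 40), 100.0)
yy, xx = np.mgrid[0:40, 0:40]
for cy, cx in [(10, 10), (10, 25), (25, 10), (25, 25)]:
    img[np.hypot(yy - cy, xx - cx) <= 3] = 500.0

signal = quantify_spots(img, grid_origin=(10, 10), pitch=15,
                        n_rows=2, n_cols=2, radius=3)
```

Real tools add the refinements the abstract mentions, such as adaptive filtering and automatic grid fitting to handle x/y offsets; this sketch assumes the grid is already aligned.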
Pub Date : 2024-02-09DOI: 10.3389/fbinf.2024.1336257
Wouter‐Michiel A.M. Vierdag, Sinem K. Saka
Multiplexed imaging approaches are being increasingly adopted for imaging of large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of image data per sample. The processing and analysis of these datasets is complex owing to frequent technical artifacts and heterogeneous profiles from a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step typically depends on the output of the previous step, and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each step of the image processing pipeline is of paramount importance, both for the proper analysis and interpretation of the results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of imaging datasets and of the analysis process. Yet, limitations of the currently available frameworks make the integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the challenges of integrating QC into image analysis pipelines and suggest possible solutions that build on recent advances in bioimage analysis.
{"title":"A perspective on FAIR quality control in multiplexed imaging data processing","authors":"Wouter‐Michiel A.M. Vierdag, Sinem K. Saka","doi":"10.3389/fbinf.2024.1336257","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1336257","url":null,"abstract":"Multiplexed imaging approaches are getting increasingly adopted for imaging of large tissue areas, yielding big imaging datasets both in terms of the number of samples and the size of image data per sample. The processing and analysis of these datasets is complex owing to frequent technical artifacts and heterogeneous profiles from a high number of stained targets. To streamline the analysis of multiplexed images, automated pipelines making use of state-of-the-art algorithms have been developed. In these pipelines, the output quality of one processing step is typically dependent on the output of the previous step and errors from each step, even when they appear minor, can propagate and confound the results. Thus, rigorous quality control (QC) at each of these different steps of the image processing pipeline is of paramount importance both for the proper analysis and interpretation of the analysis results and for ensuring the reusability of the data. Ideally, QC should become an integral and easily retrievable part of the imaging datasets and the analysis process. Yet, limitations of the currently available frameworks make integration of interactive QC difficult for large multiplexed imaging data. Given the increasing size and complexity of multiplexed imaging datasets, we present the different challenges for integrating QC in image analysis pipelines as well as suggest possible solutions that build on top of recent advances in bioimage analysis.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139790460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
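The abstract's central idea of QC as an integral, retrievable part of the analysis can be sketched as a pipeline wrapper that records a QC entry after every processing step, so an artifact can be traced back to the step that introduced it. This is a hypothetical minimal design, not the API of any specific framework; the step functions and QC metrics below are invented for illustration:

```python
import numpy as np

def run_pipeline(image, steps, qc_checks):
    """Apply processing steps in order; after each, record QC metrics.

    `steps` is a list of (name, function) pairs; each function returns the
    processed image. `qc_checks` maps a metric name to a function of the
    image. The QC log travels with the output, so problems can be traced
    back to the step that introduced them.
    """
    qc_log = []
    for name, fn in steps:
        image = fn(image)
        qc_log.append({"step": name,
                       **{metric: float(check(image))
                          for metric, check in qc_checks.items()}})
    return image, qc_log

# Hypothetical two-step pipeline: background subtraction, then clipping.
steps = [
    ("subtract_background", lambda im: im - np.median(im)),
    ("clip_negative", lambda im: np.clip(im, 0, None)),
]
qc_checks = {
    "mean": np.mean,
    "negative_fraction": lambda im: (im < 0).mean(),
}
img = np.array([[0.0, 2.0], [4.0, 10.0]])
out, log = run_pipeline(img, steps, qc_checks)
```

Serializing such a log alongside the images (rather than discarding it after an interactive session) is one way to make QC findable and reusable in the FAIR sense the title refers to.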
Pub Date : 2024-02-08DOI: 10.3389/fbinf.2024.1305969
Michael S. Rosenberg
The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple and accessible GUI interfaces that are intuitive for novice analysts and users. Development of many of these packages has been abandoned over time for a variety of reasons, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin 3 is written in Python and developed from scratch relative to earlier versions. The codebase is available on GitHub, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations and exploratory and publication bias analyses, and allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.
{"title":"MetaWin 3: open-source software for meta-analysis","authors":"Michael S. Rosenberg","doi":"10.3389/fbinf.2024.1305969","DOIUrl":"https://doi.org/10.3389/fbinf.2024.1305969","url":null,"abstract":"The rise of research synthesis and systematic reviews over the last 25 years has been aided by a series of software packages providing simple and accessible GUI interfaces which are intuitively easy to use by novice analysts and users. Development of many of these packages has been abandoned over time due to a variety of factors, leaving a gap in the software infrastructure available for meta-analysis. To fulfill the continued demand for a GUI-based meta-analytic system, we have now released MetaWin 3 as free, open-source, multi-platform software. MetaWin3 is written in Python and developed from scratch relative to earlier versions. The codebase is available on Github, with pre-compiled executables for both Windows and macOS available from the MetaWin website. MetaWin includes standardized effect size calculations, exploratory and publication bias analyses, and allows for both simple and complex explanatory models of variation within a meta-analytic framework, including meta-regression, using traditional least-squares/moments estimation.","PeriodicalId":507586,"journal":{"name":"Frontiers in Bioinformatics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139791053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
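The standardized effect sizes and inverse-variance pooling that meta-analysis software such as MetaWin implements follow standard formulas. As a minimal sketch (not MetaWin's code; the two-study dataset below is invented for illustration), here is Hedges' g with its small-sample correction and sampling variance, plus a fixed-effect pooled estimate:

```python
import numpy as np

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference with the Hedges small-sample correction."""
    df = n1 + n2 - 2
    s_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / s_pooled
    j = 1.0 - 3.0 / (4.0 * df - 1.0)   # small-sample correction factor
    g = j * d
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2.0 * (n1 + n2)))
    return g, var_g

def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its SE."""
    w = 1.0 / np.asarray(variances)
    pooled = np.sum(w * np.asarray(effects)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return pooled, se

# Two hypothetical studies comparing treatment and control group means.
g1, v1 = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
g2, v2 = hedges_g(9.5, 3.0, 30, 8.0, 3.0, 30)
pooled, se = fixed_effect_mean([g1, g2], [v1, v2])
```

The inverse-variance weighting gives larger, more precise studies more influence on the pooled estimate; random-effects models and the meta-regression mentioned in the abstract extend this same weighting scheme with additional variance components and covariates.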