Pub Date: 2023-11-30; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1286983
Kevin K D Tan, Mark A Tsuchida, Jenu V Chacko, Niklas A Gahm, Kevin W Eliceiri
Fluorescence lifetime imaging microscopy (FLIM) provides valuable quantitative insights into the chemical microenvironment of fluorophores. Due to long computation times and the lack of accessible, open-source real-time analysis toolkits, traditional analysis of FLIM data, particularly with the widely used time-correlated single-photon counting (TCSPC) approach, typically occurs after acquisition. As a result, uncertainties about the quality of FLIM data persist even after collection, frequently necessitating the extension of imaging sessions. Unfortunately, prolonged sessions not only risk missing important biological events but also cause photobleaching and photodamage. To address these challenges, we present the first open-source program designed for real-time FLIM analysis during specimen scanning. Our approach combines acquisition with real-time computation and visualization, allowing FLIM data quality to be assessed on the fly. Our open-source real-time FLIM viewer, integrated as a Napari plugin, displays phasor analysis and rapid lifetime determination (RLD) results computed from real-time data transmitted by acquisition software such as the open-source Micro-Manager-based OpenScan package. By providing preliminary analysis during acquisition, our method facilitates early identification of FLIM signatures and assessment of data quality. This not only speeds up the imaging process but is also especially useful when imaging sensitive live biological samples.
Title: Real-time open-source FLIM analysis. Journal: Frontiers in Bioinformatics, vol. 3, 1286983. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720713/pdf/
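As a rough illustration of the two fast estimators named in the abstract, the sketch below computes phasor coordinates and a two-gate rapid lifetime determination (RLD) estimate for the TCSPC decay histogram of a single pixel. It is not the plugin's code: the function names, the bin layout, and the equal-width two-gate RLD variant are assumptions made for this example, and no background subtraction or instrument-response correction is attempted.

```python
import math

def phasor(decay, harmonic=1):
    """Return phasor coordinates (g, s) of a decay histogram.

    Assumes the bins span exactly one laser period, so the centre of
    bin k sits at phase 2*pi*harmonic*(k + 0.5)/len(decay).
    """
    total = sum(decay)
    if total <= 0:
        raise ValueError("empty decay histogram")
    n = len(decay)
    g = sum(c * math.cos(2 * math.pi * harmonic * (k + 0.5) / n)
            for k, c in enumerate(decay)) / total
    s = sum(c * math.sin(2 * math.pi * harmonic * (k + 0.5) / n)
            for k, c in enumerate(decay)) / total
    return g, s

def rld_lifetime(decay, bin_width):
    """Two-gate RLD using two contiguous gates of equal width.

    For a mono-exponential decay, tau = gate_width / ln(D0 / D1),
    where D0 and D1 are the photon counts in the early and late gate.
    """
    half = len(decay) // 2
    d0 = sum(decay[:half])
    d1 = sum(decay[half:2 * half])
    if d1 <= 0 or d0 <= d1:
        raise ValueError("decay too weak or not decreasing")
    return half * bin_width / math.log(d0 / d1)

# Synthetic mono-exponential decay (hypothetical values):
# tau = 2.5 ns, 256 bins of 0.05 ns each.
tau_true, bin_width, n_bins = 2.5, 0.05, 256
decay = [1000.0 * math.exp(-(k + 0.5) * bin_width / tau_true)
         for k in range(n_bins)]

tau_est = rld_lifetime(decay, bin_width)  # recovers tau_true for noiseless data
g, s = phasor(decay)                      # lies close to the universal semicircle
```

Both estimators need only sums over the histogram, which is what makes them candidates for per-pixel evaluation while a scan is still running. With real photon-counting noise the RLD estimate degrades quickly at low counts, so the guard clauses above would need tuning for actual data.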
Pub Date: 2023-11-24; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1237551
Luca Panconi, Dylan M Owen, Juliette Griffié
Many proteins display a non-random distribution on the cell surface. From dimers to nanoscale clusters to large, micron-scale aggregations, these distributions regulate protein-protein interactions and signalling. Although these distributions show organisation on length scales below the resolution limit of conventional optical microscopy, single molecule localisation microscopy (SMLM) can map molecule locations with nanometre precision. SMLM data do not form a conventional pixelated image; instead they take the form of a point pattern, that is, a list of the x, y coordinates of the localised molecules. To extract the biological insights that researchers require, cluster analysis is often performed on these data sets to quantify parameters such as the size of clusters and the percentage of monomers. Here, we provide some guidance on how SMLM clustering should best be performed.
Title: Cluster analysis for localisation-based data sets: dos and don'ts when quantifying protein aggregates. Journal: Frontiers in Bioinformatics, vol. 3, 1237551. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704244/pdf/
Pub Date: 2023-11-23; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1304099
Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, Peter N Robinson
The recent breakthroughs of Large Language Models (LLMs) in natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modeling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and to generate novel, functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.
Title: The promises of large language models for protein design and modeling. Journal: Frontiers in Bioinformatics, vol. 3, 1304099. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10701588/pdf/
Pub Date: 2023-11-22; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1268899
Bela T L Vogler, Francesco Reina, Christian Eggeling
In this study, we introduce Blob-B-Gone, a lightweight framework to computationally differentiate and eventually remove dense isotropic localization accumulations (blobs) caused by artifactually immobilized particles in MINFLUX single-particle tracking (SPT) measurements. This approach uses purely geometrical features extracted from MINFLUX-detected single-particle trajectories, which are treated as point clouds of localizations. Employing k-means++ clustering, we perform single-shot separation of the feature space to rapidly extract blobs from the dataset without the need for training. We automatically annotate the resulting subsets and, finally, evaluate our results by means of principal component analysis (PCA), highlighting a clear separation in the feature space. We demonstrate our approach using two- and three-dimensional simulations of freely diffusing particles and blob artifacts based on parameters extracted from hand-labeled MINFLUX tracking data of fixed 23-nm bead samples and two-dimensional diffusing quantum dots on model lipid membranes. Applying Blob-B-Gone, we achieve a clear distinction between blob-like and other trajectories, represented in F1 scores of 0.998 (2D) and 1.0 (3D) as well as 0.995 (balanced) and 0.994 (imbalanced). This framework can be straightforwardly applied to similar situations in which distinguishing blob-like from elongated time traces is desirable. Given a number of localizations sufficient to express geometric features, the method can operate on any generic point cloud presented to it, regardless of its origin.
Title: Blob-B-Gone: a lightweight framework for removing blob artifacts from 2D/3D MINFLUX single-particle tracking data. Journal: Frontiers in Bioinformatics, vol. 3, 1268899. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704905/pdf/
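To make the idea of a "purely geometrical feature" of a trajectory concrete, here is a minimal sketch of one such feature: the ratio of the minor to the major eigenvalue of the covariance matrix of a 2D point cloud. The abstract does not list the features Blob-B-Gone actually uses, so this ratio, the function name `anisotropy`, and the toy point clouds are assumptions chosen purely for illustration. The clustering step (k-means++) is not reproduced here.

```python
import math

def anisotropy(points):
    """Ratio of minor to major spread of a 2D point cloud, in [0, 1].

    Uses the closed-form eigenvalues of the 2x2 covariance matrix.
    Values near 1 indicate an isotropic (blob-like) cloud, values
    near 0 an elongated one. Illustrative feature, not from the paper.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 localizations")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are mean +/- diff.
    mean = (sxx + syy) / 2
    diff = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major, minor = mean + diff, mean - diff
    if major <= 0:
        return 1.0  # all points coincide: treat as perfectly isotropic
    return max(minor, 0.0) / major

# Hypothetical toy clouds: 12 points on a circle (isotropic) and
# 12 points along a straight line (maximally elongated).
ring = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
        for k in range(12)]
track = [(float(k), 0.5 * k) for k in range(12)]

ring_score = anisotropy(ring)    # close to 1
track_score = anisotropy(track)  # close to 0
```

A feature vector built from a few quantities of this kind could then be passed to a clustering routine. The example only shows that such features separate isotropic from elongated clouds without any training data.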
Pub Date: 2023-11-14; DOI: 10.3389/fbinf.2023.1279359
Suad Algarni, Steven L. Foley, Hailin Tang, Shaohua Zhao, Dereje D. Gudeta, Bijay K. Khajanchi, Steven C. Ricke, Jing Han
Introduction: Type IV secretion systems (T4SSs) are integral parts of the conjugation process in enteric bacteria. These secretion systems are encoded within the transfer (tra) regions of plasmids, including those that harbor antimicrobial resistance (AMR) genes. The conjugal transfer of resistance plasmids can lead to the dissemination of AMR among bacterial populations. Methods: To facilitate the analysis of conjugation-associated genes, transfer-related genes associated with key groups of AMR plasmids were identified, extracted from GenBank and used to generate a plasmid transfer gene dataset. This dataset is part of the Virulence and Plasmid Transfer Factor Database at FDA and serves as the foundation for computational tools that compare conjugal transfer genes. To assess the genetic features of the transfer gene database, genes/proteins of the same name (e.g., traI/TraI) or predicted function (VirD4 ATPase homologs) were compared across the different plasmid types to assess sequence diversity. Two analysis tools, the Plasmid Transfer Factor Profile Assessment and Plasmid Transfer Factor Comparison tools, were developed to evaluate the transfer genes located on plasmids and to facilitate the comparison of plasmids from multiple sequence files. To assess the database and associated tools, plasmid and whole genome sequencing (WGS) data were extracted from GenBank and from previous WGS experiments in our lab and assessed using the analysis tools. Results: Overall, the plasmid transfer database and associated tools proved to be very useful for evaluating the different plasmid types and their association with T4SSs, and increased our understanding of how conjugative plasmids contribute to the dissemination of AMR genes.
Title: Development of an antimicrobial resistance plasmid transfer gene database for enteric bacteria. Journal: Frontiers in Bioinformatics. Publication type: Journal Article.
Pub Date: 2023-11-10; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1328945
Elizabeth A Heron, Giorgio Valle, Anna Bernasconi
Title: Editorial: Identification of phenotypically important genomic variants. Journal: Frontiers in Bioinformatics, vol. 3, 1328945. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10668015/pdf/
Pub Date: 2023-11-09; DOI: 10.3389/fbinf.2023.1225149
Dmitrii K. Chebanov, Vsevolod A. Misyurin, Irina Zh. Shubina
In this study, we present an algorithmic framework, integrated within a purpose-built software platform, for the discovery of novel small-molecule anti-tumor agents. We illustrate the approach using lung cancer as an example. In the initial phase, targets for therapeutic intervention were identified. Using deep learning, we analyzed gene expression profiles, focusing on those associated with adverse clinical outcomes in lung cancer patients. In addition, generative adversarial networks (GANs) were employed to generate additional patient data. This effort yielded a subset of genes definitively linked to unfavorable prognoses. We further employed deep learning to identify genes capable of discriminating between normal and tumor tissues based on expression patterns. The remaining genes were earmarked as potential targets for precision lung cancer therapy. Subsequently, a dedicated module was developed to predict the interactions between inhibitors and proteins. To achieve this, protein amino acid sequences and the formulas of chemical compounds engaged in protein interactions were encoded into vectorized representations. Additionally, a deep learning-based component was developed to forecast IC50 values obtained in cell line experiments. Virtual pre-clinical trials employing these inhibitors facilitated the selection of pertinent cell lines for subsequent laboratory assays. In summary, our study yielded several small-molecule formulas predicted to bind selectively to specific proteins. This algorithmic platform holds promise for accelerating the identification and design of anti-tumor compounds, a critical pursuit in advancing targeted cancer therapies.
Title: An algorithm for drug discovery based on deep learning with an example of developing a drug for the treatment of lung cancer. Journal: Frontiers in Bioinformatics. Publication type: Journal Article.
Pub Date: 2023-11-07; DOI: 10.3389/fbinf.2023.1275593
Konstantinos A. Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of high-throughput sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and germline variant calling experiments. We implemented these workflows in the Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on chronic lymphocytic leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold standard truth sets. Findings: We demonstrated that CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they make it possible to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
Title: Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL). Journal: Frontiers in Bioinformatics. Publication type: Journal Article.
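The comparison of detected variants against a truth set, as described in the abstract, comes down to counting agreements and disagreements between two call sets. The sketch below shows that arithmetic on exact-match variant keys. It is a simplified illustration and not the evaluation procedure used in the study: the function name `compare_calls`, the `(chrom, pos, ref, alt)` key, and the toy variants are all assumptions.

```python
def compare_calls(called, truth):
    """Return (precision, recall, F1) for two sets of variant keys."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical toy data: variants keyed by (chrom, pos, ref, alt).
truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
         ("chr2", 500, "G", "GA"), ("chr2", 900, "T", "C")}
called = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
          ("chr2", 500, "G", "GA"), ("chr3", 42, "A", "C")}

precision, recall, f1 = compare_calls(called, truth)
# Three shared calls, one false positive, one false negative.
```

Exact key matching ignores the fact that the same variant can be written in more than one way, especially for INDELs, so a real benchmark needs a normalisation step before any counting.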
Pub Date: 2023-11-01; DOI: 10.3389/fbinf.2023.1216139
Helena Klara Jambor
Posters are intended to spark scientific dialogue and are omnipresent at biological conferences. Guides and how-to articles help life scientists prepare informative visualizations in poster format. However, posters shown at conferences are at present often overloaded with data and text and lack visual structure. Here, I surveyed life scientists to understand how they currently prepare posters and which parts they struggle with. Biologists spend on average two entire days preparing one poster, with half of the time devoted to visual design aspects. Most receive no design or software training and little to no feedback when preparing their visualizations. In conclusion, training in visualization principles and tools for poster preparation would likely improve the quality of conference posters. This would also benefit other common visuals such as figures and slides, and improve the science communication of researchers overall.
Title: Insights on poster preparation practices in life sciences. Journal: Frontiers in Bioinformatics. Publication type: Journal Article.
Pub Date : 2023-10-23eCollection Date: 2023-01-01DOI: 10.3389/fbinf.2023.1214074
Famke Bäuerle, Gwendolyn O Döbel, Laura Camus, Simon Heilbronner, Andreas Dräger
Introduction: Genome-scale metabolic models (GEMs) are organism-specific knowledge bases that can be used to unravel pathogenicity or improve production of specific metabolites in biotechnology applications. However, the validity of predictions for bacterial proliferation in in vitro settings has hardly been investigated. Methods: The present work combines in silico and in vitro approaches to create and curate strain-specific genome-scale metabolic models of Corynebacterium striatum. Results: We introduce five newly created strain-specific GEMs of high quality, satisfying all contemporary standards and requirements. All these models have been benchmarked using the community standard test suite Metabolic Model Testing (MEMOTE) and were validated by laboratory experiments. For the curation of those models, the software infrastructure refineGEMs was developed to work on these models in parallel and to comply with the quality standards for GEMs. The model predictions were confirmed by experimental data, and a new comparison metric based on the doubling time was developed to quantify bacterial growth. Discussion: Future modeling projects can rely on the proposed software, which is independent of specific environmental conditions. The validation approach based on the growth rate calculation is now accessible and closely aligned with biological questions. The curated models are freely available for use via BioModels and a GitHub repository. The open-source software refineGEMs is available from https://github.com/draeger-lab/refinegems.
{"title":"Genome-scale metabolic models consistently predict <i>in vitro</i> characteristics of <i>Corynebacterium striatum</i>.","authors":"Famke Bäuerle, Gwendolyn O Döbel, Laura Camus, Simon Heilbronner, Andreas Dräger","doi":"10.3389/fbinf.2023.1214074","DOIUrl":"10.3389/fbinf.2023.1214074","url":null,"abstract":"<p><p><b>Introduction:</b> Genome-scale metabolic models (GEMs) are organism-specific knowledge bases which can be used to unravel pathogenicity or improve production of specific metabolites in biotechnology applications. However, the validity of predictions for bacterial proliferation in <i>in vitro</i> settings is hardly investigated. <b>Methods:</b> The present work combines <i>in silico</i> and <i>in vitro</i> approaches to create and curate strain-specific genome-scale metabolic models of <i>Corynebacterium striatum</i>. <b>Results:</b> We introduce five newly created strain-specific genome-scale metabolic models (GEMs) of high quality, satisfying all contemporary standards and requirements. All these models have been benchmarked using the community standard test suite Metabolic Model Testing (MEMOTE) and were validated by laboratory experiments. For the curation of those models, the software infrastructure <i>refineGEMs</i> was developed to work on these models in parallel and to comply with the quality standards for GEMs. The model predictions were confirmed by experimental data and a new comparison metric based on the doubling time was developed to quantify bacterial growth. <b>Discussion:</b> Future modeling projects can rely on the proposed software, which is independent of specific environmental conditions. The validation approach based on the growth rate calculation is now accessible and closely aligned with biological questions. The curated models are freely available via BioModels and a GitHub repository and can be used. 
The open-source software refineGEMs is available from https://github.com/draeger-lab/refinegems.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1214074"},"PeriodicalIF":0.0,"publicationDate":"2023-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71489591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
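The abstract above mentions a comparison metric based on doubling time but does not give its formula. As a minimal sketch, assuming the standard relation for exponential growth (doubling time = ln 2 / specific growth rate), a simulated growth rate and a laboratory growth measurement could be put on the same scale as follows. The function names and all numbers are hypothetical and are not taken from the paper or from refineGEMs.

```python
import math


def doubling_time(growth_rate_per_hour: float) -> float:
    """Doubling time in hours for exponential growth at a specific rate (1/h)."""
    if growth_rate_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate_per_hour


def growth_rate_from_od(od_start: float, od_end: float, hours: float) -> float:
    """Specific growth rate (1/h) from two optical-density readings
    taken during exponential phase, `hours` apart."""
    if od_start <= 0 or od_end <= 0 or hours <= 0:
        raise ValueError("readings and time interval must be positive")
    return math.log(od_end / od_start) / hours


# Hypothetical values, for illustration only.
mu_simulated = 0.5                                # assumed in silico growth rate, 1/h
mu_measured = growth_rate_from_od(0.1, 0.4, 4.0)  # assumed OD readings 4 h apart

print("simulated doubling time (h):", doubling_time(mu_simulated))
print("measured doubling time (h):", doubling_time(mu_measured))
```

Expressing both values as doubling times gives a quantity in hours that can be compared directly, which is one plausible reading of why the authors describe the metric as closely aligned with biological questions.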