Pub Date: 2023-11-22; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1268899
Bela T L Vogler, Francesco Reina, Christian Eggeling
In this study, we introduce Blob-B-Gone, a lightweight framework to computationally differentiate and eventually remove dense isotropic localization accumulations (blobs) caused by artifactually immobilized particles in MINFLUX single-particle tracking (SPT) measurements. This approach uses purely geometrical features extracted from MINFLUX-detected single-particle trajectories, which are treated as point clouds of localizations. Employing k-means++ clustering, we perform single-shot separation of the feature space to rapidly extract blobs from the dataset without the need for training. We automatically annotate the resulting subsets and, finally, evaluate our results by means of principal component analysis (PCA), highlighting a clear separation in the feature space. We demonstrate our approach using two- and three-dimensional simulations of freely diffusing particles and blob artifacts based on parameters extracted from hand-labeled MINFLUX tracking data of fixed 23-nm bead samples and two-dimensional diffusing quantum dots on model lipid membranes. Applying Blob-B-Gone, we achieve a clear distinction between blob-like and other trajectories, represented in F1 scores of 0.998 (2D) and 1.0 (3D) as well as 0.995 (balanced) and 0.994 (imbalanced). This framework can be straightforwardly applied to similar situations in which distinguishing between blob-like and elongated time traces is desirable. Given a number of localizations sufficient to express geometric features, the method can operate on any generic point cloud presented to it, regardless of its origin.
Title: Blob-B-Gone: a lightweight framework for removing blob artifacts from 2D/3D MINFLUX single-particle tracking data. Journal: Frontiers in Bioinformatics 3:1268899. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704905/pdf/
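The idea of separating blobs from elongated trajectories by clustering geometric features can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two features (log radius of gyration and an eigenvalue-based anisotropy), the function names `shape_features`, `kmeans_pp` and `blob_label`, and the rule for picking the blob cluster are all assumptions made for this example, and the abstract does not state which features Blob-B-Gone actually uses.

```python
import math
import random


def shape_features(points):
    """Return (log10 radius of gyration, anisotropy) for a 2D point cloud.

    Hypothetical feature set chosen for illustration only.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    total = lam1 + lam2
    # 0 for an isotropic cloud, 1 for points on a straight line.
    anisotropy = (lam1 - lam2) / total if total > 0 else 0.0
    log_rg = math.log10(max(math.sqrt(total), 1e-12))
    return log_rg, anisotropy


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_pp(data, k, rng, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers)."""
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Sample the next seed with probability proportional to D^2.
        d2 = [min(sq_dist(x, c) for c in centers) for x in data]
        centers.append(list(rng.choices(data, weights=d2, k=1)[0]))
    labels = [0] * len(data)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(x, centers[j]))
                  for x in data]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centers.append(
                    [sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def blob_label(centers):
    """Assumed annotation rule: the cluster with the lowest mean anisotropy."""
    return min(range(len(centers)), key=lambda j: centers[j][1])
```

In use, one would compute `shape_features` for every trajectory, cluster the resulting feature list with `kmeans_pp(features, 2, random.Random(seed))`, and drop the trajectories whose label equals `blob_label(centers)`. Features on very different scales would normally be standardized first.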
Pub Date: 2023-11-14; DOI: 10.3389/fbinf.2023.1279359
Suad Algarni, Steven L. Foley, Hailin Tang, Shaohua Zhao, Dereje D. Gudeta, Bijay K. Khajanchi, Steven C. Ricke, Jing Han
Introduction: Type IV secretion systems (T4SSs) are integral parts of the conjugation process in enteric bacteria. These secretion systems are encoded within the transfer (tra) regions of plasmids, including those that harbor antimicrobial resistance (AMR) genes. The conjugal transfer of resistance plasmids can lead to the dissemination of AMR among bacterial populations. Methods: To facilitate the analysis of conjugation-associated genes, transfer-related genes associated with key groups of AMR plasmids were identified, extracted from GenBank, and used to generate a plasmid transfer gene dataset that is part of the Virulence and Plasmid Transfer Factor Database at FDA, serving as the foundation for computational tools for the comparison of conjugal transfer genes. To assess the genetic features of the transfer gene database, genes/proteins of the same name (e.g., traI/TraI) or predicted function (VirD4 ATPase homologs) were compared across the different plasmid types to assess sequence diversity. Two analysis tools, the Plasmid Transfer Factor Profile Assessment and Plasmid Transfer Factor Comparison tools, were developed to evaluate the transfer genes located on plasmids and to facilitate the comparison of plasmids from multiple sequence files. To assess the database and associated tools, plasmid and whole genome sequencing (WGS) data were extracted from GenBank and previous WGS experiments in our lab and assessed using the analysis tools. Results: Overall, the plasmid transfer database and associated tools proved to be very useful for evaluating the different plasmid types and their association with T4SSs, and increased our understanding of how conjugative plasmids contribute to the dissemination of AMR genes.
Title: Development of an antimicrobial resistance plasmid transfer gene database for enteric bacteria. Journal: Frontiers in Bioinformatics.
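Comparing the transfer genes carried by several plasmids can be pictured as a presence/absence profile. The sketch below is only an illustration of that idea and is not the FDA tools described above: the function names `profile_matrix` and `jaccard`, the plasmid labels, and the gene sets are hypothetical, and real comparisons would start from sequence searches against the database rather than from ready-made gene name sets.

```python
def profile_matrix(plasmids):
    """Build a presence/absence profile of transfer genes per plasmid.

    `plasmids` maps a plasmid name to the set of transfer gene names
    detected on it (hypothetical input format).
    """
    genes = sorted(set().union(*plasmids.values()))
    rows = {
        name: [1 if gene in present else 0 for gene in genes]
        for name, present in plasmids.items()
    }
    return genes, rows


def jaccard(a, b):
    """Jaccard similarity of two gene sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up example data for two plasmids.
example = {
    "pA": {"traI", "traD", "virD4"},
    "pB": {"traI", "virB4", "virD4"},
}
genes, rows = profile_matrix(example)
similarity = jaccard(example["pA"], example["pB"])
```

Each row of `rows` lines up with the sorted `genes` list, so profiles from multiple sequence files can be compared column by column.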
Pub Date: 2023-11-10; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1328945
Elizabeth A Heron, Giorgio Valle, Anna Bernasconi
Title: Editorial: Identification of phenotypically important genomic variants. Journal: Frontiers in Bioinformatics 3:1328945. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10668015/pdf/
Pub Date: 2023-11-09; DOI: 10.3389/fbinf.2023.1225149
Dmitrii K. Chebanov, Vsevolod A. Misyurin, Irina Zh. Shubina
In this study, we present an algorithmic framework integrated within a software platform we created for the discovery of novel small-molecule anti-tumor agents. Our approach was exemplified in the context of combatting lung cancer. In the initial phase, target identification for therapeutic intervention was accomplished. Leveraging deep learning, we scrutinized gene expression profiles, focusing on those associated with adverse clinical outcomes in lung cancer patients. Augmenting this, generative adversarial networks (GANs) were employed to amass additional patient data. This effort yielded a subset of genes definitively linked to unfavorable prognoses. We further employed deep learning to delineate genes capable of discriminating between normal and tumor tissues based on expression patterns. The remaining genes were earmarked as potential targets for precision lung cancer therapy. Subsequently, a dedicated module was formulated to predict the interactions between inhibitors and proteins. To achieve this, protein amino acid sequences and chemical compound formulations engaged in protein interactions were encoded into vectorized representations. Additionally, a deep learning-based component was developed to forecast IC50 values through experimentation on cell lines. Virtual pre-clinical trials employing these inhibitors facilitated the selection of pertinent cell lines for subsequent laboratory assays. In summary, our study culminated in the derivation of several small-molecule formulas projected to bind selectively to specific proteins. This algorithmic platform holds promise in accelerating the identification and design of anti-tumor compounds, a critical pursuit in advancing targeted cancer therapies.
Title: An algorithm for drug discovery based on deep learning with an example of developing a drug for the treatment of lung cancer. Journal: Frontiers in Bioinformatics.
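Encoding a protein sequence as a fixed-length vector, as mentioned in the abstract, can be illustrated with a simple amino acid composition vector. This is an assumption made for the sake of example: the abstract does not say which encoding the authors used, and the names `AMINO_ACIDS` and `composition_vector` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def composition_vector(sequence):
    """Return the 20-dimensional amino acid frequency vector of a sequence.

    Characters outside the 20 standard residues are ignored.
    """
    counts = [0] * len(AMINO_ACIDS)
    valid = 0
    for residue in sequence.upper():
        i = _INDEX.get(residue)
        if i is not None:
            counts[i] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / valid for c in counts]
```

A composition vector discards residue order, so practical models usually rely on richer encodings (for example learned embeddings); the point here is only that sequences of any length map to vectors of the same size.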
Pub Date: 2023-11-07; DOI: 10.3389/fbinf.2023.1275593
Konstantinos A. Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of high-throughput sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and germline variant calling experiments. We implemented these workflows in Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold-standard truth sets. Findings: We demonstrated that CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
Title: Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in common workflow language (CWL). Journal: Frontiers in Bioinformatics.
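Comparing detected variants against a truth set boils down to counting matches and mismatches. The sketch below shows that arithmetic under a simplifying assumption: variants are matched exactly on a `(chrom, pos, ref, alt)` tuple. The function name `benchmark_calls` and the example variants are hypothetical, and this is not the comparison procedure used in the study; dedicated benchmarking tools also normalize variant representation and restrict the comparison to confident regions.

```python
def benchmark_calls(called, truth):
    """Compare called variants with a truth set by exact tuple match.

    Each variant is a (chrom, pos, ref, alt) tuple. Returns the counts of
    true positives, false positives and false negatives plus precision,
    recall and F1.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up example: three of four truth variants recovered, one spurious call.
truth_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 42, "G", "GA"), ("chr3", 7, "TC", "T")]
call_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr2", 42, "G", "GA"), ("chr5", 900, "A", "C")]
result = benchmark_calls(call_set, truth_set)
```

Precision answers how many of the calls are real, recall answers how many real variants were found, and F1 is their harmonic mean.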
Pub Date: 2023-11-01; DOI: 10.3389/fbinf.2023.1216139
Helena Klara Jambor
Posters are intended to spark scientific dialogue and are omnipresent at biological conferences. Guides and how-to articles help life scientists in preparing informative visualizations in poster format. However, posters shown at conferences are at present often overloaded with data and text and lack visual structure. Here, I surveyed life scientists themselves to understand how they are currently preparing posters and which parts they struggle with. Biologists spend on average two entire days preparing one poster, with half of the time devoted to visual design aspects. Most receive no design or software training and also receive little to no feedback when preparing their visualizations. In conclusion, training in visualization principles and tools for poster preparation would likely improve the quality of conference posters. This would also benefit other common visuals such as figures and slides, and improve the science communication of researchers overall.
Title: Insights on poster preparation practices in life sciences. Journal: Frontiers in Bioinformatics.
Pub Date: 2023-10-23; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1214074
Famke Bäuerle, Gwendolyn O Döbel, Laura Camus, Simon Heilbronner, Andreas Dräger
Introduction: Genome-scale metabolic models (GEMs) are organism-specific knowledge bases which can be used to unravel pathogenicity or improve production of specific metabolites in biotechnology applications. However, the validity of predictions for bacterial proliferation in in vitro settings has hardly been investigated. Methods: The present work combines in silico and in vitro approaches to create and curate strain-specific genome-scale metabolic models of Corynebacterium striatum. Results: We introduce five newly created strain-specific GEMs of high quality, satisfying all contemporary standards and requirements. All these models have been benchmarked using the community standard test suite Metabolic Model Testing (MEMOTE) and were validated by laboratory experiments. For the curation of those models, the software infrastructure refineGEMs was developed to work on these models in parallel and to comply with the quality standards for GEMs. The model predictions were confirmed by experimental data, and a new comparison metric based on the doubling time was developed to quantify bacterial growth. Discussion: Future modeling projects can rely on the proposed software, which is independent of specific environmental conditions. The validation approach based on the growth rate calculation is now accessible and closely aligned with biological questions. The curated models are freely available via BioModels and a GitHub repository. The open-source software refineGEMs is available from https://github.com/draeger-lab/refinegems.
Title: Genome-scale metabolic models consistently predict in vitro characteristics of Corynebacterium striatum. Journal: Frontiers in Bioinformatics 3:1214074. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626998/pdf/
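The link between growth rate and doubling time that underlies such a comparison can be written down in a few lines. This is a sketch of the standard exponential-growth relationship, t_d = ln(2) / mu, and not the specific metric developed in the paper; the function names and the optical density values are hypothetical.

```python
import math


def growth_rate(od_start, od_end, hours):
    """Specific growth rate mu (1/h) between two optical density readings,
    assuming exponential growth over the interval."""
    return (math.log(od_end) - math.log(od_start)) / hours


def doubling_time(mu):
    """Doubling time in hours for a specific growth rate mu (1/h)."""
    if mu <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / mu


# Made-up example: OD rises from 0.1 to 0.4 (two doublings) within 2 hours.
mu = growth_rate(0.1, 0.4, 2.0)
td = doubling_time(mu)
```

The same conversion applies to a growth rate predicted by a model (for instance the objective value of a flux balance analysis, in 1/h), which puts simulated and measured growth on a common scale.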
Pub Date: 2023-10-19; eCollection Date: 2023-01-01; DOI: 10.3389/fbinf.2023.1275402
Erik Burlingame, Luke Ternes, Jia-Ren Lin, Yu-An Chen, Eun Na Kim, Joe W Gray, Young Hwan Chang
Introduction: Tissue-based sampling and diagnosis are defined as the extraction of information from certain limited spaces and its diagnostic significance of a certain object. Pathologists deal with issues related to tumor heterogeneity since analyzing a single sample does not necessarily capture a representative depiction of cancer, and a tissue biopsy usually only presents a small fraction of the tumor. Many multiplex tissue imaging platforms (MTIs) make the assumption that tissue microarrays (TMAs) containing small core samples of 2-dimensional (2D) tissue sections are a good approximation of bulk tumors although tumors are not 2D. However, emerging whole slide imaging (WSI) or 3D tumor atlases that use MTIs like cyclic immunofluorescence (CyCIF) strongly challenge this assumption. In spite of the additional insight gathered by measuring the tumor microenvironment in WSI or 3D, it can be prohibitively expensive and time-consuming to process tens or hundreds of tissue sections with CyCIF. Even when resources are not limited, the criteria for region of interest (ROI) selection in tissues for downstream analysis remain largely qualitative and subjective as stratified sampling requires the knowledge of objects and evaluates their features. Despite the fact that TMAs fail to adequately approximate whole tissue features, a theoretical subsampling of tissue exists that can best represent the tumor in the whole slide image. Methods: To address these challenges, we propose deep learning approaches to learn multi-modal image translation tasks from two aspects: 1) a generative modeling approach to reconstruct a 3D CyCIF representation and 2) co-embedding of CyCIF images and Hematoxylin and Eosin (H&E) sections to learn multi-modal mappings by a cross-domain translation for minimum representative ROI selection. 
Results and discussion: We demonstrate that generative modeling enables a 3D virtual CyCIF reconstruction of a colorectal cancer specimen given a small subset of the imaging data at training time. By co-embedding histology and MTI features, we propose a simple convex optimization for objective ROI selection. We demonstrate the potential application of ROI selection and the efficiency of its performance with respect to cellular heterogeneity.
Title: 3D multiplexed tissue imaging reconstruction and optimized region of interest (ROI) selection through deep learning model of channels embedding. Journal: Frontiers in Bioinformatics 3:1275402. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10620917/pdf/
Pub Date : 2023-10-17eCollection Date: 2023-01-01DOI: 10.3389/fbinf.2023.1198218
Soumyadip Roy, Asa Ben-Hur
Motivation: The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. Results: In this work, we describe Qϵ, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the ϵ-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. Availability: The code for Qϵ is available at https://github.com/soumyadip1997/qepsilon.
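For reference, the standard ϵ-insensitive loss from SVM regression, on which the abstract says the new loss is based, ignores errors smaller than ϵ and penalizes larger errors linearly. The sketch below shows only that textbook form, not the modified loss used in Qϵ; the function name, the ϵ value, and the sample scores are illustrative assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Mean epsilon-insensitive loss: max(0, |y - y_hat| - epsilon).

    Textbook SVM-regression form only; NOT the paper's modified loss.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        # Errors inside the epsilon tube contribute nothing.
        total += max(0.0, abs(y - y_hat) - epsilon)
    return total / len(y_true)


# Hypothetical quality scores in [0, 1] (e.g. GDTTS-like values).
true_scores = [0.90, 0.50, 0.75]
pred_scores = [0.85, 0.80, 0.75]
loss = epsilon_insensitive_loss(true_scores, pred_scores, epsilon=0.1)
print(f"mean epsilon-insensitive loss: {loss:.4f}")
```

Only the second prediction falls outside the ϵ tube here, so it alone contributes to the loss.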
{"title":"Protein quality assessment with a loss function designed for high-quality decoys.","authors":"Soumyadip Roy, Asa Ben-Hur","doi":"10.3389/fbinf.2023.1198218","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1198218","url":null,"abstract":"<p><p><b>Motivation:</b> The prediction of a protein 3D structure is essential for understanding protein function, drug discovery, and disease mechanisms; with the advent of methods like AlphaFold that are capable of producing very high-quality decoys, ensuring the quality of those decoys can provide further confidence in the accuracy of their predictions. <b>Results:</b> In this work, we describe Q<sub><i>ϵ</i></sub>, a graph convolutional network (GCN) that utilizes a minimal set of atom and residue features as inputs to predict the global distance test total score (GDTTS) and local distance difference test (lDDT) score of a decoy. To improve the model's performance, we introduce a novel loss function based on the <i>ϵ</i>-insensitive loss function used for SVM regression. This loss function is specifically designed for evaluating the characteristics of the quality assessment problem and provides predictions with improved accuracy over standard loss functions used for this task. Despite using only a minimal set of features, it matches the performance of recent state-of-the-art methods like DeepUMQA. 
<b>Availability:</b> The code for Q<sub><i>ϵ</i></sub> is available at https://github.com/soumyadip1997/qepsilon.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1198218"},"PeriodicalIF":0.0,"publicationDate":"2023-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10616882/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71429770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing corresponding encoders to convert binary data into DNA base data that meets biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for storing Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any strand of DNA, even one that has an excessively long homopolymer. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.
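To make the UTF-8 baseline and the homopolymer constraint concrete, the sketch below maps UTF-8 bytes to bases at two bits per nucleotide and measures the longest run of identical bases. It is a naive baseline under assumed conventions (the 00/01/10/11 to A/C/G/T mapping and both function names are hypothetical); it is neither the authors' code table nor their shift coding scheme.

```python
from itertools import groupby

# Assumed 2-bit mapping, for illustration only: 00->A, 01->C, 10->G, 11->T.
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Naive baseline: emit one base per pair of bits, high bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


text = "中文"
strand = bytes_to_dna(text.encode("utf-8"))
# Each of these characters takes 3 bytes in UTF-8, i.e. 12 nt at 2 bits/base.
print(strand, len(strand) // len(text), max_homopolymer_run(strand))
```

A denser character code table would reduce the nucleotides needed per character below this baseline, and a constraint-aware encoder would additionally bound the value returned by `max_homopolymer_run`.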
{"title":"Towards Chinese text and DNA shift encoding scheme based on biomass plasmid storage.","authors":"Xu Yang, Langwen Lai, Xiaoli Qiang, Ming Deng, Yuhao Xie, Xiaolong Shi, Zheng Kou","doi":"10.3389/fbinf.2023.1276934","DOIUrl":"10.3389/fbinf.2023.1276934","url":null,"abstract":"<p><p>DNA, as the storage medium in organisms, can address the shortcomings of existing electromagnetic storage media, such as low information density, high maintenance power consumption, and short storage time. Current research on DNA storage mainly focuses on designing corresponding encoders to convert binary data into DNA base data that meets biological constraints. We have created a new Chinese character code table that enables exceptionally high information storage density for storing Chinese characters (compared to traditional UTF-8 encoding). To meet biological constraints, we have devised a DNA shift coding scheme with low algorithmic complexity, which can encode any strand of DNA, even one that has an excessively long homopolymer. The designed DNA sequence will be stored in a double-stranded plasmid of 744 bp, ensuring high reliability during storage. Additionally, the plasmid's resistance to environmental interference ensures long-term stable information storage. Moreover, it can be replicated at a lower cost.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1276934"},"PeriodicalIF":2.8,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10602677/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71415731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}