
Frontiers in Bioinformatics: latest articles

Proteinortho6: pseudo-reciprocal best alignment heuristic for graph-based detection of (co-)orthologs
Pub Date : 2023-12-13 DOI: 10.3389/fbinf.2023.1322477
Paul Klemm, Peter F. Stadler, Marcus Lechner
Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, that reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.
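The reciprocal best alignment heuristic named in the abstract can be illustrated with a minimal Python sketch. This is not Proteinortho's implementation, and it does not attempt the pseudo-reciprocal variant, whose details the abstract does not give. The function names, the score dictionaries, and the protein identifiers are all hypothetical.

```python
def best_hits(scores):
    """Return {query: best-scoring target} from {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical alignment scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a2"): 110, ("b2", "a1"): 40}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

In this plain form both search directions (A against B and B against A) have to be computed; the abstract reports that the revised strategy halves the number of required comparisons.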
Citations: 0
Reconstructing diploid 3D chromatin structures from single cell Hi-C data with a polymer-based approach
Pub Date : 2023-12-11 DOI: 10.3389/fbinf.2023.1284484
Jan Rothörl, M. Brems, Tim J. Stevens, Peter Virnau
A detailed understanding of the 3D structure of chromatin is key to investigating a variety of processes inside the cell. Since direct methods to experimentally ascertain these structures lack the desired spatial fidelity, computational inference methods based on single cell Hi-C data have gained significant interest. Here, we develop a progressive simulation protocol to iteratively improve the resolution of predicted interphase structures by maximum-likelihood association of ambiguous Hi-C contacts using lower-resolution predictions. Compared to state-of-the-art methods, our procedure is not limited to haploid cell data and allows us to reach a resolution of up to 5,000 base pairs per bead. High-resolution chromatin models grant access to a multitude of structural phenomena. As examples, we verify the formation of chromosome territories and of holes near aggregated chromocenters, as well as the inversion of the CpG content for rod photoreceptor cells.
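The idea of resolving an ambiguous diploid contact with a lower-resolution structure can be sketched in Python under a strong simplifying assumption: that the likelihood of a contact decreases monotonically with 3D distance, so the most likely haplotype pair is simply the closest one. The abstract does not specify the likelihood model, so this is an illustration only; `assign_contact`, the locus names, and the coordinates are hypothetical.

```python
import math
from itertools import product

HAPLOTYPES = ("maternal", "paternal")


def assign_contact(locus_a, locus_b, coords):
    """Pick the haplotype pair with the smallest 3D distance in a coarse model.

    coords maps (locus, haplotype) -> (x, y, z). Assumes, for illustration
    only, that contact likelihood decreases monotonically with distance.
    """
    return min(
        product(HAPLOTYPES, repeat=2),
        key=lambda pair: math.dist(
            coords[(locus_a, pair[0])], coords[(locus_b, pair[1])]
        ),
    )


# Hypothetical bead positions taken from a lower-resolution prediction.
coarse_model = {
    ("locusA", "maternal"): (0.0, 0.0, 0.0),
    ("locusA", "paternal"): (8.0, 0.0, 0.0),
    ("locusB", "maternal"): (0.0, 5.0, 0.0),
    ("locusB", "paternal"): (8.0, 1.0, 0.0),
}
choice = assign_contact("locusA", "locusB", coarse_model)
```

Here the two paternal copies are closest in the coarse model, so the ambiguous contact is attributed to that pair before the next, finer simulation round.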
Citations: 0
Benchmarking software tools for trimming adapters and merging next-generation sequencing data for ancient DNA.
Pub Date : 2023-12-07 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1260486
Annette Lien, Leonardo Pestana Legori, Louis Kraft, Peter Wad Sackett, Gabriel Renaud

Ancient DNA is highly degraded, resulting in very short sequences. Reads generated with modern high-throughput sequencing machines are generally longer than ancient DNA molecules, so the reads often contain some portion of the sequencing adapters. It is crucial to remove those adapters, as they can interfere with downstream analysis. Furthermore, the overlapping portions of a DNA fragment that has been read forward and backward (paired-end) can be merged to correct sequencing errors and improve read quality. Several tools have been developed for adapter trimming and read merging; however, no one has attempted to evaluate their accuracy and their potential impact on downstream analyses. Through the simulation of sequencing data, seven commonly used tools were analyzed for their ability to reconstruct ancient DNA sequences through read merging. The analyzed tools exhibit notable differences in their abilities to correct sequence errors and identify the correct read overlap, but the most substantial difference is observed in their ability to calculate quality scores for merged bases. Selecting the most appropriate tool for a given project depends on several factors, although some tools such as fastp have some shortcomings, whereas others like leeHom outperform the other tools in most aspects. While the choice of tool did not result in a measurable difference when analyzing population genetics using principal component analysis, it is important to note that downstream analyses that are sensitive to wrongly merged reads or that rely on quality scores can be significantly impacted by the choice of tool.
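Read merging as described above can be sketched in Python. This toy version is not how any of the benchmarked tools work: it assumes adapters are already trimmed, scans for the longest suffix/prefix overlap with a bounded number of mismatches, and keeps the higher-quality base at each overlapping position. The function names, parameters, and example reads are hypothetical, and no merged quality scores are computed.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd, fwd_qual, rev, rev_qual, min_overlap=5, max_mismatches=1):
    """Merge a read pair; return the merged sequence or None if no overlap.

    Qualities are lists of Phred scores, one per base.
    """
    rc = reverse_complement(rev)
    rc_qual = rev_qual[::-1]
    # Try the longest candidate overlap first: suffix of fwd vs prefix of rc.
    for olen in range(min(len(fwd), len(rc)), min_overlap - 1, -1):
        start = len(fwd) - olen
        mismatches = sum(1 for i in range(olen) if fwd[start + i] != rc[i])
        if mismatches > max_mismatches:
            continue
        merged = list(fwd[:start])
        for i in range(olen):
            # On disagreement the base with the higher quality wins.
            if fwd_qual[start + i] >= rc_qual[i]:
                merged.append(fwd[start + i])
            else:
                merged.append(rc[i])
        merged.extend(rc[olen:])
        return "".join(merged)
    return None


# Hypothetical 16-base fragment sequenced with 12-base reads.
# The forward read carries one low-quality error at index 10 (C read as T).
fwd_read = "ACGTACGGTTTA"
fwd_quals = [30] * 12
fwd_quals[10] = 5
rev_read = "AGCCTGAACCGT"
rev_quals = [30] * 12
merged = merge_pair(fwd_read, fwd_quals, rev_read, rev_quals)
```

The eight-base overlap contains the single mismatch, and the reverse read's higher-quality base restores the original fragment.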

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10733496/pdf/
Citations: 0
RCSB Protein Data Bank: visualizing groups of experimentally determined PDB structures alongside computed structure models of proteins
Pub Date : 2023-12-04 DOI: 10.3389/fbinf.2023.1311287
J. Segura, Yana Rose, Chunxiao Bi, Jose M. Duarte, Stephen K. Burley, S. Bittrich
Recent advances in Artificial Intelligence and Machine Learning (e.g., AlphaFold, RosettaFold, and ESMFold) enable prediction of three-dimensional (3D) protein structures from amino acid sequences alone at accuracies comparable to lower-resolution experimental methods. These tools have been employed to predict structures across entire proteomes and the results of large-scale metagenomic sequence studies, yielding an exponential increase in available biomolecular 3D structural information. Given the enormous volume of this newly computed biostructure data, there is an urgent need for robust tools to manage, search, cluster, and visualize large collections of structures. Equally important is the capability to efficiently summarize and visualize metadata, biological/biochemical annotations, and structural features, particularly when working with vast numbers of protein structures of both experimental origin from the Protein Data Bank (PDB) and computationally predicted models. Moreover, researchers require advanced visualization techniques that support interactive exploration of multiple sequences and structural alignments. This paper introduces a suite of tools provided on the RCSB PDB research-focused web portal RCSB.org, tailor-made for efficient management, search, organization, and visualization of this burgeoning corpus of 3D macromolecular structure data.
Citations: 0
Real-time open-source FLIM analysis.
Pub Date : 2023-11-30 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1286983
Kevin K D Tan, Mark A Tsuchida, Jenu V Chacko, Niklas A Gahm, Kevin W Eliceiri

Fluorescence lifetime imaging microscopy (FLIM) provides valuable quantitative insights into fluorophores' chemical microenvironment. Due to long computation times and the lack of accessible, open-source real-time analysis toolkits, traditional analysis of FLIM data, particularly with the widely used time-correlated single-photon counting (TCSPC) approach, typically occurs after acquisition. As a result, uncertainties about the quality of FLIM data persist even after collection, frequently necessitating the extension of imaging sessions. Unfortunately, prolonged sessions not only risk missing important biological events but also cause photobleaching and photodamage. To address these challenges, we present the first open-source program designed for real-time FLIM analysis during specimen scanning. Our approach combines acquisition with real-time computational and visualization capabilities, allowing us to assess FLIM data quality on the fly. Our open-source real-time FLIM viewer, integrated as a Napari plugin, displays phasor analysis and rapid lifetime determination (RLD) results computed from real-time data transmitted by acquisition software such as the open-source Micro-Manager-based OpenScan package. Our method facilitates early identification of FLIM signatures and data quality assessment by providing preliminary analysis during acquisition. This not only speeds up the imaging process but is also especially useful when imaging sensitive live biological samples.
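The two quantities the viewer displays, phasor coordinates and a rapid lifetime determination (RLD) estimate, can be sketched for a single pixel in Python. This uses the textbook first-harmonic phasor and the two-gate RLD formula on a synthetic mono-exponential decay; it is not the plugin's code, and the function names, the 12.5 ns period, and the 256-bin histogram are assumptions chosen for illustration.

```python
import math


def phasor(decay, period):
    """First-harmonic phasor (g, s) of a decay histogram spanning one period."""
    n = len(decay)
    dt = period / n
    omega = 2 * math.pi / period
    total = sum(decay)
    # Bin centres are used as time stamps to limit discretisation bias.
    g = sum(c * math.cos(omega * (k + 0.5) * dt) for k, c in enumerate(decay)) / total
    s = sum(c * math.sin(omega * (k + 0.5) * dt) for k, c in enumerate(decay)) / total
    return g, s


def phase_lifetime(g, s, period):
    """Lifetime from the phasor angle, valid for a mono-exponential decay."""
    omega = 2 * math.pi / period
    return s / (g * omega)


def rld_lifetime(decay, period):
    """Two-gate RLD: tau = gate_width / ln(D0 / D1) for equal contiguous gates."""
    half = len(decay) // 2
    d0 = sum(decay[:half])
    d1 = sum(decay[half:2 * half])
    gate_width = half * period / len(decay)
    return gate_width / math.log(d0 / d1)


# Synthetic noise-free decay: tau = 2.5 ns, 12.5 ns period, 256 time bins.
PERIOD, TAU, BINS = 12.5, 2.5, 256
decay = [1000.0 * math.exp(-(k + 0.5) * (PERIOD / BINS) / TAU) for k in range(BINS)]
g, s = phasor(decay, PERIOD)
tau_phase = phase_lifetime(g, s, PERIOD)
tau_rld = rld_lifetime(decay, PERIOD)
```

Both estimators need only running sums over the histogram, which is why they suit on-the-fly use; real data would add noise, background, and an instrument response that this sketch ignores.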

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10720713/pdf/
Citations: 0
Cluster analysis for localisation-based data sets: dos and don'ts when quantifying protein aggregates.
Pub Date : 2023-11-24 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1237551
Luca Panconi, Dylan M Owen, Juliette Griffié

Many proteins display a non-random distribution on the cell surface. From dimers to nanoscale clusters to large, micron-scale aggregations, these distributions regulate protein-protein interactions and signalling. Although these distributions show organisation on length-scales below the resolution limit of conventional optical microscopy, single molecule localisation microscopy (SMLM) can map molecule locations with nanometre precision. The data from SMLM is not a conventional pixelated image and instead takes the form of a point pattern: a list of the x, y coordinates of the localised molecules. To extract the biological insights that researchers require, cluster analysis is often performed on these data sets, quantifying such parameters as the size of clusters, the percentage of monomers and so on. Here, we provide some guidance on how SMLM clustering should best be performed.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704244/pdf/
Citations: 0
The promises of large language models for protein design and modeling.
Pub Date : 2023-11-23 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1304099
Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, Peter N Robinson

The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and to generate novel, functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10701588/pdf/
Citations: 0
Blob-B-Gone: a lightweight framework for removing blob artifacts from 2D/3D MINFLUX single-particle tracking data.
Pub Date : 2023-11-22 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1268899
Bela T L Vogler, Francesco Reina, Christian Eggeling

In this study, we introduce Blob-B-Gone, a lightweight framework to computationally differentiate and eventually remove dense isotropic localization accumulations (blobs) caused by artifactually immobilized particles in MINFLUX single-particle tracking (SPT) measurements. This approach uses purely geometrical features extracted from MINFLUX-detected single-particle trajectories, which are treated as point clouds of localizations. Employing k-means++ clustering, we perform single-shot separation of the feature space to rapidly extract blobs from the dataset without the need for training. We automatically annotate the resulting sub-sets and, finally, evaluate our results by means of principal component analysis (PCA), highlighting a clear separation in the feature space. We demonstrate our approach using two- and three-dimensional simulations of freely diffusing particles and blob artifacts based on parameters extracted from hand-labeled MINFLUX tracking data of fixed 23-nm bead samples and two-dimensional diffusing quantum dots on model lipid membranes. Applying Blob-B-Gone, we achieve a clear distinction between blob-like and other trajectories, reflected in F1 scores of 0.998 (2D) and 1.0 (3D) as well as 0.995 (balanced) and 0.994 (imbalanced). This framework can be straightforwardly applied to similar situations, where discerning between blob and elongated time traces is desirable. Given a number of localizations sufficient to express geometric features, the method can operate on any generic point cloud presented to it, regardless of its origin.
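The clustering step can be illustrated with a small, dependency-free Python sketch of k-means with k-means++ seeding. It is not the Blob-B-Gone code: the abstract does not list the geometric features used, so the two-dimensional feature vectors below are placeholders, and `kmeans_pp`, its parameters, and the seed are hypothetical.

```python
import random


def kmeans_pp(points, k, seed=0, iters=50):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centers).

    points is a list of equal-length tuples that must not all coincide.
    """
    rng = random.Random(seed)

    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # k-means++ seeding: each further centre is drawn with probability
    # proportional to the squared distance to the nearest existing centre.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(d2(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: d2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Placeholder feature vectors: three compact "blob-like" trajectories and
# three "elongated" ones, well separated in a hypothetical feature space.
blob_like = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
elongated = [(10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
labels, centers = kmeans_pp(blob_like + elongated, k=2)
```

Because no labels are needed, the split is obtained in one pass over the feature vectors; which cluster index corresponds to blobs still has to be decided afterwards, for instance from the cluster centres.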

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10704905/pdf/
Citations: 0
Development of an antimicrobial resistance plasmid transfer gene database for enteric bacteria
Pub Date : 2023-11-14 DOI: 10.3389/fbinf.2023.1279359
Suad Algarni, Steven L. Foley, Hailin Tang, Shaohua Zhao, Dereje D. Gudeta, Bijay K. Khajanchi, Steven C. Ricke, Jing Han
Introduction: Type IV secretion systems (T4SSs) are integral parts of the conjugation process in enteric bacteria. These secretion systems are encoded within the transfer (tra) regions of plasmids, including those that harbor antimicrobial resistance (AMR) genes. The conjugal transfer of resistance plasmids can lead to the dissemination of AMR among bacterial populations. Methods: To facilitate the analysis of conjugation-associated genes, transfer-related genes associated with key groups of AMR plasmids were identified, extracted from GenBank, and used to generate a plasmid transfer gene dataset that is part of the Virulence and Plasmid Transfer Factor Database at FDA, serving as the foundation for computational tools for the comparison of conjugal transfer genes. To assess the genetic features of the transfer gene database, genes/proteins of the same name (e.g., traI/TraI) or predicted function (VirD4 ATPase homologs) were compared across the different plasmid types to assess sequence diversity. Two analysis tools, the Plasmid Transfer Factor Profile Assessment and Plasmid Transfer Factor Comparison tools, were developed to evaluate the transfer genes located on plasmids and to facilitate the comparison of plasmids from multiple sequence files. To assess the database and associated tools, plasmid and whole genome sequencing (WGS) data were extracted from GenBank and previous WGS experiments in our lab and assessed using the analysis tools. Results: Overall, the plasmid transfer database and associated tools proved to be very useful for evaluating the different plasmid types and their association with T4SSs, and increased our understanding of how conjugative plasmids contribute to the dissemination of AMR genes.
Citations: 0
Editorial: Identification of phenotypically important genomic variants.
Pub Date : 2023-11-10 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1328945
Elizabeth A Heron, Giorgio Valle, Anna Bernasconi
Citations: 0