Frontiers in Bioinformatics: Latest Articles

Real-time open-source FLIM analysis.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-30 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1286983
Kevin K D Tan, Mark A Tsuchida, Jenu V Chacko, Niklas A Gahm, Kevin W Eliceiri

Fluorescence lifetime imaging microscopy (FLIM) provides valuable quantitative insights into fluorophores' chemical microenvironment. Due to long computation times and the lack of accessible, open-source real-time analysis toolkits, traditional analysis of FLIM data, particularly with the widely used time-correlated single-photon counting (TCSPC) approach, typically occurs after acquisition. As a result, uncertainties about the quality of FLIM data persist even after collection, frequently necessitating the extension of imaging sessions. Unfortunately, prolonged sessions not only risk missing important biological events but also cause photobleaching and photodamage. To address these challenges, we present the first open-source program designed for real-time FLIM analysis during specimen scanning. Our approach combines acquisition with real-time computational and visualization capabilities, allowing us to assess FLIM data quality on the fly. Our open-source real-time FLIM viewer, integrated as a Napari plugin, displays phasor analysis and rapid lifetime determination (RLD) results computed from real-time data transmitted by acquisition software such as the open-source Micro-Manager-based OpenScan package. Our method facilitates early identification of FLIM signatures and data quality assessment by providing preliminary analysis during acquisition. This not only speeds up the imaging process but is also especially useful when imaging sensitive live biological samples.
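
The two quantities named in the abstract, phasor coordinates and an RLD lifetime, can be illustrated on a single-pixel decay histogram. The following is a minimal sketch and not the plugin's code: the function names, the 12.5 ns laser period, the 256 time bins, and the two-gate form of RLD are all assumptions chosen for illustration, and the decay is synthetic and noise-free.

```python
import math


def phasor(decay, period_ns):
    """First-harmonic phasor coordinates (g, s) of one decay histogram."""
    n = len(decay)
    dt = period_ns / n
    omega = 2.0 * math.pi / period_ns  # angular frequency of the first harmonic
    total = sum(decay)
    # Bin centres are used as the time coordinate of each bin.
    g = sum(c * math.cos(omega * (k + 0.5) * dt) for k, c in enumerate(decay)) / total
    s = sum(c * math.sin(omega * (k + 0.5) * dt) for k, c in enumerate(decay)) / total
    return g, s


def phase_lifetime(g, s, period_ns):
    """Lifetime from the phasor angle; valid for a mono-exponential decay."""
    omega = 2.0 * math.pi / period_ns
    return s / (omega * g)


def rld_lifetime(decay, period_ns):
    """Two-gate rapid lifetime determination with equal, contiguous gates."""
    half = len(decay) // 2
    d0 = sum(decay[:half])           # counts in the early gate
    d1 = sum(decay[half:2 * half])   # counts in the late gate
    gate_ns = half * period_ns / len(decay)
    return gate_ns / math.log(d0 / d1)


# Synthetic mono-exponential decay (assumed values, for illustration only).
PERIOD_NS = 12.5
TRUE_TAU_NS = 2.5
BINS = 256
decay = [
    1000.0 * math.exp(-(k + 0.5) * (PERIOD_NS / BINS) / TRUE_TAU_NS)
    for k in range(BINS)
]

g, s = phasor(decay, PERIOD_NS)
print("phasor (g, s):", g, s)
print("phase lifetime (ns):", phase_lifetime(g, s, PERIOD_NS))
print("RLD lifetime (ns):", rld_lifetime(decay, PERIOD_NS))
```

Both estimators need only sums over the histogram, which is why they are cheap enough to update while a scan is still running; with real photon-counting data they would be noisier than in this idealized case.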

Cluster analysis for localisation-based data sets: dos and don'ts when quantifying protein aggregates.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-24 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1237551
Luca Panconi, Dylan M Owen, Juliette Griffié

Many proteins display a non-random distribution on the cell surface. From dimers to nanoscale clusters to large, micron-scale aggregations, these distributions regulate protein-protein interactions and signalling. Although these distributions show organisation on length-scales below the resolution limit of conventional optical microscopy, single molecule localisation microscopy (SMLM) can map molecule locations with nanometre precision. The data from SMLM is not a conventional pixelated image and instead takes the form of a point pattern: a list of the x, y coordinates of the localised molecules. To extract the biological insights that researchers require, cluster analysis is often performed on these data sets, quantifying such parameters as the size of clusters, the percentage of monomers and so on. Here, we provide some guidance on how SMLM clustering should best be performed.
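
To make the point-pattern idea concrete, the sketch below groups x, y localisations and reports cluster sizes and the percentage of monomers. It is only an illustration under stated assumptions: the linking rule (points closer than a fixed radius belong to the same cluster), the function names, and the coordinates are all made up here and are not the method recommended by the authors.

```python
import math


def cluster_points(points, radius):
    """Group points whose pairwise distance is <= radius (single linkage)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def summarize(points, radius):
    """Cluster sizes (groups of >= 2 points) and percentage of monomers."""
    groups = cluster_points(points, radius)
    monomers = sum(1 for members in groups if len(members) == 1)
    sizes = sorted(len(members) for members in groups if len(members) > 1)
    return {
        "cluster_sizes": sizes,
        "percent_monomers": 100.0 * monomers / len(points),
    }


# Made-up localisations in nanometres: a trimer, a dimer and two monomers.
localisations = [
    (0.0, 0.0), (10.0, 0.0), (0.0, 10.0),
    (500.0, 500.0), (505.0, 500.0),
    (1000.0, 0.0), (2000.0, 2000.0),
]
print(summarize(localisations, radius=20.0))
```

The result depends directly on the chosen radius, which is one reason parameter selection matters when quantifying clusters from localisation data.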

The promises of large language models for protein design and modeling.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-23 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1304099
Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, Peter N Robinson

The recent breakthroughs of Large Language Models (LLMs) in the context of natural language processing have opened the way to significant advances in protein research. Indeed, the relationships between human natural language and the "language of proteins" invite the application and adaptation of LLMs to protein modelling and design. Considering the impressive results of GPT-4 and other recently developed LLMs in processing, generating and translating human languages, we anticipate analogous results with the language of proteins. Indeed, protein language models have already been trained to accurately predict protein properties and generate novel functionally characterized proteins, achieving state-of-the-art results. In this paper we discuss the promises and the open challenges raised by this novel and exciting research area, and we propose our perspective on how LLMs will affect protein modeling and design.

Blob-B-Gone: a lightweight framework for removing blob artifacts from 2D/3D MINFLUX single-particle tracking data.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-22 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1268899
Bela T L Vogler, Francesco Reina, Christian Eggeling

In this study, we introduce Blob-B-Gone, a lightweight framework to computationally differentiate and eventually remove dense isotropic localization accumulations (blobs) caused by artifactually immobilized particles in MINFLUX single-particle tracking (SPT) measurements. This approach uses purely geometrical features extracted from MINFLUX-detected single-particle trajectories, which are treated as point clouds of localizations. Employing k-means++ clustering, we perform single-shot separation of the feature space to rapidly extract blobs from the dataset without the need for training. We automatically annotate the resulting sub-sets and, finally, evaluate our results by means of principal component analysis (PCA), highlighting a clear separation in the feature space. We demonstrate our approach using two- and three-dimensional simulations of freely diffusing particles and blob artifacts based on parameters extracted from hand-labeled MINFLUX tracking data of fixed 23-nm bead samples and two-dimensional diffusing quantum dots on model lipid membranes. Applying Blob-B-Gone, we achieve a clear distinction between blob-like and other trajectories, represented in F1 scores of 0.998 (2D) and 1.0 (3D) as well as 0.995 (balanced) and 0.994 (imbalanced). This framework can be straightforwardly applied to similar situations, where discerning between blob and elongated time traces is desirable. Given a number of localizations sufficient to express geometric features, the method can operate on any generic point cloud presented to it, regardless of its origin.
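
The general pattern described here, geometric features per trajectory followed by a two-cluster k-means++ split, can be sketched as follows. This is not the Blob-B-Gone implementation: the two features (log radius of gyration and an isotropy ratio from the 2D covariance eigenvalues), the function names, and the idealized test shapes are assumptions made for illustration, since the abstract does not list the actual features.

```python
import math
import random


def shape_features(track):
    """Return (log10 radius of gyration, isotropy ratio) of a 2D point cloud."""
    n = len(track)
    mx = sum(x for x, _ in track) / n
    my = sum(y for _, y in track) / n
    sxx = sum((x - mx) ** 2 for x, _ in track) / n
    syy = sum((y - my) ** 2 for _, y in track) / n
    sxy = sum((x - mx) * (y - my) for x, y in track) / n
    half_trace = (sxx + syy) / 2.0
    # Clamp at zero so rounding error cannot make the square root fail.
    root = math.sqrt(max(half_trace ** 2 - (sxx * syy - sxy ** 2), 0.0))
    lam_max, lam_min = half_trace + root, half_trace - root
    return (math.log10(math.sqrt(sxx + syy)), lam_min / lam_max)


def sqdist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def kmeans_pp(points, k, seed=0, iters=100):
    """k-means with k-means++ seeding; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Sample the next centre with probability proportional to D^2.
        weights = [min(sqdist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Idealized shapes: small rings stand in for blobs, straight lines for elongated traces.
blobs = [
    [(100.0 * b + r * math.cos(2 * math.pi * i / 50), r * math.sin(2 * math.pi * i / 50))
     for i in range(50)]
    for b, r in enumerate([4.0, 5.0, 6.0])
]
traces = [[(20.0 * i, slope * 20.0 * i) for i in range(50)] for slope in (0.0, 0.5, 1.0)]

features = [shape_features(t) for t in blobs + traces]
labels = kmeans_pp(features, k=2, seed=0)
print(labels)
```

Which label number ends up on the blob group depends on the seeding, so any downstream annotation has to inspect the cluster contents rather than assume a fixed label.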

Development of an antimicrobial resistance plasmid transfer gene database for enteric bacteria
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-14 DOI: 10.3389/fbinf.2023.1279359
Suad Algarni, Steven L. Foley, Hailin Tang, Shaohua Zhao, Dereje D. Gudeta, Bijay K. Khajanchi, Steven C. Ricke, Jing Han
Introduction: Type IV secretion systems (T4SSs) are integral parts of the conjugation process in enteric bacteria. These secretion systems are encoded within the transfer (tra) regions of plasmids, including those that harbor antimicrobial resistance (AMR) genes. The conjugal transfer of resistance plasmids can lead to the dissemination of AMR among bacterial populations. Methods: To facilitate the analyses of the conjugation-associated genes, transfer-related genes associated with key groups of AMR plasmids were identified, extracted from GenBank and used to generate a plasmid transfer gene dataset that is part of the Virulence and Plasmid Transfer Factor Database at FDA, serving as the foundation for computational tools for the comparison of the conjugal transfer genes. To assess the genetic features of the transfer gene database, genes/proteins of the same name (e.g., traI/TraI) or predicted function (VirD4 ATPase homologs) were compared across the different plasmid types to assess sequence diversity. Two analysis tools, the Plasmid Transfer Factor Profile Assessment and Plasmid Transfer Factor Comparison tools, were developed to evaluate the transfer genes located on plasmids and to facilitate the comparison of plasmids from multiple sequence files. To assess the database and associated tools, plasmid and whole genome sequencing (WGS) data were extracted from GenBank and previous WGS experiments in our lab and assessed using the analysis tools. Results: Overall, the plasmid transfer database and associated tools proved to be very useful for evaluating the different plasmid types and their association with T4SSs, and increased our understanding of how conjugative plasmids contribute to the dissemination of AMR genes.
Editorial: Identification of phenotypically important genomic variants.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-10 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1328945
Elizabeth A Heron, Giorgio Valle, Anna Bernasconi
An algorithm for drug discovery based on deep learning with an example of developing a drug for the treatment of lung cancer
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-09 DOI: 10.3389/fbinf.2023.1225149
Dmitrii K. Chebanov, Vsevolod A. Misyurin, Irina Zh. Shubina
In this study, we present an algorithmic framework integrated within the created software platform tailored for the discovery of novel small-molecule anti-tumor agents. Our approach was exemplified in the context of combatting lung cancer. In the initial phase, target identification for therapeutic intervention was accomplished. Leveraging deep learning, we scrutinized gene expression profiles, focusing on those associated with adverse clinical outcomes in lung cancer patients. Augmenting this, generative adversarial networks (GANs) were employed to amass additional patient data. This effort yielded a subset of genes definitively linked to unfavorable prognoses. We further employed deep learning to delineate genes capable of discriminating between normal and tumor tissues based on expression patterns. The remaining genes were earmarked as potential targets for precision lung cancer therapy. Subsequently, a dedicated module was formulated to predict the interactions between inhibitors and proteins. To achieve this, protein amino acid sequences and chemical compound formulations engaged in protein interactions were encoded into vectorized representations. Additionally, a deep learning-based component was developed to forecast IC50 values through experimentation on cell lines. Virtual pre-clinical trials employing these inhibitors facilitated the selection of pertinent cell lines for subsequent laboratory assays. In summary, our study culminated in the derivation of several small-molecule formulas projected to bind selectively to specific proteins. This algorithmic platform holds promise in accelerating the identification and design of anti-tumor compounds, a critical pursuit in advancing targeted cancer therapies.
Software pipelines for RNA-Seq, ChIP-Seq and germline variant calling analyses in Common Workflow Language (CWL)
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-07 DOI: 10.3389/fbinf.2023.1275593
Konstantinos A. Kyritsis, Nikolaos Pechlivanis, Fotis Psomopoulos
Background: Automating data analysis pipelines is a key requirement to ensure reproducibility of results, especially when dealing with large volumes of data. Here we assembled automated pipelines for the analysis of High-throughput Sequencing (HTS) data originating from RNA-Seq, ChIP-Seq and germline variant calling experiments. We implemented these workflows in Common Workflow Language (CWL) and evaluated their performance by: i) reproducing the results of two previously published studies on Chronic Lymphocytic Leukemia (CLL), and ii) analyzing whole genome sequencing data from four Genome in a Bottle Consortium (GIAB) samples, comparing the detected variants against their respective gold standard truth sets. Findings: We demonstrated that CWL-implemented workflows achieved high accuracy in reproducing previously published results, discovering significant biomarkers and detecting germline SNP and small INDEL variants. Conclusion: CWL pipelines are characterized by reproducibility and reusability; combined with containerization, they provide the ability to overcome issues of software incompatibility and laborious configuration requirements. In addition, they are flexible and can be used immediately or adapted to the specific needs of an experiment or study. The CWL-based workflows developed in this study, along with version information for all software tools, are publicly available on GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines) under the MIT License. They are suitable for the analysis of short-read (such as Illumina-based) data and constitute an open resource that can facilitate automation, reproducibility and cross-platform compatibility for standard bioinformatic analyses.
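
Comparing detected variants against a truth set comes down to counting matches and mismatches. The sketch below shows that arithmetic on made-up variants; it is an assumption-laden illustration, not the evaluation code used in the study. In particular, the function name and the exact-match rule on (chromosome, position, ref, alt) tuples are hypothetical, and exact matching ignores the fact that the same INDEL can be written in more than one way.

```python
def benchmark_calls(called, truth):
    """Precision, recall and F1 of called variants against a truth set.

    Variants are (chrom, pos, ref, alt) tuples and must match exactly.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Made-up example variants, for illustration only.
truth_set = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr2", 3000, "TT", "T"),
]
called_set = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "GA"),
    ("chr3", 500, "A", "C"),
]
print(benchmark_calls(called_set, truth_set))
```

In practice SNPs and INDELs are usually scored separately, which here would just mean filtering the tuples by allele length before calling the function.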
背景:自动化数据分析管道是确保结果可再现性的关键要求,特别是在处理大量数据时。在这里,我们组装了自动化管道,用于分析来自RNA-Seq, ChIP-Seq和种系变异召唤实验的高通量测序(HTS)数据。我们在通用工作流程语言(CWL)中实现了这些工作流程,并通过以下方式评估了它们的性能:i)再现了之前发表的两项关于慢性淋巴细胞白血病(CLL)的研究结果,ii)分析了来自四个genome in a Bottle Consortium (GIAB)样本的全基因组测序数据,将检测到的变体与各自的黄金标准真值集进行了比较。研究结果:我们证明了cwl实施的工作流程在复制先前发表的结果、发现重要的生物标志物和检测种系SNP和小INDEL变体方面明显达到了很高的准确性。结论:CWL管道具有重复性和可重用性;与容器化相结合,它们提供了克服软件不兼容和费力的配置需求问题的能力。此外,它们是灵活的,可以立即使用或适应实验或研究的具体需要。本研究中开发的基于cwl的工作流,以及所有软件工具的版本信息,在MIT许可下可在GitHub (https://github.com/BiodataAnalysisGroup/CWL_HTS_pipelines)上公开获得。它们适用于分析短读(如基于illumina的)数据,并构成一个开放资源,可以促进自动化,可重复性和标准生物信息学分析的跨平台兼容性。
Citations: 0
Insights on poster preparation practices in life sciences
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-11-01 DOI: 10.3389/fbinf.2023.1216139
Helena Klara Jambor
Posters are intended to spark scientific dialogue and are omnipresent at biological conferences. Guides and how-to articles help life scientists prepare informative visualizations in poster format. However, posters shown at conferences are at present often overloaded with data and text and lack visual structure. Here, I surveyed life scientists to understand how they currently prepare posters and which parts they struggle with. Biologists spend on average two entire days preparing one poster, with half of that time devoted to visual design. Most receive no design or software training and little to no feedback when preparing their visualizations. In conclusion, training in visualization principles and tools for poster preparation would likely improve the quality of conference posters. This would also benefit other common visuals such as figures and slides, and improve researchers' science communication overall.
Citations: 0
Genome-scale metabolic models consistently predict in vitro characteristics of Corynebacterium striatum.
Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2023-10-23 eCollection Date: 2023-01-01 DOI: 10.3389/fbinf.2023.1214074
Famke Bäuerle, Gwendolyn O Döbel, Laura Camus, Simon Heilbronner, Andreas Dräger

Introduction: Genome-scale metabolic models (GEMs) are organism-specific knowledge bases that can be used to unravel pathogenicity or improve the production of specific metabolites in biotechnology applications. However, the validity of predictions for bacterial proliferation in in vitro settings has hardly been investigated. Methods: The present work combines in silico and in vitro approaches to create and curate strain-specific genome-scale metabolic models of Corynebacterium striatum. Results: We introduce five newly created, high-quality strain-specific GEMs that satisfy all contemporary standards and requirements. All these models have been benchmarked using the community standard test suite Metabolic Model Testing (MEMOTE) and were validated by laboratory experiments. For the curation of those models, the software infrastructure refineGEMs was developed to work on these models in parallel and to comply with the quality standards for GEMs. The model predictions were confirmed by experimental data, and a new comparison metric based on the doubling time was developed to quantify bacterial growth. Discussion: Future modeling projects can rely on the proposed software, which is independent of specific environmental conditions. The validation approach based on the growth rate calculation is now accessible and closely aligned with biological questions. The curated models are freely available via BioModels and a GitHub repository. The open-source software refineGEMs is available from https://github.com/draeger-lab/refinegems.
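
The abstract does not define its doubling-time metric, so the following is only a minimal sketch of the textbook relation it presumably builds on: for exponential growth at rate mu (in 1/h), the doubling time is ln(2)/mu. The function names and the relative-difference comparison are illustrative assumptions, not part of refineGEMs or the paper's method.

```python
import math


def doubling_time(growth_rate_per_hour: float) -> float:
    """Doubling time in hours for an exponential growth rate given in 1/h."""
    if growth_rate_per_hour <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate_per_hour


def relative_difference(predicted: float, measured: float) -> float:
    """Hypothetical comparison: absolute difference relative to the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured)


if __name__ == "__main__":
    # Made-up illustrative rates, not data from the study.
    simulated = doubling_time(0.50)  # e.g. a growth rate predicted by a model
    observed = doubling_time(0.40)   # e.g. a growth rate fitted to a growth curve
    print(f"simulated: {simulated:.2f} h, observed: {observed:.2f} h")
    print(f"relative difference: {relative_difference(simulated, observed):.2%}")
```

Comparing doubling times rather than raw growth rates keeps the quantity in hours, which is the unit typically read off a laboratory growth curve.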

Citations: 0