Bioinformatics: Latest Articles, Page 9

Somatic mutation effects diffused over microRNA dysregulation.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-09-02 DOI: 10.1093/bioinformatics/btad520

Hui Yu, Limin Jiang, Chung-I Li, Scott Ness, Sara G M Piccirillo, Yan Guo

Motivation: As an important player in transcriptome regulation, microRNAs may effectively diffuse somatic mutation impacts to broad cellular processes and ultimately manifest disease and dictate prognosis. Previous studies that tried to correlate mutation with gene expression dysregulation neglected to adjust for the disparate multitudes of false positives associated with unequal sample sizes and uneven class balancing scenarios.

Results: To properly address this issue, we developed a statistical framework to rigorously assess the extent of mutation impact on microRNAs in relation to a permutation-based null distribution of a matching sample structure. Carrying out the framework in a pan-cancer study, we ascertained 9008 protein-coding genes with statistically significant mutation impacts on miRNAs. Of these, the collective miRNA expression for 83 genes showed significant prognostic power in nine cancer types. For example, in lower-grade glioma, 10 genes' mutations broadly impacted miRNAs, all of which showed prognostic value with the corresponding miRNA expression. Our framework was further validated with functional analysis and augmented with rich features including the ability to analyze miRNA isoforms; aggregative prognostic analysis; advanced annotations such as mutation type, regulator alteration, somatic motif, and disease association; and instructive visualization such as mutation OncoPrint, Ideogram, and interactive mRNA-miRNA network.

Availability and implementation: The data underlying this article are available in MutMix, at http://innovebioinfo.com/Database/TmiEx/MutMix.php.
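The permutation-based null distribution described in the Results above can be illustrated with a minimal sketch. This is not the authors' framework: the function name `permutation_pvalue`, the difference-in-means statistic, and the parameters `n_perm` and `seed` are assumptions chosen for illustration. The point it shows is that shuffling mutation labels keeps the mutated and wild-type counts fixed, so the null distribution reflects the same sample size and class imbalance as the observed data.

```python
import numpy as np


def permutation_pvalue(expr, mutated, n_perm=2000, seed=0):
    """Permutation p-value for a difference in mean miRNA expression.

    expr    : 1-D sequence of expression values, one per sample
    mutated : 1-D boolean sequence, True where the gene is mutated
    (Hypothetical helper; the test statistic is an assumption.)
    """
    expr = np.asarray(expr, dtype=float)
    mutated = np.asarray(mutated, dtype=bool)
    rng = np.random.default_rng(seed)

    def stat(labels):
        # Absolute difference in group means (two-sided)
        return abs(expr[labels].mean() - expr[~labels].mean())

    observed = stat(mutated)
    # Shuffling labels preserves the number of mutated samples,
    # so every permuted data set has the matching sample structure.
    null = np.array([stat(rng.permutation(mutated)) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Because the null is rebuilt for each gene's own group sizes, a gene mutated in 5 of 20 samples is compared against permutations with exactly 5 mutated samples, rather than against a single shared threshold.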

Citations: 0

cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory.

IF 4.4, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-09-02 DOI: 10.1093/bioinformatics/btad561

Brian Johnson, Yubo Shuai, Jason Schweinsberg, Kit Curtius

Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available DNA sequencing datasets at single-cell resolution enable the reconstruction of past evolution using mutational history, allowing for a better understanding of dynamics prior to detectable disease. There is an unmet need for an accurate, fast, and easy-to-use method to quantify clone growth dynamics from these datasets.

Results: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated our analytical methods for estimating the net growth rate of clones, eliminating the need for complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications to improve mechanistic understanding and prognostic ability. Compared to clones with a single or unknown driver mutation, clones with multiple drivers have significantly increased growth rates (median 0.94 versus 0.25 per year; P = 1.6×10⁻⁶). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 versus 26.4 months; P = 0.0026).

Availability and implementation: We developed a publicly available R package, cloneRate, to implement our methods (Package website: https://bdj34.github.io/cloneRate/). Source code: https://github.com/bdj34/cloneRate/.

Citations: 0

Advancements in computational modelling of biological systems: seventh annual SysMod meeting

Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-09-01 DOI: 10.1093/bioinformatics/btad539

Bhanwar Lal Puniya, Andreas Dräger

Summary: The Computational Modelling of Systems Biology (SysMod) Community of Special Interest (COSI) convenes annually at the Intelligent Systems for Molecular Biology (ISMB) conference to facilitate knowledge dissemination and exchange of research findings on systems modelling from interdisciplinary domains. The SysMod meeting 2022 was held in a hybrid mode in Madison, Wisconsin, spanning a 1-day duration centred on modelling techniques, applications, and single-cell technology implementations. The meeting showcased innovative approaches to modelling biological systems using cell-specific and multiscale modelling, multiomics data integration, and novel tools to develop systems models using single-cell and multiomics technology. The meeting also recognized outstanding research by awarding the three best posters. This report summarizes the key highlights and outcomes of the meeting.

Availability and implementation: All resources and further information are freely accessible at https://sysmod.info.

Citations: 0

Coherent pathway enrichment estimation by modeling inter-pathway dependencies using regularized regression.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad522

Kim Philipp Jablonski, Niko Beerenwinkel

Motivation: Gene set enrichment methods are a common tool to improve the interpretability of gene lists as obtained, for example, from differential gene expression analyses. They are based on computing whether dysregulated genes are located in certain biological pathways more often than expected by chance. Gene set enrichment tools rely on pre-existing pathway databases such as KEGG, Reactome, or the Gene Ontology. These databases are increasing in size and in the number of redundancies between pathways, which complicates the statistical enrichment computation.

Results: We address this problem and develop a novel gene set enrichment method, called pareg, which is based on a regularized generalized linear model and directly incorporates dependencies between gene sets related to certain biological functions, for example, due to shared genes, in the enrichment computation. We show that pareg is more robust to noise than competing methods. Additionally, we demonstrate the ability of our method to recover known pathways as well as to suggest novel treatment targets in an exploratory analysis using breast cancer samples from TCGA.

Availability and implementation: pareg is freely available as an R package on Bioconductor (https://bioconductor.org/packages/release/bioc/html/pareg.html) as well as on https://github.com/cbg-ethz/pareg. The GitHub repository also contains the Snakemake workflows needed to reproduce all results presented here.
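The idea in the Results above, a regularized regression whose penalty encodes dependencies between gene sets, can be sketched in a simplified form. The following is not pareg's implementation or API: it swaps the regularized generalized linear model for an ordinary least-squares fit with a graph-Laplacian penalty, and the function name `network_ridge` and the parameters `lam` and `eps` are hypothetical. Rows of `X` are assumed to be genes, columns pathway memberships, `y` a per-gene dysregulation signal, and `similarity` a symmetric pathway-by-pathway matrix (for example, derived from shared genes).

```python
import numpy as np


def network_ridge(X, y, similarity, lam=1.0, eps=1e-6):
    """Least squares with a graph-Laplacian penalty on the coefficients.

    Minimises ||y - X b||^2 + lam * b^T L b + eps * ||b||^2, where
    b^T L b = sum over pathway pairs i<j of S_ij * (b_i - b_j)^2.
    (Illustrative stand-in; not the pareg model.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.asarray(similarity, dtype=float)
    # Graph Laplacian of the pathway similarity network
    L = np.diag(S.sum(axis=1)) - S
    p = X.shape[1]
    # Closed-form solution of the penalised normal equations
    A = X.T @ X + lam * L + eps * np.eye(p)
    return np.linalg.solve(A, X.T @ y)
```

The penalty pulls the coefficients of similar pathways toward each other, so two heavily overlapping pathways receive coherent enrichment estimates instead of competing for the same signal.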

Citations: 0

Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices.

IF 4.4, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad487

Daniel Liu, Martin Steinegger

Motivation: Efficiently aligning sequences is a fundamental problem in bioinformatics. Many recent algorithms for computing alignments through Smith-Waterman-Gotoh dynamic programming (DP) exploit Single Instruction Multiple Data (SIMD) operations on modern CPUs for speed. However, these advances have largely ignored difficulties associated with efficiently handling complex scoring matrices or large gaps (insertions or deletions).

Results: We propose a new SIMD-accelerated algorithm called Block Aligner for aligning nucleotide and protein sequences against other sequences or position-specific scoring matrices. We introduce a new paradigm that uses blocks in the DP matrix that greedily shift, grow, and shrink. This approach allows regions of the DP matrix to be adaptively computed. Our algorithm runs over 5-10 times faster than some previous methods while incurring an error rate of less than 3% on protein and long read datasets, despite large gaps and low sequence identities.

Availability and implementation: Our algorithm is implemented for global, local, and X-drop alignments. It is available as a Rust library (with C bindings) at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
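For readers unfamiliar with the dynamic programming that the abstract above builds on, here is a plain scalar sketch of the Gotoh affine-gap recurrence for global alignment. It is not Block Aligner's algorithm: there is no SIMD and no adaptive block shifting, and the full matrix is computed. The function name `gotoh_global_score` and the default scores are assumptions for illustration; a gap of length k costs `gap_open + k * gap_extend`.

```python
def gotoh_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score with affine gap penalties (Gotoh recurrence)."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    # M: best score ending with a[i-1] aligned to b[j-1]
    # X: best score ending with a[i-1] aligned to a gap
    # Y: best score ending with b[j-1] aligned to a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Either open a new gap or extend the one already in progress
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])
```

This version fills all (n + 1) × (m + 1) cells of three matrices. The abstract's contribution is to avoid that cost by computing only adaptively chosen regions of the DP matrix.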

Citations: 0

DEP2: an upgraded comprehensive analysis toolkit for quantitative proteomics data.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad526

Zhenhuan Feng, Peiyang Fang, Hui Zheng, Xiaofei Zhang

Summary: Mass spectrometry (MS)-based proteomics has become the most powerful approach to study the proteome of given biological and clinical samples. Advancements in sample preparation and MS detection have extended the application of proteomics but have also brought new demands on data analysis. Appropriate proteomics data analysis workflow mainly requires quality control, hypothesis testing, functional mining, and visualization. Although there are numerous tools for each process, an efficient and universal tandem analysis toolkit to obtain a quick overall view of various proteomics data is still urgently needed. Here, we present DEP2, an updated version of DEP we previously established, for proteomics data analysis. We amended the analysis workflow by incorporating alternative approaches to accommodate diverse proteomics data, introducing peptide-protein summarization and coupling biological function exploration. In summary, DEP2 is a well-rounded toolkit designed for protein- and peptide-level quantitative proteomics data. It features a more flexible differential analysis workflow and includes a user-friendly Shiny application to facilitate data analysis.

Availability and implementation: DEP2 is available at https://github.com/mildpiggy/DEP2, released under the MIT license. For further information and usage details, please refer to the package website at https://mildpiggy.github.io/DEP2/.

Citations: 0

TALAIA: a 3D visual dictionary for protein structures.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad476

Mercè Alemany-Chavarria, Jaime Rodríguez-Guerra, Jean-Didier Maréchal

Motivation: Graphical analysis of the molecular structure of proteins can be very complex. Full-atom representations retain most geometric information but are generally crowded, and key structural patterns can be challenging to identify. Non-full-atom representations could be more instructive on physicochemical aspects but be insufficiently detailed regarding shapes (e.g. entity beans-like models in coarse grain approaches) or simple properties of amino acids (e.g. representation of superficial electrostatic properties). In this work, we present TALAIA, a visual dictionary that aims to provide another layer of structural representations. TALAIA offers a visual grammar that combines simple representations of amino acids while retaining their general geometry and physicochemical properties. It uses unique objects, with differentiated shapes and colors, to represent amino acids. It makes it easier to spot crucial molecular information, including patches of amino acids or key interactions between side chains. Most conventions used in TALAIA are standard in chemistry and biochemistry, so experimentalists and modelers can rapidly grasp the meaning of any TALAIA depiction.

Results: We propose TALAIA as a tool that renders protein structures and encodes structure and physicochemical aspects as a simple visual grammar. The approach is fast, highly informative, and intuitive, allowing the identification of possible interactions, hydrophobic patches, and other characteristic structural features at first glance. The first implementation of TALAIA can be found at https://github.com/insilichem/talaia.

Citations: 0

Short-read aligner performance in germline variant identification.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad480

Richard Wilton, Alexander S Szalay

Motivation: Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant-calling results depends not only on the quality of read alignment and variant-calling software but also on the interaction between these complex software tools.

Results: In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant-calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers: DeepVariant, FreeBayes, and GATK HaplotypeCaller. We discuss the behavior of the read aligners with regard to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant-calling performance.

Availability and implementation: The quick brown fox jumps over the lazy dog.

Citations: 0

XGDAG: explainable gene-disease associations via graph neural networks.

IF 5.8, Zone 3 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad482

Andrea Mastropietro, Gianluca De Carlo, Aris Anagnostopoulos

Motivation: Disease gene prioritization consists in identifying genes that are likely to be involved in the mechanisms of a given disease, providing a ranking of such genes. Recently, the research community has used computational methods to uncover unknown gene–disease associations; these methods range from combinatorial to machine learning-based approaches. In particular, in recent years, approaches based on deep learning have provided superior results compared to more traditional ones. Yet, the problem with these is their inherent black-box structure, which prevents interpretability.

Results: We propose a new methodology for disease gene discovery, which leverages graph-structured data using graph neural networks (GNNs) along with an explainability phase for determining the ranking of candidate genes and understanding the model’s output. Our approach is based on a positive–unlabeled learning strategy, which outperforms existing gene discovery methods by exploiting GNNs in a non-black-box fashion. Our methodology is effective even in scenarios where a large number of associated genes need to be retrieved, in which gene prioritization methods often tend to lose their reliability.

Availability and implementation: The source code of XGDAG is available on GitHub at: https://github.com/GiDeCarlo/XGDAG. The data underlying this article are available at: https://www.disgenet.org/, https://thebiogrid.org/, https://doi.org/10.1371/journal.pcbi.1004120.s003, and https://doi.org/10.1371/journal.pcbi.1004120.s004.

Citations: 0

Bayesian multitask learning for medicine recommendation based on online patient reviews.

IF 5.4 Zone 3 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Bioinformatics

Pub Date : 2023-08-01 DOI: 10.1093/bioinformatics/btad491

Yichen Cheng, Yusen Xia, Xinlei Wang

Motivation: We propose a drug recommendation model that integrates information from both structured data (patient demographic information) and unstructured texts (patient reviews). It uses multitask learning to predict review ratings on several satisfaction-related measures for a given medicine, so that related tasks can learn from each other. The learned models can then be applied to new patients for drug recommendation. This is fundamentally different from most recommender systems in e-commerce, which do not work well for new customers (the cold-start problem). To extract information from review texts, we employ both topic modeling and sentiment analysis. We further incorporate variable selection into the model via Bayesian LASSO, which aims to filter out irrelevant features. To the best of our knowledge, this is the first Bayesian multitask learning method for ordinal responses. We are also the first to apply multitask learning to medicine recommendation. The sample code and data are made available on GitHub: https://github.com/thrushcyc-github/BMull.

Results: We evaluate the proposed method on two sets of drug reviews involving 17 depression/high blood pressure-related drugs. Overall, our method performs better than existing benchmark methods in terms of accuracy and AUC (area under the receiver operating characteristic curve). It is effective even with a small sample size and only a few available features, and it is more robust to possible noninformative covariates. Because the model is explainable, the insights it generates may serve as a useful reference for doctors. In practice, however, a final decision should be made carefully by combining the information from the proposed recommender with doctors' domain knowledge and past experience.

Availability and implementation: The sample code and data are publicly available at GitHub: https://github.com/thrushcyc-github/BMull.
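
The AUC reported in the Results can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way for the binary case; it is a minimal illustration, not the authors' evaluation code, and the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """Binary AUC via pairwise comparison (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = satisfied, 0 = not satisfied) and predicted scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of examples, which is fine for small review sets; a rank-based computation gives the same value more efficiently on larger data.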

Citations: 0