
Bioinformatics advances: latest articles

Phylogenetic-informed graph deep learning to classify dynamic transmission clusters in infectious disease epidemics.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-11-07 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae158
Chaoyue Sun, Yanjun Li, Simone Marini, Alberto Riva, Dapeng Oliver Wu, Ruogu Fang, Marco Salemi, Brittany Rife Magalis

Motivation: In the midst of an outbreak, identifying groups of individuals that represent a risk for transmission of the pathogen under investigation is critical to public health efforts. Dynamic transmission patterns within these clusters, whether they result from changes at the level of the virus (e.g. infectivity) or the host (e.g. vaccination), are critical in strategizing public health interventions, particularly when resources are limited. Phylogenetic trees are widely used to detect transmission clusters, and the topological shape of their branches can also be a useful source of information on the dynamics of the represented population.

Results: We evaluated the limitations of existing tree shape metrics when dealing with dynamic transmission clusters and propose instead a phylogeny-based deep learning system, DeepDynaTree, for dynamic classification. Comprehensive experiments carried out on a variety of simulated epidemic growth models and HIV epidemic data indicate that this graph deep learning approach is effective, robust, and informative for cluster dynamic prediction. Our results confirm that DeepDynaTree is a promising tool for transmission cluster characterization that can be modified to address the existing limitations and deficiencies in knowledge regarding the dynamics of transmission trajectories for groups at risk of pathogen infection.
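The abstract does not say which tree shape metrics were evaluated. As a minimal illustration of what such a metric looks like, the sketch below computes the Sackin index (the sum of leaf depths), one classic imbalance statistic. The nested-tuple tree encoding and the tip names are assumptions made for this example only and are not taken from DeepDynaTree.

```python
# Sackin index: the sum of the depths of all leaves in a rooted tree.
# Toy encoding (an assumption for this sketch): a tuple is an internal
# node, a string is a leaf.

def sackin_index(tree, depth=0):
    """Return the sum of depths of all leaves at or below `tree`."""
    if isinstance(tree, str):      # a leaf contributes its own depth
        return depth
    return sum(sackin_index(child, depth + 1) for child in tree)

balanced = (("A", "B"), ("C", "D"))   # fully balanced tree with 4 tips
ladder = ((("A", "B"), "C"), "D")     # ladder-like (caterpillar) tree with 4 tips

print(sackin_index(balanced))  # 8: every tip sits at depth 2
print(sackin_index(ladder))    # 9: depths 3 + 3 + 2 + 1
```

A single summary number like this collapses the whole topology into one value, which is one reason a learned graph representation might capture dynamics that a fixed metric misses.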

Availability and implementation: DeepDynaTree is available under an MIT Licence at https://github.com/salemilab/DeepDynaTree.

Bioinformatics advances 4(1): vbae158. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11552518/pdf/
Citations: 0
MitoMAMMAL: a genome scale model of mammalian mitochondria predicts cardiac and BAT metabolism.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-11-05 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae172
Stephen Chapman, Theo Brunet, Arnaud Mourier, Bianca H Habermann

Motivation: Mitochondria are essential for cellular metabolism and are inherently flexible to allow correct function in a wide range of tissues. Consequently, dysregulated mitochondrial metabolism affects different tissues in different ways, leading to challenges in understanding the pathology of mitochondrial diseases. System-level metabolic modelling is useful in studying tissue-specific mitochondrial metabolism, yet despite the mouse being a common model organism in research, no mouse-specific mitochondrial metabolic model is currently available.

Results: Building upon the similarity between human and mouse mitochondrial metabolism, we present mitoMammal, a genome-scale metabolic model that contains human- and mouse-specific gene-product reaction rules. MitoMammal is able to model mouse and human mitochondrial metabolism. To demonstrate this, using an adapted E-Flux algorithm, we integrated proteomic data from mitochondria of isolated mouse cardiomyocytes and mouse brown adipocyte tissue, as well as transcriptomic data from in vitro differentiated human brown adipocytes, and modelled the context-specific metabolism using flux balance analysis. In all three simulations, mitoMammal made mostly accurate, and some novel, predictions relating to energy metabolism in the context of cardiomyocytes and brown adipocytes. This demonstrates its usefulness in research in cardiac disease and diabetes in both mouse and human contexts.
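The general idea behind E-Flux-style integration is to cap each reaction's maximum flux according to the measured abundance of its gene product, then run flux balance analysis under those caps. The toy sketch below shows that idea only. The reaction names, abundance values, and the `max_flux` ceiling are hypothetical, and the authors' adapted algorithm is not described in the abstract. To stay self-contained it uses an unbranched chain, where steady state forces the same flux through every step, so the linear-programming optimum is simply the tightest bound.

```python
# Toy E-Flux-style constraint on an unbranched pathway.
# All names and numbers below are hypothetical illustration values.

def eflux_bounds(expression, max_flux=1000.0):
    """Scale each reaction's upper bound by expression / highest expression."""
    top = max(expression.values())
    return {rxn: max_flux * level / top for rxn, level in expression.items()}

def max_chain_flux(bounds):
    """Optimal steady-state flux through an unbranched chain.

    Mass balance forces equal flux through every step, so the maximum
    feasible flux is the smallest upper bound along the chain.
    """
    return min(bounds.values())

abundance = {"uptake": 80.0, "step1": 20.0, "step2": 40.0}
bounds = eflux_bounds(abundance)

print(bounds)                  # upper bounds: 1000.0, 250.0, 500.0
print(max_chain_flux(bounds))  # 250.0, limited by the least abundant step
```

In a genome-scale model with branches, the same bounds would be passed to a linear-programming solver instead of taking a minimum.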

Availability and implementation: The MitoMammal Jupyter Notebook is available at: https://gitlab.com/habermann_lab/mitomammal.

Bioinformatics advances 5(1): vbae172. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696703/pdf/
Citations: 0
SpecGMM: Integrating Spectral analysis and Gaussian Mixture Models for taxonomic classification and identification of discriminative DNA regions.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-11-05 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae171
Saish Jaiswal, Hema A Murthy, Manikandan Narayanan

Motivation: Genomic signal processing (GSP), which transforms biomolecular sequences into discrete signals for spectral analysis, has provided valuable insights into DNA sequence, structure, and evolution. However, challenges persist with spectral representations of variable-length sequences for tasks like species classification and in interpreting these spectra to identify discriminative DNA regions.

Results: We introduce SpecGMM, a novel framework that integrates sliding window-based Spectral analysis with a Gaussian Mixture Model to transform variable-length DNA sequences into fixed-dimensional spectral representations for taxonomic classification. SpecGMM's hyperparameters were selected using a dataset of plant sequences, and applied unchanged across diverse datasets, including mitochondrial DNA, viral and bacterial genomes, and 16S rRNA sequences. Across these datasets, SpecGMM outperformed a baseline method, with 9.45% average and 35.55% maximum improvement in test accuracies for a Linear Discriminant classifier. Regarding interpretability, SpecGMM revealed discriminative hypervariable regions in 16S rRNA sequences (particularly V3/V4 for discriminating higher taxa and V2/V3 for lower taxa), corroborating their known classification relevance. SpecGMM's spectrogram video analysis helped visualize species-specific DNA signatures. SpecGMM thus provides a robust and interpretable method for spectral DNA analysis, opening new avenues in GSP research.
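To make the "variable-length in, fixed-dimension out" idea concrete, here is a rough sketch of sliding-window spectral analysis. Several choices are assumptions for illustration: the purine/pyrimidine numeric mapping, the window width and step, and above all the final step, which averages the window spectra as a simple stand-in for the Gaussian Mixture Model that SpecGMM actually uses.

```python
import cmath

# Assumed numeric mapping (purine = +1, pyrimidine = -1); the abstract
# does not state which mapping SpecGMM applies.
MAPPING = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}

def window_spectra(seq, width=8, step=4):
    """Return one DFT magnitude spectrum per sliding window."""
    signal = [MAPPING[base] for base in seq]
    spectra = []
    for start in range(0, len(signal) - width + 1, step):
        frame = signal[start:start + width]
        spectrum = [
            abs(sum(x * cmath.exp(-2j * cmath.pi * k * n / width)
                    for n, x in enumerate(frame)))
            for k in range(width // 2 + 1)   # non-redundant frequency bins
        ]
        spectra.append(spectrum)
    return spectra

def fixed_length_summary(spectra):
    """Average the window spectra (stand-in for the GMM-based step)."""
    count = len(spectra)
    return [sum(column) / count for column in zip(*spectra)]

short_vec = fixed_length_summary(window_spectra("ACGTACGTACGT"))
long_vec = fixed_length_summary(window_spectra("ACGTTGCAACGTTGCAACGTAC"))

print(len(short_vec), len(long_vec))  # both vectors have width // 2 + 1 entries
```

The number of windows grows with sequence length, but the summary vector does not, which is what lets a standard classifier consume sequences of different lengths.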

Availability and implementation: SpecGMM's source code is available at https://github.com/BIRDSgroup/SpecGMM.

Bioinformatics advances 4(1): vbae171. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11631429/pdf/
Citations: 0
RecGOBD: accurate recognition of gene ontology related brain development protein functions through multi-feature fusion and attention mechanisms.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-11-04 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae163
Zhiliang Xia, Shiqiang Ma, Jiawei Li, Yan Guo, Limin Jiang, Jijun Tang

Motivation: Protein function prediction is crucial in bioinformatics, driven by the growth of protein sequence data from high-throughput technologies. Traditional methods are costly and slow, underscoring the need for computational solutions. While deep learning offers powerful tools, many models lack optimization for brain development datasets, critical for neurodevelopmental disorder research. To address this, we developed RecGOBD (Recognition of Gene Ontology-related Brain Development protein function), a model tailored to predict protein functions essential to brain development.

Results: RecGOBD targets 10 key gene ontology (GO) terms for brain development, embedding protein sequences associated with these terms. Leveraging advanced pre-trained models, it captures both sequence and structure data, aligning them with GO terms through attention mechanisms. The category attention layer enhances prediction accuracy. RecGOBD surpassed five benchmark models in AUROC, AUPR, and Fmax metrics and was further used to predict autism-related protein functions and assess mutation impacts on GO terms. These findings highlight RecGOBD's potential in advancing protein function prediction for neurodevelopmental disorders.
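Of the three metrics named above, Fmax is the least familiar outside protein function prediction. The sketch below follows the common protein-centric definition: at each score threshold, precision is averaged over proteins with at least one prediction and recall over all proteins, and Fmax is the best resulting F-measure. The term labels and scores are hypothetical, and the sketch assumes every protein has at least one true term. How RecGOBD computes the metric internally is not stated in the abstract.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax.

    y_true  : list of sets of true terms, one set per protein (non-empty)
    y_score : list of dicts mapping term -> predicted score, one per protein
    """
    thresholds = thresholds or [t / 100 for t in range(1, 100)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & truth)
            if predicted:                       # precision only where something is predicted
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(truth))   # recall over every protein
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best

# Hypothetical labels and scores for two proteins.
truth = [{"term_a", "term_b"}, {"term_c"}]
scores = [
    {"term_a": 0.9, "term_b": 0.4, "term_c": 0.2},
    {"term_c": 0.3, "term_a": 0.8},
]
print(round(fmax(truth, scores), 3))  # 0.857, reached for thresholds in (0.2, 0.3]
```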

Availability and implementation: All Python code associated with this study is available at https://github.com/ZL-Xia/RECGOBD.git.

Bioinformatics advances 4(1): vbae163. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639192/pdf/
Citations: 0
AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae165
Stephan Breimann, Dmitrij Frishman

Summary: Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold standard scale set or simple scale selection methods exist. We developed AAclust, a wrapper for clustering models that require a pre-defined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that top-performing scale sets were different for each benchmark dataset and significantly outperformed scale sets used in previous studies. Noteworthy is the strong dependence of the model performance on the scale set size. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications.
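The "cluster, then keep one representative per cluster" step can be sketched in a few lines. This is not the AAclust implementation: the tiny k-means with deterministic initialisation, the nearest-to-centroid selection rule, and the scale names and values are all assumptions chosen to keep the example self-contained, and the optimisation of k is left out entirely.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centres (deterministic)."""
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def representatives(scales, k):
    """Cluster the scales and keep the one closest to each cluster centre."""
    names = list(scales)
    labels, centers = kmeans([scales[n] for n in names], k)
    reps = []
    for c in range(k):
        members = [n for n, lab in zip(names, labels) if lab == c]
        if members:
            reps.append(min(members, key=lambda n: dist2(scales[n], centers[c])))
    return reps

# Hypothetical scales, each giving a value for the same four amino acids.
scales = {
    "hydro_1": [1.0, 0.9, 0.1, 0.0],
    "charge_1": [0.0, 0.1, 1.0, 0.9],
    "hydro_2": [0.9, 1.0, 0.0, 0.1],
    "charge_2": [0.1, 0.0, 0.9, 1.0],
    "hydro_3": [1.0, 1.0, 0.0, 0.0],
    "charge_3": [0.0, 0.0, 1.0, 1.0],
}
print(representatives(scales, 2))  # ['hydro_3', 'charge_3']
```

Six highly redundant scales collapse to two, one per group, which is the redundancy reduction the summary describes.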

Availability and implementation: The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.

Bioinformatics advances 4(1): vbae165. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562964/pdf/
Citations: 0
Exon nomenclature and classification of transcripts database (ENACTdb): a resource for analyzing alternative splicing mediated proteome diversity.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-29 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae157
Paras Verma, Deeksha Thakur, Shashi B Pandit

Motivation: Gene transcripts are distinguished by the composition of their exons, and differences in exon composition may contribute to proteome complexity. Despite the availability of alternative splicing information documented in various databases, readily associating exonic variations with the protein sequence remains a mammoth task.

Results: To associate exonic variation(s) with the protein systematically, we designed the Exon Nomenclature and Classification of Transcripts (ENACT) framework, which uniquely annotates exons by tracking their loci in the context of gene architecture while encapsulating variations in splice site(s) and amino acid coding status. After ENACT annotation, predicted protein features (secondary structure/disorder/Pfam domains) are mapped to exon attributes. Thus, ENACTdb provides trackable associations of exonic variation(s) with isoform(s) and protein features, enabling the assessment of functional variation due to changes in exon composition. Such analyses can be readily performed through multiple views supported by the server. The exon-centric visualizations of ENACT-annotated isoforms could provide insights into the functional repertoire of genes arising from alternative splicing and its related processes, and can serve as an important resource for the research community.

Availability and implementation: The database is publicly available at https://www.iscbglab.in/enactdb/. It contains protein-coding genes and isoforms for Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens.

Bioinformatics advances 4(1): vbae157. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11576355/pdf/
Citations: 0
MicroNet-MIMRF: a microbial network inference approach based on mutual information and Markov random fields.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae167
Chenqionglu Feng, Huiqun Jia, Hui Wang, Jiaojiao Wang, Mengxuan Lin, Xiaoyan Hu, Chenjing Yu, Hongbin Song, Ligui Wang

Motivation: The human microbiome comprises complex associations and communication networks among microbial communities, which are crucial for maintaining health. The construction of microbial networks is vital for elucidating these associations. However, existing microbial network inference methods cannot solve the issues of zero-inflation and non-linear associations. Novel methods are therefore needed to improve the accuracy of microbial network inference.

Results: In this study, we introduce the Microbial Network based on Mutual Information and Markov Random Fields (MicroNet-MIMRF) as a novel approach for inferring microbial networks. Abundance data of microbes are modeled through the zero-inflated Poisson distribution, and the discrete matrix is estimated for further calculation. Markov random fields based on mutual information are used to construct accurate microbial networks. MicroNet-MIMRF excels at estimating pairwise associations between microbes, effectively addressing zero-inflation and non-linear associations in microbial abundance data. It outperforms commonly used techniques in simulation experiments, achieving area under the curve values exceeding 0.75 for all parameters. A case study on inflammatory bowel disease data further demonstrates the method's ability to identify insightful associations. In conclusion, MicroNet-MIMRF is a powerful tool for microbial network inference that handles the biases caused by zero-inflation and overestimation of associations.
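Mutual information is the piece of this pipeline that handles non-linear associations, and it is easy to show on its own. The sketch below computes it for two already-discretized abundance vectors. The zero-inflated Poisson modelling and the Markov random field step are not reproduced, and the taxon vectors and their three abundance levels (0 = absent, 1 = low, 2 = high) are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(x, y):
    """Mutual information in bits between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
    return sum(
        (count / n) * log2(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )

# Hypothetical discretized abundances across eight samples.
taxon_a = [0, 0, 1, 1, 2, 2, 0, 0]
taxon_b = [1, 1, 0, 0, 1, 1, 1, 1]   # depends on taxon_a, but not monotonically
taxon_c = [0, 1, 0, 1, 0, 1, 0, 1]   # statistically independent of taxon_a

print(round(mutual_information(taxon_a, taxon_b), 3))  # 0.811
print(round(mutual_information(taxon_a, taxon_c), 3))  # 0.0
```

Here `taxon_b` drops only when `taxon_a` is at its middle level, a pattern that a linear correlation coefficient would understate but that mutual information scores clearly above zero.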

Availability and implementation: MicroNet-MIMRF is available at https://github.com/Fionabiostats/MicroNet-MIMRF.

Bioinformatics advances 4(1): vbae167. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11549015/pdf/
Citations: 0
MetaboScope: a statistical toolbox for analyzing ¹H nuclear magnetic resonance spectra from human clinical studies.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae142
Ruey Leng Loo, Javier Osorio Mosquera, Michael Zasso, Jacqueline Mathews, Desmond G Johnston, Jeremy K Nicholson, Luc Patiny, Elaine Holmes, Julien Wist

Motivation: Metabolic phenotyping, using high-resolution spectroscopic molecular fingerprints of biological samples, has demonstrated diagnostic, prognostic, and mechanistic value in clinical studies. However, clinical translation is hindered by the lack of viable workflows and challenges in converting spectral data into usable information.

Results: MetaboScope is an analytical and statistical workflow for learning, designing, and analyzing clinically relevant ¹H nuclear magnetic resonance data. It features modular preprocessing pipelines; multivariate modeling tools, including Principal Components Analysis (PCA) and Orthogonal-Projection to Latent Structure Discriminant Analysis (OPLS-DA); and biomarker discovery tools (multiblock PCA and statistical spectroscopy). A simulation tool is also provided, allowing users to create synthetic spectra for hypothesis testing and power calculations.

Availability and implementation: MetaboScope is built as a pipeline where each module accepts the output generated by the previous one. This provides flexibility and simplicity of use, while being straightforward to maintain. The system and its libraries were developed in JavaScript and run as a web app; therefore, all the operations are performed on the local computer, circumventing the need to upload data. The MetaboScope tool is available at https://www.cheminfo.org/flavor/metabolomics/index.html. The code is open-source and can be deployed locally if necessary. Module notes, video tutorials, and clinical spectral datasets are provided for modeling.
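The pipeline design described above, in which each module consumes the output of the previous one, can be sketched in a few lines. This is a hedged illustration in Python rather than MetaboScope's own JavaScript code; the module names (`subtract_baseline`, `normalize_total_area`, `bin_spectrum`) and the toy spectrum are hypothetical and are not taken from the tool.

```python
from functools import reduce


def subtract_baseline(spectrum):
    """Shift the spectrum so that its lowest point sits at zero."""
    floor = min(spectrum)
    return [value - floor for value in spectrum]


def normalize_total_area(spectrum):
    """Scale intensities so that they sum to 1 (left unchanged if all zero)."""
    total = sum(spectrum)
    if total == 0:
        return list(spectrum)
    return [value / total for value in spectrum]


def bin_spectrum(width):
    """Return a module that sums adjacent points into bins of `width` points."""
    def step(spectrum):
        return [sum(spectrum[i:i + width]) for i in range(0, len(spectrum), width)]
    return step


def run_pipeline(spectrum, modules):
    """Feed the output of each module into the next one, in order."""
    return reduce(lambda data, module: module(data), modules, spectrum)


raw = [2.0, 3.0, 6.0, 3.0, 2.0, 2.0]  # toy intensities, not real NMR data
processed = run_pipeline(
    raw, [subtract_baseline, normalize_total_area, bin_spectrum(2)]
)
```

Since every module has the same shape (a spectrum in, a spectrum out), modules can be reordered, removed, or swapped without touching the others, which is one plausible reading of the flexibility and maintainability the authors mention.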

Citations: 0
Population-aware permutation-based significance thresholds for genome-wide association studies.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae168
Maura John, Arthur Korte, Marco Todesco, Dominik G Grimm

Motivation: Permutation-based significance thresholds have been shown to be a robust alternative to classical Bonferroni significance thresholds in genome-wide association studies (GWAS) for skewed phenotype distributions. The recently published method permGWAS introduced a batch-wise approach to efficiently compute permutation-based GWAS. However, running multiple univariate tests in parallel leads to many repetitive computations and a higher computational cost. More importantly, traditional permutation methods that permute only the phenotype break the underlying population structure.

Results: We propose permGWAS2, an improved method that does not break the population structure during permutations and uses an elegant block matrix decomposition to optimize computations, thereby reducing redundancies. We show on synthetic data that this improved approach yields a lower false discovery rate for skewed phenotype distributions compared to the previous version and the commonly used Bonferroni correction. In addition, we re-analyze a dataset covering phenotypic variation in 86 traits in a population of 615 wild sunflowers (Helianthus annuus L.). This led to the identification of dozens of novel associations with putatively adaptive traits, and removed several likely false-positive associations with limited biological support.

Availability and implementation: permGWAS2 is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS.
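For readers unfamiliar with permutation-based thresholds, the sketch below shows the classical idea in plain Python: shuffle the phenotype, record the strongest association across all markers, and take an upper quantile of those maxima as the genome-wide threshold. This is deliberately the naive phenotype-only permutation that the abstract says breaks population structure. It is not the permGWAS2 algorithm, it uses absolute Pearson correlation in place of a mixed-model test statistic, and all names and the simulated data are assumptions made for illustration.

```python
import random
from math import sqrt


def abs_correlation(x, y):
    """Absolute Pearson correlation; 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic over markers,
    taken across phenotype permutations (naive, structure-breaking variant)."""
    rng = random.Random(seed)
    max_stats = []
    for _ in range(n_perm):
        shuffled = phenotype[:]
        rng.shuffle(shuffled)  # permute the phenotype only
        max_stats.append(max(abs_correlation(m, shuffled) for m in genotypes))
    max_stats.sort()
    k = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[k]


# Simulated toy data: 50 markers coded 0/1/2 for 40 individuals,
# with a skewed (exponential) phenotype.
data_rng = random.Random(1)
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(40)] for _ in range(50)]
phenotype = [data_rng.expovariate(1.0) for _ in range(40)]
threshold = permutation_threshold(genotypes, phenotype)

# Markers whose observed statistic exceeds the threshold would be called significant.
hits = [i for i, m in enumerate(genotypes) if abs_correlation(m, phenotype) > threshold]
```

Taking the maximum over markers within each permutation is what controls the family-wise error rate without assuming a particular phenotype distribution, which is why such thresholds can behave better than Bonferroni for skewed traits.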

Citations: 0
arcMS: transformation of multi-dimensional high-resolution mass spectrometry data to columnar format for compact storage and fast access.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-26 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae160
Julien Le Roux, Julien Sade

Summary: The arcMS R package addresses the challenges posed by proprietary and open-source high-resolution mass spectrometry data formats by providing functions to collect MS^E data from the Waters UNIFI software and store it in the efficient Apache Parquet format, facilitating fast, easy access, and compatibility with various programming environments. This solution facilitates the manipulation of complex mass spectrometry data, including ion mobility or other additional dimensions, enabling potential integration into efficient and open-source workflows.

Availability and implementation: arcMS is an open-source R package and is available on GitHub at https://github.com/leesulab/arcMS. The complete documentation, including details on UNIFI configuration and tutorials for data conversion, access to Parquet files, and filtration of data, is available at https://leesulab.github.io/arcMS. An R/Shiny companion application is also provided for visualization of Parquet data and demonstration of data filtering with the Arrow library, at https://github.com/leesulab/arcms-dataviz.
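The benefit of a columnar layout for this kind of data can be illustrated with a small, self-contained Python sketch. It does not use arcMS, Parquet, or Arrow; it only mimics the idea of storing one array per field so that a filter reads just the columns it needs. The field names (`rt`, `mz`, `intensity`, `drift_time`) and the values are hypothetical.

```python
def rows_to_columns(rows):
    """Pivot a list of per-peak dicts into one list per field (columnar layout)."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        if row.keys() != columns.keys():
            raise ValueError("all rows must share the same fields")
        for name, value in row.items():
            columns[name].append(value)
    return columns


def select(columns, wanted, filter_column, predicate):
    """Return only the `wanted` columns for entries whose `filter_column`
    value satisfies `predicate`; other columns are never touched."""
    keep = [i for i, value in enumerate(columns[filter_column]) if predicate(value)]
    return {name: [columns[name][i] for i in keep] for name in wanted}


# Hypothetical peaks with retention time, m/z, intensity, and ion-mobility drift time.
peaks = [
    {"rt": 1.20, "mz": 150.05, "intensity": 400, "drift_time": 2.1},
    {"rt": 1.20, "mz": 301.14, "intensity": 5200, "drift_time": 3.4},
    {"rt": 1.25, "mz": 150.05, "intensity": 1800, "drift_time": 2.1},
]
columns = rows_to_columns(peaks)
intense = select(columns, ["mz", "intensity"], "intensity", lambda v: v >= 1000)
```

In a real columnar file format the same access pattern lets a reader skip unneeded columns on disk, which is one reason such formats suit multi-dimensional mass spectrometry data with many fields per measurement.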

Citations: 0