What is a consistent glycan composition dataset?

Federico Saba, Julien Mariethoz, F. Lisacek
Journal: Frontiers in Analytical Science
Published: 2023-06-07
DOI: 10.3389/frans.2023.1073540 (https://doi.org/10.3389/frans.2023.1073540)

Abstract

Introduction: One of the main challenges in bioinformatics has been, and still is, the comparison of entities through the development of algorithms for similarity scoring and data clustering according to biologically relevant aspects. Glycoinformatics also faces this challenge, in particular regarding the automated comparison of protein and/or tissue glycomes, which remains relatively uncharted territory.

Methods: Low- and high-throughput experimental glycomic and glycoproteomic results were collected, revealing a bias toward N-linked glycomes. N-glycomes were then represented as networks of related glycan compositions rather than as lists of glycans. They were processed and compared with two Java applications, one generating the graphs and the other producing a similarity matrix based on graph content. Several scoring schemes (e.g., Jaccard index or cosine similarity) were tested and evaluated using the Matthews Correlation Coefficient in order to capture a meaningful protein and tissue N-glycome similarity.

Results: Assuming that a glycome corresponds to a well-connected graph of glycan compositions, graph comparison revealed gaps that can be interpreted as inconsistencies. The outcome of systematic graph comparison is both formal and practical. In principle, it shows that the idiosyncrasy of current glycome data limits the definition of appropriate estimates for systematically comparing N-glycomes. Yet several potentially interesting criteria could be identified in a series of use cases detailed in the study.

Discussion: Differentially expressed glycomes are usually compared manually, but the resulting work tends to remain in publications due to the lack of dedicated tools. Even manually, cross-comparison is challenging, mostly because different sets of features are used from one study to the next. The work presented here lays down guidelines for developing a software tool that compares glycomes based on appropriate definitions of similarity and suitable methods for its evaluation and implementation.
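
The approach outlined in the Methods can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the composition encoding (counts of Hex, HexNAc, Fuc, NeuAc), the rule that links two compositions when they differ by exactly one monosaccharide, the toy glycomes, and every function name are assumptions made for illustration only. The sketch builds a composition graph, counts connected components to expose a gap, scores two glycomes with the Jaccard index and cosine similarity, and defines the Matthews Correlation Coefficient from confusion-matrix counts.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy glycomes (not data from the study).
# Each composition is a tuple of counts: (Hex, HexNAc, Fuc, NeuAc).
glycome_a = {(5, 2, 0, 0), (5, 3, 0, 0), (5, 4, 0, 0), (5, 4, 1, 0), (5, 4, 1, 2)}
glycome_b = {(5, 2, 0, 0), (5, 3, 0, 0), (5, 4, 0, 0), (5, 4, 0, 1)}

def build_edges(compositions):
    """Assumed rule: link compositions that differ by exactly one monosaccharide."""
    edges = set()
    for u, v in combinations(sorted(compositions), 2):
        if sum(abs(a - b) for a, b in zip(u, v)) == 1:
            edges.add((u, v))
    return edges

def connected_components(nodes, edges):
    """Count connected components with an iterative depth-first search."""
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours[node] - seen)
    return count

def jaccard(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cosine(a, b):
    """Cosine similarity of two sets viewed as binary presence vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

if __name__ == "__main__":
    for name, glycome in (("A", glycome_a), ("B", glycome_b)):
        parts = connected_components(glycome, build_edges(glycome))
        print(f"glycome {name}: {parts} connected component(s)")
    print("Jaccard:", jaccard(glycome_a, glycome_b))
    print("cosine:", round(cosine(glycome_a, glycome_b), 3))
```

In this toy example, glycome A splits into two components because no composition bridges (5, 4, 1, 0) and (5, 4, 1, 2); under the assumed linking rule, that is the kind of gap the abstract describes as a possible inconsistency. The `mcc` helper would be applied to the confusion counts obtained after thresholding a similarity score against known labels, which is one plausible way to evaluate a scoring scheme.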