Toward the Reconciliation of Inconsistent Molecular Structures from Biochemical Databases.

Journal of Computational Biology | IF 1.4 | Zone 4 (Biology) | JCR Q4 (Biochemical Research Methods) | Pub Date: 2024-06-01 | Epub Date: 2024-05-17 | DOI: 10.1089/cmb.2024.0520
Casper Asbjørn Eriksen, Jakob Lykke Andersen, Rolf Fagerberg, Daniel Merkle
Citations: 0

Abstract


Information on the structure of molecules, retrieved via biochemical databases, plays a pivotal role in various disciplines, including metabolomics, systems biology, and drug discovery. No such database can be complete, and it is often necessary to incorporate data from several sources. However, the molecular structure for a given compound is not necessarily consistent between databases. This article presents StructRecon, a novel tool for resolving unique molecular structures from database identifiers. Currently, identifiers from BiGG, ChEBI, Escherichia coli Metabolome Database (ECMDB), MetaNetX, and PubChem are supported. StructRecon traverses the cross-links between entries in different databases to construct what we call identifier graphs. The goal of these graphs is to offer a more complete view of the total information available on a given compound across all the supported databases. To reconcile discrepancies encountered during the traversal of the databases, we develop an extensible model for molecular structure supporting multiple independent levels of detail, which allows standardization of the structure to be applied iteratively. In some cases, our standardization approach results in multiple candidate structures for a given compound, in which case a random-walk-based algorithm is used to select the most likely structure among incompatible alternatives. As a case study, we applied StructRecon to the EColiCore2 model. We found at least one structure for 98.66% of its compounds, which is more than twice as many as can be found when the databases are used in more standard ways that do not consider the complex network of cross-database references captured by our identifier graphs. StructRecon is open-source and modular, which enables support for more databases in the future.
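
The two ideas in the abstract, identifier graphs and random-walk-based selection, can be illustrated with a small sketch. This is not StructRecon's implementation: the abstract does not specify the algorithm, so the code below assumes a breadth-first traversal of cross-links and a random walk with restart evaluated by power iteration. All identifiers, cross-references, structure labels, and function names (CROSS_REFS, STRUCTURES, build_identifier_graph, walk_scores, select_structure) are hypothetical placeholders rather than real database records or StructRecon APIs.

```python
from collections import deque

# Hypothetical cross-reference table (identifier -> identifiers it links to).
# These entries are invented for illustration; they are not real records.
CROSS_REFS = {
    "bigg:x": ["metanetx:x", "chebi:x1"],
    "metanetx:x": ["chebi:x1", "chebi:x2"],
    "chebi:x1": ["pubchem:x"],
    "chebi:x2": [],
    "pubchem:x": ["chebi:x1"],
}

# Hypothetical structures reported by some entries; two of them conflict.
STRUCTURES = {
    "chebi:x1": "structure_A",
    "pubchem:x": "structure_A",
    "chebi:x2": "structure_B",
}


def build_identifier_graph(start, cross_refs):
    """Follow cross-links breadth-first and return {identifier: successors}."""
    graph = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in graph:
            continue  # already visited
        graph[node] = list(cross_refs.get(node, []))
        queue.extend(graph[node])
    return graph


def walk_scores(graph, start, restart=0.15, iterations=200):
    """Visit probabilities of a random walk with restart (assumed model)."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in prob.items():
            successors = graph[node]
            if successors:
                # With probability `restart` jump back to the start node,
                # otherwise move to a uniformly chosen successor.
                nxt[start] += restart * mass
                share = (1.0 - restart) * mass / len(successors)
                for succ in successors:
                    nxt[succ] += share
            else:
                # Dead end: the walker returns to the start node.
                nxt[start] += mass
        prob = nxt
    return prob


def select_structure(graph, start, structures):
    """Sum walk probability per candidate structure and pick the largest."""
    totals = {}
    for ident, p in walk_scores(graph, start).items():
        if ident in structures:
            label = structures[ident]
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get), totals


graph = build_identifier_graph("bigg:x", CROSS_REFS)
best, totals = select_structure(graph, "bigg:x", STRUCTURES)
print(best, totals)
```

In this toy graph the structure backed by two mutually linked entries accumulates more walk probability than the structure reported by a single entry, which is the intuition behind preferring well-connected evidence when candidates are incompatible.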

Source journal
Journal of Computational Biology
Category: Biology; Computer Science, Interdisciplinary Applications
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 113
Review time: 6-12 weeks
Journal description: Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics. Journal of Computational Biology coverage includes:
- Genomics
- Mathematical modeling and simulation
- Distributed and parallel biological computing
- Designing biological databases
- Pattern matching and pattern detection
- Linking disparate databases and data
- New tools for computational biology
- Relational and object-oriented database technology for bioinformatics
- Biological expert system design and use
- Reasoning by analogy, hypothesis formation, and testing by machine
- Management of biological databases