Pub Date: 2024-05-27. eCollection Date: 2024-06-01. DOI: 10.1515/jib-2023-0041
Paloma Tejera-Nevado, Emilio Serrano, Ana González-Herrero, Rodrigo Bermejo, Alejandro Rodríguez-González
Protein structure determination has made progress with the aid of deep learning models, enabling the prediction of protein folding from protein sequences. However, obtaining accurate predictions becomes essential in certain cases where the protein structure remains undescribed. This is particularly challenging when dealing with rare, diverse structures and complex sample preparation. Different metrics assess prediction reliability and offer insights into result strength, providing a comprehensive understanding of protein structure by combining different models. In a previous study, two proteins named ARM58 and ARM56 were investigated. These proteins contain four domains of unknown function and are present in Leishmania spp. ARM refers to an antimony resistance marker. The study's main objective is to assess the accuracy of the model's predictions, thereby providing insights into the complexities and supporting metrics underlying these findings. The analysis also extends to the comparison of predictions obtained from other species and organisms. Notably, one of these proteins shares an ortholog with Trypanosoma cruzi and Trypanosoma brucei, lending further significance to our analysis. This attempt underscored the importance of evaluating the diverse outputs from deep learning models, facilitating comparisons across different organisms and proteins. This becomes particularly pertinent in cases where no previous structural information is available.
Title: Unlocking the power of AI models: exploring protein folding prediction through comparative analysis. Journal: Journal of Integrative Bioinformatics.
Lucian P. Smith, Frank T. Bergmann, Alan Garny, Tomáš Helikar, Jonathan Karr, D. Nickerson, Herbert M. Sauro, Dagmar Waltemath, Matthias König
Modern biological research is increasingly informed by computational simulation experiments, which necessitate the development of methods for annotating, archiving, sharing, and reproducing the conducted experiments. These simulations increasingly require extensive collaboration among modelers, experimentalists, and engineers. The Minimum Information About a Simulation Experiment (MIASE) guidelines outline the information needed to share simulation experiments. SED-ML is a computer-readable format for the information outlined by MIASE, created as a community project and supported by many investigators and software tools. Level 1 Version 5 of SED-ML expands the ability of modelers to define simulations in SED-ML using the Kinetic Simulation Algorithm Ontology (KiSAO). While it was possible in Version 4 to define a simulation entirely using KiSAO, Version 5 now allows users to define tasks, model changes, ranges, and outputs using the ontology as well. SED-ML is supported by a growing ecosystem of investigators, model languages, and software tools, including various languages for constraint-based, kinetic, qualitative, rule-based, and spatial models, and many simulation tools, visual editors, model repositories, and validators. Additional information about SED-ML is available at https://sed-ml.org/.
Title: The simulation experiment description markup language (SED-ML): language specification for level 1 version 5. Journal: Journal of Integrative Bioinformatics. Publication date: 2024-04-15. DOI: 10.1515/jib-2024-0008
Pub Date: 2024-03-27. eCollection Date: 2024-06-01. DOI: 10.1515/jib-2023-0046
Hugo López-Fernández, Miguel Pinto, Cristina P Vieira, Pedro Duque, Miguel Reboiro-Jato, Jorge Vieira
The vast amount of genome sequence data that is available, and that is predicted to increase drastically in the near future, can only be dealt with efficiently by building automated pipelines. Indeed, the Earth Biogenome Project will produce high-quality reference genome sequences for all 1.8 million named living eukaryote species, providing unprecedented insight into the evolution of genes and gene families, and thus into biological issues. Here, new modules for gene annotation, further BLAST search algorithms, further multiple sequence alignment methods, the addition of reference sequences, further tree rooting methods, the estimation of rates of synonymous and nonsynonymous substitutions, and the identification of positively selected amino acid sites have been added to auto-phylo (version 2), recently developed software for addressing biological problems using phylogenetic inferences. Additionally, we present auto-phylo-pipeliner, a graphical user interface application that further facilitates the creation and running of auto-phylo pipelines. Inferences on S-RNase specificity are critical both for cross-based breeding and for the establishment of pollination requirements. Therefore, as a test case, we develop an auto-phylo pipeline to identify amino acid sites under positive selection, which are, in principle, those determining S-RNase specificity, starting from both non-annotated Prunus genomes and sequences available in public databases.
Title: Auto-phylo v2 and auto-phylo-pipeliner: building advanced, flexible, and reusable pipelines for phylogenetic inferences, estimation of variability levels and identification of positively selected amino acid sites. Journal: Journal of Integrative Bioinformatics.
Pub Date: 2024-02-06. eCollection Date: 2024-03-01. DOI: 10.1515/jib-2023-0048
Sahar Aghakhani, Anna Niarakis, Sylvain Soliman
Molecular interaction maps (MIMs) are static graphical representations depicting complex biochemical networks that can be formalized using one of the Systems Biology Graphical Notation languages. Despite their extensive coverage of various biological processes, they are limited in terms of dynamic insights. However, MIMs can serve as templates for developing dynamic computational models. We present MetaLo, an open-source Python package that enables the coupling of Boolean models inferred from process description MIMs with generic core metabolic networks. MetaLo provides a framework to study the impact of signaling cascades, gene regulation processes, and metabolic flux distribution of central energy production pathways. MetaLo computes the Boolean model's asynchronous asymptotic behavior through the identification of trap-spaces, and extracts metabolic constraints to contextualize the generic metabolic network. MetaLo is able to handle large-scale Boolean models and genome-scale metabolic models without requiring kinetic information or manual tuning. The framework behind MetaLo enables in-depth analysis of the regulatory model, and may help tackle both the lack of omics data needed to contextualize generic metabolic networks in poorly addressed biological fields and improper automatic reconstructions of cell- and/or disease-specific metabolic networks. MetaLo is available at https://pypi.org/project/metalo/ under the terms of the GNU General Public License v3.
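To make the notion of a trap-space concrete: it is a subspace of the state space (some nodes fixed, the rest free) that no update can leave. The minimal Python sketch below checks this property by brute force on a small Boolean network. The three-node network, the function name `is_trap_space`, and the exhaustive enumeration are illustrative assumptions only; they are not MetaLo's API or its algorithm, which is designed for large-scale models.

```python
from itertools import product

# Hypothetical toy Boolean network (not taken from the paper): each rule maps
# a full state (dict of node -> bool) to that node's next value.
RULES = {
    "A": lambda s: s["A"] or s["B"],   # A stays on once it is on
    "B": lambda s: s["A"] and s["C"],
    "C": lambda s: not s["A"],
}

def is_trap_space(rules, fixed):
    """Return True if the subspace given by `fixed` (node -> bool) is closed.

    The subspace is closed when, for every state inside it, the rule of each
    fixed node evaluates to the fixed value, so no single-node (asynchronous)
    update can move the system outside the subspace.
    """
    free = [node for node in rules if node not in fixed]
    for values in product([False, True], repeat=len(free)):
        state = {**fixed, **dict(zip(free, values))}
        if any(rules[node](state) != value for node, value in fixed.items()):
            return False
    return True

print(is_trap_space(RULES, {"A": True}))                            # True
print(is_trap_space(RULES, {"A": True, "B": False, "C": False}))    # True (a fixed point)
print(is_trap_space(RULES, {"C": True}))                            # False
```

Enumerating every state in the subspace is exponential in the number of free nodes, so this only illustrates the definition; it says nothing about how trap-spaces are found efficiently in practice.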
Title: MetaLo: metabolic analysis of Logical models extracted from molecular interaction maps. Journal: Journal of Integrative Bioinformatics.
Pub Date: 2023-12-28. eCollection Date: 2023-12-01. DOI: 10.1515/jib-2023-0026
Lena Raupach, Cassandra Königs
The first approaches in recent years to integrating pharmacogenomic plausibility checks into clinical practice show both a promising improvement in drug therapy safety and difficulties in application. One of the difficulties is the meaningful interpretation of the text-based results by the medical practitioner. As an appropriate and sensible solution for avoiding misunderstandings and including evidence-based pharmacogenomic recommendations in prescriptions, we propose a graph-based visualization of the reports. This allows for a plausible interpretation and relates complex, even contradictory, guidelines to one another. The improved overview of the pharmacogenomics (PGx) guidelines provided by the graphical visualization makes the medical practitioner's choice of dose and medication more patient-specific, improves the treatment outcome, and thus increases drug therapy safety.
Title: PharmoCo: a graph-based visualization of pharmacogenomic plausibility check reports for clinical decision support systems. Journal: Journal of Integrative Bioinformatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777363/pdf/
Pub Date: 2023-12-22. eCollection Date: 2023-12-01. DOI: 10.1515/jib-2023-0011
Andreas Ian Lackner, Jürgen Pollheimer, Paulina Latos, Martin Knöfler, Sandra Haider
During early pregnancy, extravillous trophoblasts (EVTs) play a crucial role in modifying the maternal uterine environment. Failures in EVT lineage formation and differentiation can lead to pregnancy complications such as preeclampsia, fetal growth restriction, and pregnancy loss. Despite recent advances, our knowledge of the molecular and external factors that control and affect EVT development remains incomplete. Using trophoblast organoid in vitro models, we recently discovered that coordinated manipulation of transforming growth factor beta (TGFβ) signaling is essential for EVT development. To further investigate gene networks involved in EVT function and development, we performed weighted gene co-expression network analysis (WGCNA) on our RNA-Seq data. We identified 10 modules with a median module membership of over 0.8 and sizes ranging from 1005 (M1) to 72 (M27) network genes associated with TGFβ activation status or in vitro culturing, the latter being indicative of yet undiscovered factors that shape the EVT phenotypes. Lastly, we hypothesized that certain therapeutic drugs might unintentionally interfere with placentation by affecting EVT-specific gene expression. We used the STRING database to map correlations and the Drug-Gene Interaction database to identify drug targets. Our comprehensive dataset of drug-gene interactions provides insights into potential risks associated with certain drugs in early gestation.
Title: Gene-network based analysis of human placental trophoblast subtypes identifies critical genes as potential targets of therapeutic drugs. Journal: Journal of Integrative Bioinformatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777358/pdf/
Pub Date: 2023-12-15. eCollection Date: 2023-12-01. DOI: 10.1515/jib-2023-0002
Marco Zurdo-Tabernero, Ángel Canal-Alonso, Fernando de la Prieta, Sara Rodríguez, Javier Prieto, Juan Manuel Corchado
Epilepsy is a neurological disorder (the third most common, following stroke and migraines). A key aspect of its diagnosis is the presence of seizures that occur without a known cause and the potential for new seizures to occur. Machine learning has shown potential as a cost-effective alternative for rapid diagnosis. In this study, we review the current state of machine learning in the detection and prediction of epileptic seizures. The objective of this study is to portray the existing machine learning methods for seizure prediction. Internet bibliographical searches were conducted to identify relevant literature on the topic. Through cross-referencing from key articles, additional references were obtained to provide a comprehensive overview of the techniques. As the aim of this paper is not a pure bibliographical review of the subject, the publications cited here have been selected from among many others based on their number of citations. To implement accurate diagnostic and treatment tools, it is necessary to achieve a balance between prediction time, sensitivity, and specificity. This balance can be achieved using deep learning algorithms. The best performance and results are often achieved by combining multiple techniques and features, but this approach can also increase computational requirements.
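The sensitivity and specificity that have to be balanced can be stated precisely with the standard confusion-matrix definitions. The short Python sketch below computes both; the counts are invented for illustration and do not come from any study covered by the review.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true seizure windows that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of seizure-free windows correctly left unflagged: TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical counts for a detector evaluated on 1,000 recording windows.
tp, fn, tn, fp = 45, 5, 900, 50

print(f"sensitivity = {sensitivity(tp, fn):.2f}")   # sensitivity = 0.90
print(f"specificity = {specificity(tn, fp):.3f}")   # specificity = 0.947
```

Raising a detector's alarm threshold typically trades sensitivity for specificity, which is why the two are usually reported together rather than as a single accuracy figure.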
Title: An overview of machine learning and deep learning techniques for predicting epileptic seizures. Journal: Journal of Integrative Bioinformatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777364/pdf/
Pub Date: 2023-12-14. eCollection Date: 2023-12-01. DOI: 10.1515/jib-2023-0006
Jing Chen, Zixiang Wang, Jia Huang
Proteins are important components of biological structures and encode a great deal of biological information. Protein-protein interaction (PPI) network alignment is a model for analyzing proteins that helps discover conserved functions between organisms and predict unknown functions. In particular, multi-network alignment aims at finding the mapping relationship among the nodes of multiple networks, so as to transfer knowledge across species. However, with the increasing complexity of PPI networks, performing network alignment more accurately and efficiently is a new challenge. This paper proposes a new global network alignment algorithm called Simulated Annealing Multiple Network Alignment (SAMNA), which uses both network topology and sequence homology information. To generate the alignment, SAMNA first generates cross-network candidate clusters by a clustering algorithm on a k-partite similarity graph constructed with sequence similarity information, and then selects candidate cluster nodes as alignment results and optimizes them using an improved simulated annealing algorithm. Finally, the SAMNA algorithm was evaluated on synthetic and real-world network datasets, and the results showed that SAMNA outperformed the state-of-the-art algorithm in biological performance.
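For readers unfamiliar with simulated annealing, the following Python sketch shows the generic technique on a toy pairwise alignment: it searches over one-to-one node mappings between two small networks and scores a mapping by the number of conserved edges. This is textbook simulated annealing, not the SAMNA algorithm; the scoring function, the swap move, the cooling schedule, and all names here are assumptions chosen for illustration, and SAMNA's candidate clusters and sequence-similarity term are omitted.

```python
import math
import random

def conserved_edges(mapping, edges_a, edges_b):
    """Count edges (u, v) of network A whose images form an edge in network B."""
    return sum(1 for u, v in edges_a
               if frozenset((mapping[u], mapping[v])) in edges_b)

def anneal_alignment(n, edges_a, edges_b, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Generic simulated annealing over one-to-one node mappings A -> B."""
    rng = random.Random(seed)
    mapping = list(range(n))                      # start from the identity mapping
    score = conserved_edges(mapping, edges_a, edges_b)
    best, best_score = mapping[:], score
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)            # propose swapping two images
        mapping[i], mapping[j] = mapping[j], mapping[i]
        new_score = conserved_edges(mapping, edges_a, edges_b)
        delta = new_score - score
        # Always accept improvements; accept a worse move with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score = new_score
            if score > best_score:
                best, best_score = mapping[:], score
        else:
            mapping[i], mapping[j] = mapping[j], mapping[i]   # undo the swap
        temp *= cooling                           # geometric cooling schedule
    return best, best_score

# Hypothetical toy networks: A is the path 0-1-2-3, B is a relabelled path.
edges_a = [(0, 1), (1, 2), (2, 3)]
edges_b = {frozenset(e) for e in [(2, 0), (0, 3), (3, 1)]}

best, best_score = anneal_alignment(4, edges_a, edges_b)
print(best, best_score)
```

Accepting some worsening moves while the temperature is high lets the search escape local optima; as the temperature falls, the procedure behaves more and more like greedy hill climbing.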
Title: SAMNA: accurate alignment of multiple biological networks based on simulated annealing. Journal: Journal of Integrative Bioinformatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10777366/pdf/
Pub Date: 2023-12-12. eCollection Date: 2023-09-01. DOI: 10.1515/jib-2023-0014
Grigory A Oborotov, Konstantin A Koshechkin, Yuriy L Orlov
Applications of Artificial Intelligence to risk sharing in medical informatics solutions have social value. At a time of ever-increasing costs for the provision of medicines to citizens, there is a need to restrain the growth of health care costs. The search for computer technologies that stop or slow the growth of costs therefore takes on new importance. We discuss two information technologies in pharmacotherapy and the possibility of combining and sharing them, namely the combination of risk-sharing agreements and Machine Learning, which was made possible by the development of Artificial Intelligence (AI). Neural networks could be used to predict treatment outcomes and so reduce the risk factors for treatment. AI-based data processing automation technologies could also be used to automate risk-sharing agreements.
Title: Application of Artificial Intelligence or machine learning in risk sharing agreements for pharmacotherapy risk management. Journal of Integrative Bioinformatics. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10757074/pdf/
Pub Date: 2023-12-05; eCollection Date: 2024-03-01; DOI: 10.1515/jib-2023-0021
Avery Mecham, Ashlie Stephenson, Badi I Quinteros, Grace S Brown, Stephen R Piccolo
TidyGEO is a Web-based tool for downloading, tidying, and reformatting data series from Gene Expression Omnibus (GEO). As a freely accessible repository with data from over 6 million biological samples across more than 4000 organisms, GEO provides diverse opportunities for secondary research. Although scientists may find assay data relevant to a given research question, most analyses require sample-level annotations. In GEO, such annotations are stored alongside assay data in delimited, text-based files. However, the structure and semantics of the annotations vary widely from one series to another, and many annotations are not useful for analysis purposes. Thus, every GEO series must be tidied before it is analyzed. Manual approaches may be used, but these are error prone and take time away from other research tasks. Custom computer scripts can be written, but many scientists lack the computational expertise to create such scripts. To address these challenges, we created TidyGEO, which supports essential data-cleaning tasks for sample-level annotations, such as selecting informative columns, renaming columns, splitting or merging columns, standardizing data values, and filtering samples. Additionally, users can integrate annotations with assay data, restructure assay data, and generate code that enables others to reproduce these steps.
Title: TidyGEO: preparing analysis-ready datasets from Gene Expression Omnibus. Journal of Integrative Bioinformatics.
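The data-cleaning tasks named in the TidyGEO abstract (selecting informative columns, renaming and splitting columns, standardizing values, filtering samples) can be illustrated with a small sketch. This is not TidyGEO's code or the code it generates. It is plain Python over made-up rows, and the column names, the "key: value" cell layout, and the sample values are all assumptions chosen for illustration.

```python
import re

# Hypothetical sample-level annotations with messy "key: value" cells.
samples = [
    {"geo_accession": "GSM0001", "characteristics_ch1": "tissue: Liver",
     "characteristics_ch1.1": "age: 45 years", "contact_name": "Lab A"},
    {"geo_accession": "GSM0002", "characteristics_ch1": "tissue: liver",
     "characteristics_ch1.1": "age: 52", "contact_name": "Lab A"},
    {"geo_accession": "GSM0003", "characteristics_ch1": "tissue: Kidney",
     "characteristics_ch1.1": "age: 38 years", "contact_name": "Lab A"},
]


def split_key_value(cell):
    """Split 'key: value' into (key, value); return (None, cell) if there is no colon."""
    key, sep, value = cell.partition(":")
    return (key.strip(), value.strip()) if sep else (None, cell.strip())


def tidy(rows):
    """Select, rename, split, and standardize annotation columns."""
    tidied = []
    for row in rows:
        record = {"sample_id": row["geo_accession"]}  # rename the ID column
        for column, cell in row.items():
            if not column.startswith("characteristics"):
                continue  # keep only the informative columns
            key, value = split_key_value(cell)  # split one cell into name and value
            if key:
                record[key] = value
        # Standardize values: lower-case tissue names, numeric ages.
        if "tissue" in record:
            record["tissue"] = record["tissue"].lower()
        if "age" in record:
            match = re.search(r"\d+", record["age"])
            record["age"] = int(match.group()) if match else None
        tidied.append(record)
    return tidied


# Filter samples: keep liver only.
tidy_rows = [r for r in tidy(samples) if r.get("tissue") == "liver"]
print(tidy_rows)
```

Each step here is a few lines of code, but the rules differ from one series to the next, which is the repetitive work the abstract says the tool is meant to take over.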