DNA oxidation is one of the main types of damage to the genetic material of living organisms. Of the many dozens of oxidative lesions, the most abundant is 8-oxoguanine (8-oxoG), a premutagenic base that leads to G→T transversions during replication. Double-stranded DNA can conduct holes through the π system of stacked nucleobases. Such electron vacancies are ultimately localized at the 5'-terminal nucleotides of polyguanine runs (G-runs), making these positions characteristic sites of 8-oxoG formation. While these properties of G-runs have been studied in vitro at the level of chemical reactivity, the extent to which they influence mutagenesis spectra in vivo remains unclear. Here, we analyzed the nucleotide context of G-runs in a representative set of 62 high-quality prokaryotic genomes and in the human telomere-to-telomere genome. G-runs were, on average, shorter than polyadenine runs (A-runs), and the probability of a G-run being elongated by one nucleotide was lower than for A-runs. T was overrepresented at the position flanking G-runs on the 5' side, especially in organisms with aerobic metabolism, which is consistent with the model of preferential G→T substitutions at the 5'-position with 8-oxoG as a precursor. Conversely, at the position flanking A-runs on the 5' side, the frequencies of G and C were increased and the frequency of T was decreased. G-run expansion in the human genome follows a biphasic pattern: for runs longer than 8-9 nucleotides, the probability of elongation by one nucleotide increases significantly. C was overrepresented at the 5'-flanking position of long G-runs, and the frequency of 5'-G→A substitutions was elevated in telomere repeats. This may indicate the existence of mutagenic processes whose mechanism has not yet been characterized but may be associated with DNA polymerase errors during replication of the products of further oxidation of 8-oxoG.
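The kind of run statistics described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the minimum run length of 3, and the definition of the elongation probability as P(length ≥ n+1 | length ≥ n) are assumptions made here for illustration, and only one strand is scanned (C-runs on the scanned strand correspond to G-runs on the complementary strand).

```python
import re
from collections import Counter

def run_flank_counts(seq, base="G", min_len=3):
    """Return (run-length counts, 5'-flanking nucleotide counts) for maximal runs of `base`.

    Hypothetical helper: min_len=3 is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    lengths = Counter()
    flanks = Counter()
    # A greedy, left-to-right match of e.g. "G{3,}" always covers a whole maximal run.
    for match in re.finditer(f"{base}{{{min_len},}}", seq):
        lengths[len(match.group())] += 1
        if match.start() > 0:  # runs at the very start have no 5' neighbour
            flanks[seq[match.start() - 1]] += 1
    return lengths, flanks

def elongation_probability(lengths, n):
    """Assumed definition: share of runs of length >= n that are also >= n + 1."""
    at_least_n = sum(c for length, c in lengths.items() if length >= n)
    longer = sum(c for length, c in lengths.items() if length >= n + 1)
    return longer / at_least_n if at_least_n else float("nan")

# Toy sequence, invented for the example
toy = "TGGGATGGGGCAAAT"
g_lengths, g_flanks = run_flank_counts(toy, base="G")
print(dict(g_lengths), dict(g_flanks), elongation_probability(g_lengths, 3))
```

On real genomes the same counts would be accumulated per chromosome and compared between G-runs and A-runs, and between aerobic and anaerobic organisms.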
{"title":"DNA damage reflected in the evolution of G-runs in genomes.","authors":"I R Grin, D O Zharkov","doi":"10.18699/vjgb-25-98","DOIUrl":"https://doi.org/10.18699/vjgb-25-98","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"913-924"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795821/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
O A Podkolodnaya, M A Deryuzhenko, N N Tverdokhleb, K A Zolotareva, Yu V Makovka, N L Podkolodny, V V Suslov, I V Chadaeva, L A Fedoseeva, A A Seryapina, D Yu Oshchepkov, A G Bogomolov, E Yu Kondratyuk, O E Redina, A L Markel, N E Gruntenko, M P Ponomarenko
Since the work of Nobel Prize winner Thomas Morgan in 1909, the fruit fly Drosophila melanogaster has been one of the most popular model animals in genetics. Research using this fly has been honored with the Nobel Prize many times: in 1946 (Muller, X-ray mutagenesis), in 1995 (Lewis, Nüsslein-Volhard, Wieschaus, genetic control of embryogenesis), in 2004 (Axel and Buck, the olfactory system), in 2011 (Steinman, dendritic cells in adaptive immunity; Beutler and Hoffmann, activation of innate immunity), and in 2017 (Hall, Rosbash and Young, the molecular mechanism of the circadian rhythm). The prominent role of Drosophila in genetics is due to its key features: short life cycle, frequent generational turnover, ease of maintenance, high fertility, small size, transparent embryos, simple larval structure, the ability to observe chromosomal rearrangements visually thanks to polytene chromosomes, and accessibility to molecular genetic manipulation. Furthermore, the high conservation of several signaling pathways and gene networks in Drosophila and their similarity to those of mammals and humans, together with the development of high-throughput genomic sequencing, have motivated the use of D. melanogaster as a model organism in biomedical fields of inquiry: pharmacology, toxicology, cardiology, oncology, immunology, gerontology, and radiobiology. These studies add to the understanding of the genetic and epigenetic basis of the pathogenesis of human diseases. This paper describes our curated knowledge base, FlyDEGdb (https://www.sysbio.ru/FlyDEGdb), which stores information on differentially expressed genes (DEGs) in Drosophila. This information was extracted from 50 scientific articles containing experimental data on changes in the expression of 20,058 genes (80%) out of the 25,079 Drosophila genes stored in the NCBI Gene database.
The changes were induced by 52 stress factors, including heat and cold exposure, dehydration, heavy metals, radiation, starvation, household chemicals, drugs, fertilizers, insecticides, pesticides, herbicides, and other toxicants. The FlyDEGdb knowledge base is illustrated using the example of the Drosophila gene dysf (dysfusion), which has been identified as a DEG under cold shock and in toxicity tests of the herbicide paraquat, the solvent toluene, the drug menadione, and the food additive E923. FlyDEGdb stores information on changes in the expression of the dysf gene and its homologues: (a) the Clk, cyc, and per genes in Drosophila, and (b) the NPAS4, CLOCK, BMAL1, PER1, and PER2 genes in humans. These data are supplemented with information on the biological processes in which these genes are involved: oocyte maturation (oogenesis), regulation of stress response and circadian rhythm, carcinogenesis, aging, etc. Therefore, FlyDEGdb, which contains information on the widely used model organism Drosophila, can be helpful for researchers working in the molecular biology and genetics of humans and animals, physiology, translational medicine, pharmacology, nutrition, agrochemistry, radiobiology, toxicology, and bioinformatics.
{"title":"FlyDEGdb knowledge base on differentially expressed genes of Drosophila melanogaster, a model object in biomedicine.","authors":"O A Podkolodnaya, M A Deryuzhenko, N N Tverdokhleb, K A Zolotareva, Yu V Makovka, N L Podkolodny, V V Suslov, I V Chadaeva, L A Fedoseeva, A A Seryapina, D Yu Oshchepkov, A G Bogomolov, E Yu Kondratyuk, O E Redina, A L Markel, N E Gruntenko, M P Ponomarenko","doi":"10.18699/vjgb-25-101","DOIUrl":"https://doi.org/10.18699/vjgb-25-101","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"952-962"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795857/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M Zhao, T E Tyugashev, A T Davletgildeeva, N A Kuznetsov
The ABH2 enzyme belongs to the AlkB-like family of Fe(II)/α-ketoglutarate-dependent dioxygenases. Non-heme dioxygenases act on a wide range of substrates and have a complex catalytic mechanism involving α-ketoglutarate and an Fe(II) ion as a cofactor. Members of the AlkB family catalyze the direct oxidation of alkyl substituents in the nitrogenous bases of DNA and RNA, providing protection against the mutagenic effects of endogenous and exogenous alkylating agents, and also participate in regulating the methylation level of some RNAs. The DNA dioxygenase ABH2, localized predominantly in the cell nucleus, is specific for double-stranded DNA substrates and, unlike most other human AlkB-like enzymes, has a fairly broad substrate specificity, oxidizing alkyl groups of modified nitrogenous bases such as N1-methyladenosine, N3-methylcytidine, 1,N6-ethenoadenosine and 3,N4-ethenocytidine. To analyze the mechanism underlying the enzyme's substrate specificity and to clarify the functional role of key active-site amino acid residues, we performed molecular dynamics simulations of complexes of the wild-type ABH2 enzyme and its mutant forms carrying the amino acid substitutions V99A, F124A and S125A with two types of DNA substrates containing the methylated bases N1-methyladenine and N3-methylcytosine, respectively. The V99A substitution increases the mobility of protein loops L1 and L2, which are involved in binding the DNA substrate, and changes the distribution of π-π contacts between the side chain of residue F102 and the nitrogenous bases located near the damaged nucleotide. The F124A substitution leads to the loss of π-π stacking with the damaged base, which in turn destabilizes the architecture of the active site, disrupts the interaction with the iron ion and prevents optimal catalytic positioning of α-ketoglutarate in the active site.
The S125A substitution leads to the loss of direct interaction of the L2 loop with the 5'-phosphate group of the damaged nucleotide, weakening the binding of the enzyme to the DNA substrate. Thus, the data reveal the functional role of three active-site amino acid residues and contribute to the understanding of structure-function relationships in the recognition of a damaged nucleotide and the formation of a catalytic complex by the human ABH2 enzyme.
{"title":"Molecular dynamic analysis of the functional role of amino acid residues V99, F124 and S125 of human DNA dioxygenase ABH2.","authors":"M Zhao, T E Tyugashev, A T Davletgildeeva, N A Kuznetsov","doi":"10.18699/vjgb-25-111","DOIUrl":"https://doi.org/10.18699/vjgb-25-111","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1062-1072"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795828/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N M Levanova, E G Vergunov, A N Savostyanov, I V Yatsyk, V A Ivanisenko
Accumulated evidence links dysregulated cytokine signaling to the pathogenesis of autism spectrum disorder (ASD), implicating genes, proteins, and their intermolecular networks. This paper systematizes these findings using bioinformatics analysis and machine learning methods. The primary tool employed in the study was the ANDSystem cognitive platform, developed at the Institute of Cytology and Genetics, which uses artificial intelligence techniques for automated knowledge extraction from biomedical databases and scientific publications. Using ANDSystem, we reconstructed a gene network of cytokine-mediated regulation of ASD-associated genes and proteins. The analysis identified 110 cytokines that regulate the activity, degradation, and transport of 58 proteins involved in ASD pathogenesis, as well as the expression of 91 ASD-associated genes. Gene Ontology (GO) enrichment analysis revealed statistically significant associations of these genes with biological processes related to the development and function of the central nervous system. Furthermore, topological network analysis, combined with an assessment of functional significance based on association with ASD-related GO biological processes, allowed us to identify 21 cytokines exerting the strongest influence on the regulatory network. Among these, eight cytokines (IL-4, TGF-β1, BMP4, VEGFA, BMP2, IL-10, IFN-γ, TNF-α) had the highest priority, ranking at the top across all employed metrics. Notably, eight of the 21 prioritized cytokines (TNF-α, IL-6, IL-4, VEGFA, IL-2, IL-1β, IFN-γ, IL-17) are known targets of drugs currently used as immunosuppressants and antitumor agents. The pivotal role of these cytokines in ASD pathogenesis provides a rationale for potentially repurposing such inhibitory drugs for the treatment of autism spectrum disorders.
{"title":"In silico reconstruction of the gene network for cytokine regulation of ASD-associated genes and proteins.","authors":"N M Levanova, E G Vergunov, A N Savostyanov, I V Yatsyk, V A Ivanisenko","doi":"10.18699/vjgb-25-105","DOIUrl":"https://doi.org/10.18699/vjgb-25-105","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1000-1008"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E A Antropova, I V Yatsyk, P S Demenkov, T V Ivanisenko, V A Ivanisenko
Macrophages are immune system cells that perform various, often opposing, functions in the organism depending on the signals they receive from the microenvironment. This is possible due to the plasticity of macrophages, which allows them to radically alter their phenotypic characteristics and gene expression profiles, as well as to return to their original, non-activated state. Depending on the inducers acting on the cell, macrophages are activated into various functional states. There are five main phenotypes of activated macrophages: M1, M2a, M2b, M2c, and M2d. Although the amount of genome-wide transcriptomic and proteomic data showing differences between the major macrophage phenotypes and non-activated macrophages (M0) is growing rapidly, the mechanisms regulating gene and protein expression profiles in macrophages of different phenotypes remain unclear. We compiled lists of proteins associated with the macrophage phenotypes M1, M2a, M2b, M2c, and M2d (phenotype-associated proteins) and analyzed the data on potential mediators of macrophage polarization. Furthermore, using the computational system ANDSystem, we searched for and analyzed the relationships between potential regulatory proteins and the genes encoding the proteins associated with the M2 group phenotypes, and estimated the statistical significance of these relationships. The results indicate that the differences between the M2a, M2b, M2c, and M2d macrophage phenotypes may be attributed to the regulatory effects of the proteins JUN, IL8, NFAC2, CCND1, and YAP1. The expression levels of these proteins vary among the M2 group phenotypes, which in turn leads to different expression levels of the genes associated with specific phenotypes.
{"title":"Identification of proteins regulating phenotype-associated genes of M2 macrophages: a bioinformatic analysis.","authors":"E A Antropova, I V Yatsyk, P S Demenkov, T V Ivanisenko, V A Ivanisenko","doi":"10.18699/vjgb-25-104","DOIUrl":"https://doi.org/10.18699/vjgb-25-104","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"990-999"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800646/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the rapid growth of sequencing data has exacerbated the problem of functional annotation of protein sequences, as traditional homology-based methods face limitations when working with distant homologs, making it difficult to accurately determine protein functions. This paper introduces the OrthoML2GO method for protein function prediction, which integrates homology searches using the USEARCH algorithm, orthogroup analysis based on OrthoDB version 12.0, and a machine learning algorithm (gradient boosting). A key feature of our approach is the use of orthogroup information to account for the evolutionary and functional similarity of proteins and the application of machine learning to refine the assigned GO terms for the target sequence. To select the optimal algorithm for protein annotation, the following approaches were applied sequentially: the k-nearest neighbors (KNN) method; a method based on the annotation of the orthogroup most represented in the k-nearest homologs (OG); a method of verifying the GO terms identified in the previous stage using machine learning algorithms. A comparison of the prediction accuracy of GO terms using the OrthoML2GO method with the Blast2GO and PANNZER2 annotation programs was performed on sequence samples from both individual organisms (humans, Arabidopsis) and a combined sample represented by different taxa. Our results demonstrate that the proposed method is comparable to, and by some evaluation metrics outperforms, these existing methods in terms of the quality of protein function prediction, especially on large and heterogeneous samples of organisms. The greatest performance improvement is achieved by combining information about the closest homologs and orthogroups with verification of terms using machine learning methods. 
Our approach demonstrates high performance for large-scale automatic protein annotation, and prospects for further development include optimizing machine learning model parameters for specific biological tasks and integrating additional sources of structural and functional information, which will further improve the method's accuracy and versatility. In addition, the introduction of new bioinformatics tools and the expansion of the annotated protein database will contribute to the further improvement of the proposed approach.
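The first two stages named in this abstract (KNN and OG) can be illustrated with a minimal Python sketch. It is not the OrthoML2GO implementation: the function names, the `min_support` threshold, the input dictionaries, and the protein and orthogroup identifiers are hypothetical, and the third stage (verifying terms with gradient boosting) is omitted. Homology hits are assumed to arrive as `(protein_id, score)` pairs, for example parsed from a search tool's tabular output.

```python
from collections import Counter

def top_hits(hits, k):
    """Keep the k best homologs by similarity score (higher is better)."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:k]

def knn_go_terms(hits, go_of, k=3, min_support=0.5):
    """KNN stage (sketch): keep GO terms carried by at least min_support of the k neighbours."""
    top = top_hits(hits, k)
    votes = Counter(term for pid, _ in top for term in go_of.get(pid, set()))
    return {term for term, count in votes.items() if count / len(top) >= min_support}

def og_go_terms(hits, og_of, go_of_og, k=3):
    """OG stage (sketch): transfer the annotation of the orthogroup most represented among the k neighbours."""
    top = top_hits(hits, k)
    groups = Counter(og_of[pid] for pid, _ in top if pid in og_of)
    if not groups:
        return set()
    best_group, _ = groups.most_common(1)[0]
    return set(go_of_og.get(best_group, set()))

# Toy inputs, invented for the example
hits = [("P1", 0.9), ("P2", 0.8), ("P3", 0.4), ("P4", 0.2)]
go_of = {"P1": {"GO:0003677", "GO:0006281"}, "P2": {"GO:0003677"}, "P3": {"GO:0005515"}}
og_of = {"P1": "OG_A", "P2": "OG_A", "P3": "OG_B"}
go_of_og = {"OG_A": {"GO:0003677", "GO:0006281"}}

print(knn_go_terms(hits, go_of))
print(og_go_terms(hits, og_of, go_of_og))
```

In this toy case the orthogroup stage recovers a term that the neighbour vote alone drops, which is the kind of complementary signal that a later verification step could then accept or reject.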
{"title":"OrthoML2GO: homology-based protein function prediction using orthogroups and machine learning.","authors":"E V Malyugin, D A Afonnikov","doi":"10.18699/vjgb-25-119","DOIUrl":"https://doi.org/10.18699/vjgb-25-119","url":null,"PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1145-1154"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799360/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the main goals of modern evolutionary biology is to understand the mechanisms that lead to the initial differentiation (primary divergence) of populations into groups with distinct genetic traits. This divergence requires reproductive isolation, which prevents or hinders contact and the exchange of genetic material between populations. This study explores the potential for isolation based not on obvious geographical barriers, population distance, or ecological specialization, but rather on hereditary mechanisms, such as genetic drift, gene flow, and selection against heterozygous individuals. To this end, we propose and investigate a dynamic discrete-time model that describes the dynamics of allele frequencies and population sizes in a system of limited populations coupled by migration. We consider a panmictic population with Mendelian inheritance rules, one-locus selection, and density-dependent factors limiting population growth. Individuals freely mate and randomly move around a one-dimensional ring-shaped habitat. The model was verified using data from an experiment on the box population system of Drosophila melanogaster performed by Yu.P. Altukhov et al. Under rather simple assumptions, the model explains some mechanisms for the emergence and preservation of significant genetic differences between subpopulations (primary genetic divergence), accompanied by heterogeneity in allele frequencies and abundances within a homogeneous area. In this scenario, several large groups of genetically homogeneous subpopulations form and develop independently. Hybridization occurs at contact sites, and polymorphism is maintained through migration from genetically homogeneous nearby sites. It was found that only disruptive selection, directed against heterozygous individuals, can sustainably maintain such a spatial distribution. Under directional selection, divergence may occur for a short time as part of the transitional evolutionary process towards the best-adapted genotype. 
Because of the reduced adaptability of heterozygous (hybrid) individuals and low growth rates in these sites (hybrid zones), gene flow between adjacent sites with opposite genotypes (phenotypes) is significantly impeded. As a result, the hybrid zones can become effective geographical barriers that prevent the genetic flow between coupled subpopulations.
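The contrast between disruptive and directional selection can be illustrated with a deliberately simplified sketch. It is not the authors' model: it tracks only allele frequencies (the density-dependent population dynamics are omitted), uses the standard one-locus, two-allele selection recursion, and assumes symmetric nearest-neighbor migration on a ring. All function names and parameter values are illustrative assumptions.

```python
def select(p, w_AA, w_Aa, w_aa):
    """One generation of one-locus viability selection on the frequency p of allele A."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_fitness

def step(freqs, m, fitness):
    """Migration between neighboring sites on a ring, followed by selection."""
    n = len(freqs)
    # Each site keeps a fraction (1 - m) and receives m/2 from each neighbor;
    # index -1 and the modulo wrap the ends of the list into a ring.
    migrated = [
        (1 - m) * freqs[i] + 0.5 * m * (freqs[i - 1] + freqs[(i + 1) % n])
        for i in range(n)
    ]
    return [select(p, *fitness) for p in migrated]

def simulate(freqs, m, fitness, generations):
    """Iterate the migration-selection step for a number of generations."""
    for _ in range(generations):
        freqs = step(freqs, m, fitness)
    return freqs

# Two blocks of sites with opposite predominant alleles (illustrative start).
initial = [0.9] * 10 + [0.1] * 10

# Selection against heterozygotes: fitness tuple is (w_AA, w_Aa, w_aa).
disruptive = simulate(initial, m=0.02, fitness=(1.0, 0.5, 1.0), generations=500)

# Directional selection favoring allele A.
directional = simulate(initial, m=0.02, fitness=(1.0, 0.9, 0.8), generations=1000)
```

With these illustrative values, the two blocks remain close to fixation for opposite alleles under selection against heterozygotes, whereas under directional selection every site converges to the favored allele, in line with the qualitative behavior the abstract reports.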
{"title":"Computer modeling of spatial dynamics and primary genetic divergence for a population system in a ring areal.","authors":"M P Kulakov, O L Zhdanova, E Ya Frisman","doi":"10.18699/vjgb-25-115","DOIUrl":"https://doi.org/10.18699/vjgb-25-115","url":null,"abstract":"<p><p>One of the main goals of modern evolutionary biology is to understand the mechanisms that lead to the initial differentiation (primary divergence) of populations into groups with genetic traits. This divergence requires reproductive isolation, which prevents or hinders contact and the exchange of genetic material between populations. This study explores the potential for isolation based not on obvious geographical barriers, population distance, or ecological specialization, but rather on hereditary mechanisms, such as gene drift and flow and selection against heterozygous individuals. To this end, we propose and investigate a dynamic discrete-time model that describes the dynamics of frequencies and numbers in a system of limited populations coupled by migrations. We consider a panmictic population with Mendelian inheritance rules, one-locus selection, and density-dependent factors limiting population growth. Individuals freely mate and randomly move around a one-dimensional ring-shaped habitat. The model was verified using data from an experiment on the box population system of Drosophila melanogaster performed by Yu.P. Altukhov et al. With rather simple assumptions, the model explains some mechanisms for the emergence and preservation of significant genetic differences between subpopulations (primary genetic divergence), accompanied by heterogeneity in allele frequencies and abundances within a homogeneous area. In this scenario, several large groups of genetically homogeneous subpopulations form and independently develop. Hybridization occurs at contact sites, and polymorphism is maintained through migration from genetically homogeneous nearby sites. 
It was found that only disruptive selection, directed against heterozygous individuals, can sustainably maintain such a spatial distribution. Under directional selection, divergence may occur for a short time as part of the transitional evolutionary process towards the best-adapted genotype. Because of the reduced adaptability of heterozygous (hybrid) individuals and low growth rates in these sites (hybrid zones), gene flow between adjacent sites with opposite genotypes (phenotypes) is significantly impeded. As a result, the hybrid zones can become effective geographical barriers that prevent the genetic flow between coupled subpopulations.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1109-1121"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision plays a key role in the lives of various organisms, enabling spatial orientation, foraging, predator avoidance and social interaction. In species with relatively simple visual systems, such as insects, effective behavioral strategies are achieved through high neural specialization, adaptation to specific environmental conditions, and the use of additional sensory systems such as olfaction or hearing. Animals with more complex vision and nervous systems, such as mammals, have greater cognitive abilities and flexibility, but this comes with increased demands on the brain's energy costs and computational resources. Modeling the features of such systems in a virtual environment could allow researchers to explore the fundamental principles of sensorimotor integration and the limits of cognitive complexity, as well as test hypotheses about the interaction between perception, memory and decision-making mechanisms. In this work, we implement and investigate a model of virtual organisms with a visual system operating in a three-dimensional physical environment using the Unity ML-Agents software, one of the highest-performance simulation platforms currently available. We propose a hierarchical control architecture that separates locomotion and navigation tasks between two modules: (1) visual perception and decision-making, and (2) coordinated control of limb movement for locomotion in the physical environment. A series of numerical experiments was conducted to examine the influence of visual system parameters (e.g., resolution of the "first-person" view), environmental configuration and agent architectural features on the efficiency and outcomes of reinforcement learning (using the PPO algorithm). The results demonstrate the existence of an optimal range of resolutions that provides a trade-off between computational complexity and success in accomplishing the task, while excessive dimensionality of sensory inputs or action space leads to slower learning. 
We performed system performance profiling and identified key bottlenecks in large-scale simulations. The discussion considers biological parallels, highlighting cases of high behavioral efficiency in insects with relatively low-resolution visual systems, and the potential of neuroevolutionary approaches for adapting agent architectures. The proposed approach and the results obtained are of potential interest to researchers working on biologically inspired artificial agents, evolutionary modeling, and the study of cognitive processes in artificial systems.
{"title":"Self-learning virtual organisms in a physics simulator: on the optimal resolution of their visual system, the architecture of the nervous system and the computational complexity of the problem.","authors":"M S Zenin, A P Devyaterikov, A Yu Palyanov","doi":"10.18699/vjgb-25-110","DOIUrl":"https://doi.org/10.18699/vjgb-25-110","url":null,"abstract":"<p><p>Vision plays a key role in the lives of various organisms, enabling spatial orientation, foraging, predator avoidance and social interaction. In species with relatively simple visual systems, such as insects, effective behavioral strategies are achieved through high neural specialization, adaptation to specific environmental conditions, and the use of additional sensory systems such as olfaction or hearing. Animals with more complex vision and nervous systems, such as mammals, have greater cognitive abilities and flexibility, but this comes with increased demands on the brain's energy costs and computational resources. Modeling the features of such systems in a virtual environment could allow researchers to explore the fundamental principles of sensorimotor integration and the limits of cognitive complexity, as well as test hypotheses about the interaction between perception, memory and decision-making mechanisms. In this work, we implement and investigate a model of virtual organisms with a visual system operating in a three-dimensional physical environment using the Unity ML-Agents software - one of the most high-performance simulation platforms currently available. We propose a hierarchical control architecture that separates locomotion and navigation tasks between two modules: (1) visual perception and decision-making, and (2) coordinated control of limb movement for locomotion in the physical environment. A series of numerical experiments was conducted to examine the influence of visual system parameters (e. 
g, resolution of the \"first-person\" view), environmental configuration and agent architectural features on the efficiency and outcomes of reinforcement learning (using the PPO algorithm). The results demonstrate the existence of an optimal range of resolutions that provide a trade-off between computational complexity and success in accomplishing the task, while excessive dimensionality of sensory inputs or action space leads to slower learning. We performed system performance profiling and identified key bottlenecks in large-scale simulations. The discussion considers biological parallels, highlighting cases of high behavioral efficiency in insects with relatively low-resolution visual systems, and the potential of neuroevolutionary approaches for adapting agent architectures. The proposed approach and the results obtained are of potential interest to researchers working on biologically inspired artificial agents, evolutionary modeling, and the study of cognitive processes in artificial systems.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1051-1061"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795856/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145970907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
De novo motif search is the main approach for determining the nucleotide specificity of binding of the key regulators of gene transcription, transcription factors (TFs), based on data from massive genome-wide sequencing of their binding site regions in vivo, such as ChIP-seq. The number of motifs of known TF binding sites (TFBSs) has increased several times in recent years. Due to the similarity in the structure of the DNA-binding domains of TFs, many structurally cognate TFs have similar and sometimes almost indistinguishable binding site motifs. The classification of TFs by the structure of the DNA-binding domains from the TFClass database defines the top levels of the hierarchy (superclasses and classes of TFs) by the structure of these domains, and the next levels (families and subfamilies of TFs) by the alignments of amino acid sequences of domains. However, this classification does not take into account the similarity of TFBS motifs, whereas identification of valid TFs from massive sequencing data of TFBSs, such as ChIP-seq, requires working with TFBS motifs rather than TFs themselves. Therefore, in this study we extracted from the Hocomoco and Jaspar databases the TFBS motifs for human and fruit fly Drosophila melanogaster, and considered the pairwise similarity of binding site motifs of cognate TFs according to their classification from the TFClass database. We have shown that the common tree of the TF hierarchy by the structure of DNA-binding domains can be split into separate branches representing non-overlapping sets of TFs. Within each branch, the majority of TF pairs have significantly similar binding site motifs. Each branch can include one or more sister elementary units of the hierarchy and all its/their lower levels: one or more TFs of the same subfamily, or the whole subfamily, one or several subfamilies of the same family, an entire family, etc., up to the entire class. 
Analysis of the seven largest human and two largest Drosophila TF classes showed that the similarity of TFs in terms of TFBS motifs for different corresponding levels (classes, families) is noticeably different. Supplementing the hierarchical classification of TFs with branches combining significantly similar motifs of TFBSs can increase the efficiency of identifying involved TFs through enriched motifs detected by de novo motif search for massive sequencing data of TFBSs from the ChIP-seq technology.
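Pairwise comparison of binding site motifs can be sketched as follows. The abstract does not state which similarity measure was used, so this example assumes one common choice: the mean Pearson correlation between aligned columns of two position frequency matrices, maximized over ungapped offsets. Reverse-complement comparison and significance estimation are omitted, and all names and the minimum overlap are hypothetical.

```python
from math import sqrt

def column_correlation(a, b):
    """Pearson correlation between two 4-element base-frequency columns (A, C, G, T)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A uniform column carries no signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_a * var_b)

def motif_similarity(m1, m2, min_overlap=4):
    """Best mean column correlation over all ungapped alignments of two motifs.

    m1, m2: lists of columns, each column a list of four base frequencies.
    Returns -1.0 if no alignment reaches min_overlap columns.
    """
    best = -1.0
    # offset is the position in m1 that column 0 of m2 is aligned to.
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [
            (m1[i], m2[i - offset])
            for i in range(len(m1))
            if 0 <= i - offset < len(m2)
        ]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_correlation(a, b) for a, b in pairs) / len(pairs)
        best = max(best, score)
    return best
```

Scores near 1 indicate nearly identical motifs at some offset. Deciding which pairs count as significantly similar would additionally require a threshold or a null model, which this sketch does not provide.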
{"title":"Linking hierarchical classification of transcription factors by the structure of their DNA-binding domains to the variability of their binding site motifs.","authors":"V G Levitsky, T Yu Vatolina, V V Raditsa","doi":"10.18699/vjgb-25-99","DOIUrl":"https://doi.org/10.18699/vjgb-25-99","url":null,"abstract":"<p><p>De novo motif search is the main approach for determining the nucleotide specificity of binding of the key regulators of gene transcription, transcription factors (TFs), based on data from massive genome-wide sequencing of their binding site regions in vivo, such as ChIP-seq. The number of motifs of known TF binding sites (TFBSs) has increased several times in recent years. Due to the similarity in the structure of the DNA-binding domains of TFs, many structurally cognate TFs have similar and sometimes almost indistinguishable binding site motifs. The classification of TFs by the structure of the DNA-binding domains from the TFClass database defines the top levels of the hierarchy (superclasses and classes of TFs) by the structure of these domains, and the next levels (families and subfamilies of TFs) by the alignments of amino acid sequences of domains. However, this classification does not take into account the similarity of TFBS motifs, whereas identification of valid TFs from massive sequencing data of TFBSs, such as ChIP- seq, requires working with TFBS motifs rather than TFs themselves. Therefore, in this study we extracted from the Hocomoco and Jaspar databases the TFBS motifs for human and fruit fly Drosophila melanogaster, and considered the pairwise similarity of binding site motifs of cognate TFs according to their classification from the TFClass database. We have shown that the common tree of the TF hierarchy by the structure of DNA-binding domains can be split into separate branches representing non-overlapping sets of TFs. Within each branch, the majority of TF pairs have significantly similar binding site motifs. 
Each branch can include one or more sister elementary units of the hierarchy and all its/their lower levels: one or more TFs of the same subfamily, or the whole subfamily, one or several subfamilies of the same family, an entire family, etc., up to the entire class. Analysis of the seven largest human and two largest Drosophila TF classes showed that the similarity of TFs in terms of TFBS motifs for different corresponding levels (classes, families) is noticeably different. Supplementing the hierarchical classification of TFs with branches combining significantly similar motifs of TFBSs can increase the efficiency of identifying involved TFs through enriched motifs detected by de novo motif search for massive sequencing data of TFBSs from the ChIP-seq technology.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"925-939"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T V Ivanisenko, P S Demenkov, M A Kleshchev, V A Ivanisenko
In recent years, artificial intelligence methods based on the analysis of heterogeneous graphs of biomedical networks have become widely used for predicting molecular interactions. In particular, graph neural networks (GNNs) effectively identify missing edges in gene networks - such as protein-protein interaction, gene-disease, drug-target, and other networks - thereby enabling the prediction of new biological relationships. To reconstruct gene networks, cognitive systems for automatic text mining of scientific publications and databases are often employed. One such AI-driven platform, ANDSystem, is designed for automatic knowledge extraction of molecular interactions and, on this basis, the reconstruction of associative gene networks. The ANDSystem knowledge base contains information on more than 100 million interactions among diverse molecular genetic entities (genes, proteins, metabolites, drugs, etc.). The interactions span a wide range of types: regulatory relationships, physical interactions (protein-protein, protein-ligand), catalytic and chemical reactions, and associations among genes, phenotypes, diseases, and more. In the present study, we applied attention-based graph neural networks trained on the ANDSystem knowledge graph to predict new edges between proteins and ligands and to identify potential ligands for the SARS-CoV-2 ORF3a protein. The accessory protein ORF3a plays an important role in viral pathogenesis through ion-channel activity, induction of apoptosis, and the ability to modulate endolysosomal processes and the host innate immune response. Despite this broad functional spectrum, ORF3a has been explored far less as a pharmacological target than other viral proteins. Using a graph neural network, we predicted five small molecules of different origins (metabolites and a drug) that potentially interact with ORF3a: N-acetyl-D-glucosamine, 4-(benzoylamino)benzoic acid, austocystin D, bictegravirum, and L-threonine. 
Molecular docking and MM/GBSA affinity estimation indicate the potential ability of these compounds to form complexes with ORF3a. Localization analysis showed that the binding sites of bictegravir and 4-(benzoylamino)benzoic acid lie in a cytosolic surface pocket of the protein that is solvent-exposed; L-threonine binds within the intersubunit cleft of the dimer; and austocystin D and N-acetyl-D-glucosamine are positioned at the boundary between the cytosolic surface and the transmembrane region. The accessibility of these binding sites may be reduced by the influence of the lipid bilayer. The binding energetics for bictegravirum were more favorable than for 4-(benzoylamino)benzoic acid (docking score -7.37 kcal/mol; MM/GBSA ΔG -14.71 ± 3.12 kcal/mol), making bictegravirum a promising candidate for repurposing as an ORF3a inhibitor.
{"title":"Prediction of interactions between the SARS-CoV-2 ORF3a protein and small-molecule ligands using the ANDSystem cognitive platform, graph neural networks, and molecular modeling.","authors":"T V Ivanisenko, P S Demenkov, M A Kleshchev, V A Ivanisenko","doi":"10.18699/vjgb-25-113","DOIUrl":"https://doi.org/10.18699/vjgb-25-113","url":null,"abstract":"<p><p>In recent years, artificial intelligence methods based on the analysis of heterogeneous graphs of biomedical networks have become widely used for predicting molecular interactions. In particular, graph neural networks (GNNs) effectively identify missing edges in gene networks - such as protein-protein interaction, gene-disease, drug-target, and other networks - thereby enabling the prediction of new biological relationships. To reconstruct gene networks, cognitive systems for automatic text mining of scientific publications and databases are often employed. One such AI-driven platform, ANDSystem, is designed for automatic knowledge extraction of molecular interactions and, on this basis, the reconstruction of associative gene networks. The ANDSystem knowledge base contains information on more than 100 million interactions among diverse molecular genetic entities (genes, proteins, metabolites, drugs, etc.). The interactions span a wide range of types: regulatory relationships, physical interactions (protein-protein, protein-ligand), catalytic and chemical reactions, and associations among genes, phenotypes, diseases, and more. In the present study, we applied attention-based graph neural networks trained on the ANDSystem knowledge graph to predict new edges between proteins and ligands and to identify potential ligands for the SARS-CoV-2 ORF3a protein. The accessory protein ORF3a plays an important role in viral pathogenesis through ion-channel activity, induction of apoptosis, and the ability to modulate endolysosomal processes and the host innate immune response. 
Despite this broad functional spectrum, ORF3a has been explored far less as a pharmacological target than other viral proteins. Using a graph neural network, we predicted five small molecules of different origins (metabolites and a drug) that potentially interact with ORF3a: N-acetyl-D-glucosamine, 4-(benzoylamino)benzoic acid, austocystin D, bictegravirum, and L-threonine. Molecular docking and MM/GBSA affinity estimation indicate the potential ability of these compounds to form complexes with ORF3a. Localization analysis showed that the binding sites of bictegravir and 4-(benzoylamino)benzoic acid lie in a cytosolic surface pocket of the protein that is solvent-exposed; L-threonine binds within the intersubunit cleft of the dimer; and austocystin D and N-acetyl-D-glucosamine are positioned at the boundary between the cytosolic surface and the transmembrane region. The accessibility of these binding sites may be reduced by the influence of the lipid bilayer. The binding energetics for bictegravirum were more favorable than for 4-(benzoylamino)benzoic acid (docking score -7.37 kcal/mol; MM/GBSA ΔG -14.71 ± 3.12 kcal/mol), making bictegravirum a promising candidate for repurposing as an ORF3a inhibitor.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1084-1096"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}