E A Antropova, I V Yatsyk, P S Demenkov, T V Ivanisenko, V A Ivanisenko
Macrophages are immune system cells that perform various, often opposing, functions in the organism depending on the incoming microenvironment signals. This is possible due to the plasticity of macrophages, which allows them to radically alter their phenotypic characteristics and gene expression profiles, as well as return to their original, non-activated state. Depending on the inducers acting on the cell, macrophages are activated into various functional states. There are five main phenotypes of activated macrophages: M1, M2a, M2b, M2c, and M2d. Although the amount of genome-wide transcriptomic and proteomic data showing differences between major macrophage phenotypes and non-activated macrophages (M0) is rapidly growing, questions regarding the mechanisms regulating gene and protein expression profiles in macrophages of different phenotypes still remain. We compiled lists of proteins associated with the macrophage phenotypes M1, M2a, M2b, M2c, and M2d (phenotype-associated proteins) and analyzed the data on potential mediators of macrophage polarization. Furthermore, using the computational system ANDSystem, we conducted a search and analysis of the relationships between potential regulatory proteins and the genes encoding the proteins associated with the M2 group phenotypes, obtaining estimates of the statistical significance of these relationships. The results indicate that the differences in the M2a, M2b, M2c, and M2d macrophage phenotypes may be attributed to the regulatory effects of the proteins JUN, IL8, NFAC2, CCND1, and YAP1. The expression levels of these proteins vary among the M2 group phenotypes, which in turn leads to different levels of gene expression associated with specific phenotypes.
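The abstract does not say how the statistical significance of regulator-to-gene relationships was estimated. As an illustration only, one common choice for this kind of question is a hypergeometric (over-representation) test; the function name and arguments below are hypothetical and are not taken from ANDSystem.

```python
from math import comb


def hypergeom_pvalue(total_genes, phenotype_genes, regulator_targets, overlap):
    """Probability of seeing at least `overlap` phenotype-associated genes
    among `regulator_targets` genes drawn without replacement from
    `total_genes`, of which `phenotype_genes` are phenotype-associated.

    Illustrative assumption: this is a generic over-representation test,
    not the procedure used by ANDSystem.
    """
    upper = min(phenotype_genes, regulator_targets)
    favourable = sum(
        comb(phenotype_genes, i)
        * comb(total_genes - phenotype_genes, regulator_targets - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, regulator_targets)


# Hypothetical numbers: 10 genes in total, 5 phenotype-associated,
# a regulator with 5 targets, all 5 of them phenotype-associated.
p_value = hypergeom_pvalue(10, 5, 5, 5)
```

A small p-value here would suggest that the regulator is linked to phenotype-associated genes more often than expected by chance, under the stated assumptions.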
{"title":"Identification of proteins regulating phenotype-associated genes of M2 macrophages: a bioinformatic analysis.","authors":"E A Antropova, I V Yatsyk, P S Demenkov, T V Ivanisenko, V A Ivanisenko","doi":"10.18699/vjgb-25-104","DOIUrl":"https://doi.org/10.18699/vjgb-25-104","url":null,"abstract":"<p><p>Macrophages are immune system cells that perform various, often opposing, functions in the organism depending on the incoming microenvironment signals. This is possible due to the plasticity of macrophages, which allows them to radically alter their phenotypic characteristics and gene expression profiles, as well as return to their original, non-activated state. Depending on the inductors acting on the cell, macrophages are activated into various functional states. There are five main phenotypes of activated macrophages: M1, M2a, M2b, M2c, and M2d. Although the amount of genome-wide transcriptomic and proteomic data showing differences between major macrophage phenotypes and non-activated macrophages (M0) is rapidly growing, questions regarding the mechanisms regulating gene and protein expression profiles in macrophages of different phenotypes still remain. We compiled lists of proteins associated with the macrophage phenotypes M1, M2a, M2b, M2c, and M2d (phenotype-associated proteins) and analyzed the data on potential mediators of macrophage polarization. Furthermore, using the computational system ANDSystem, we conducted a search and analysis of the relationships between potential regulatory proteins and the genes encoding the proteins associated with the M2 group phenotypes, obtaining estimates of the statistical significance of these relationships. The results indicate that the differences in the M2a, M2b, M2c, and M2d macrophage phenotypes may be attributed to the regulatory effects of the proteins JUN, IL8, NFAC2, CCND1, and YAP1. 
The expression levels of these proteins vary among the M2 group phenotypes, which in turn leads to different levels of gene expression associated with specific phenotypes.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"990-999"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12800646/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, the rapid growth of sequencing data has exacerbated the problem of functional annotation of protein sequences, as traditional homology-based methods face limitations when working with distant homologs, making it difficult to accurately determine protein functions. This paper introduces the OrthoML2GO method for protein function prediction, which integrates homology searches using the USEARCH algorithm, orthogroup analysis based on OrthoDB version 12.0, and a machine learning algorithm (gradient boosting). A key feature of our approach is the use of orthogroup information to account for the evolutionary and functional similarity of proteins and the application of machine learning to refine the assigned GO terms for the target sequence. To select the optimal algorithm for protein annotation, the following approaches were applied sequentially: the k-nearest neighbors (KNN) method; a method based on the annotation of the orthogroup most represented in the k-nearest homologs (OG); a method of verifying the GO terms identified in the previous stage using machine learning algorithms. A comparison of the prediction accuracy of GO terms using the OrthoML2GO method with the Blast2GO and PANNZER2 annotation programs was performed on sequence samples from both individual organisms (humans, Arabidopsis) and a combined sample represented by different taxa. Our results demonstrate that the proposed method is comparable to, and by some evaluation metrics outperforms, these existing methods in terms of the quality of protein function prediction, especially on large and heterogeneous samples of organisms. The greatest performance improvement is achieved by combining information about the closest homologs and orthogroups with verification of terms using machine learning methods. 
Our approach demonstrates high performance for large-scale automatic protein annotation, and prospects for further development include optimizing machine learning model parameters for specific biological tasks and integrating additional sources of structural and functional information, which will further improve the method's accuracy and versatility. In addition, the introduction of new bioinformatics tools and the expansion of the annotated protein database will contribute to the further improvement of the proposed approach.
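The first two annotation stages described above (KNN and OG) can be sketched roughly as follows. This is a minimal illustration under assumptions: the function names, the `min_support` threshold, and the input data structures are hypothetical, and the sketch does not reproduce the OrthoML2GO implementation or its gradient boosting verification step.

```python
from collections import Counter


def knn_go_terms(hits, annotations, k=3, min_support=0.5):
    """Transfer GO terms from the k best-scoring homologs.

    hits: list of (protein_id, similarity_score) pairs in any order.
    annotations: dict mapping protein_id to a set of GO term strings.
    A term is kept if at least `min_support` of the k neighbors carry it
    (the threshold is an illustrative assumption).
    """
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:k]
    if not top:
        return set()
    counts = Counter()
    for protein_id, _score in top:
        counts.update(annotations.get(protein_id, set()))
    return {term for term, n in counts.items() if n / len(top) >= min_support}


def dominant_orthogroup(hits, orthogroup_of, k=3):
    """Return the orthogroup most represented among the k best homologs."""
    top = sorted(hits, key=lambda h: h[1], reverse=True)[:k]
    groups = Counter(
        orthogroup_of[protein_id]
        for protein_id, _score in top
        if protein_id in orthogroup_of
    )
    return groups.most_common(1)[0][0] if groups else None


# Hypothetical search hits and annotations for one query sequence.
hits = [("P1", 0.9), ("P2", 0.8), ("P3", 0.4), ("P4", 0.1)]
annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0016301"},
    "P4": {"GO:0005634"},
}
orthogroup_of = {"P1": "OG1", "P2": "OG1", "P3": "OG2"}

predicted_terms = knn_go_terms(hits, annotations)
best_group = dominant_orthogroup(hits, orthogroup_of)
```

In a fuller pipeline, the candidate terms from these stages would then be passed to a trained classifier that accepts or rejects each term.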
{"title":"OrthoML2GO: homology-based protein function prediction using orthogroups and machine learning.","authors":"E V Malyugin, D A Afonnikov","doi":"10.18699/vjgb-25-119","DOIUrl":"https://doi.org/10.18699/vjgb-25-119","url":null,"abstract":"<p><p>In recent years, the rapid growth of sequencing data has exacerbated the problem of functional annotation of protein sequences, as traditional homology-based methods face limitations when working with distant homologs, making it difficult to accurately determine protein functions. This paper introduces the OrthoML2GO method for protein function prediction, which integrates homology searches using the USEARCH algorithm, orthogroup analysis based on OrthoDB version 12.0, and a machine learning algorithm (gradient boosting). A key feature of our approach is the use of orthogroup information to account for the evolutionary and functional similarity of proteins and the application of machine learning to refine the assigned GO terms for the target sequence. To select the optimal algorithm for protein annotation, the following approaches were applied sequentially: the k-nearest neighbors (KNN) method; a method based on the annotation of the orthogroup most represented in the k-nearest homologs (OG); a method of verifying the GO terms identified in the previous stage using machine learning algorithms. A comparison of the prediction accuracy of GO terms using the OrthoML2GO method with the Blast2GO and PANNZER2 annotation programs was performed on sequence samples from both individual organisms (humans, Arabidopsis) and a combined sample represented by different taxa. Our results demonstrate that the proposed method is comparable to, and by some evaluation metrics outperforms, these existing methods in terms of the quality of protein function prediction, especially on large and heterogeneous samples of organisms. 
The greatest performance improvement is achieved by combining information about the closest homologs and orthogroups with verification of terms using machine learning methods. Our approach demonstrates high performance for large-scale automatic protein annotation, and prospects for further development include optimizing machine learning model parameters for specific biological tasks and integrating additional sources of structural and functional information, which will further improve the method's accuracy and versatility. In addition, the introduction of new bioinformatics tools and the expansion of the annotated protein database will contribute to the further improvement of the proposed approach.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1145-1154"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799360/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the main goals of modern evolutionary biology is to understand the mechanisms that lead to the initial differentiation (primary divergence) of populations into groups with distinct genetic traits. This divergence requires reproductive isolation, which prevents or hinders contact and the exchange of genetic material between populations. This study explores the potential for isolation based not on obvious geographical barriers, population distance, or ecological specialization, but rather on hereditary mechanisms, such as genetic drift, gene flow, and selection against heterozygous individuals. To this end, we propose and investigate a dynamic discrete-time model that describes the dynamics of allele frequencies and population numbers in a system of limited populations coupled by migrations. We consider a panmictic population with Mendelian inheritance rules, one-locus selection, and density-dependent factors limiting population growth. Individuals freely mate and randomly move around a one-dimensional ring-shaped habitat. The model was verified using data from an experiment on the box population system of Drosophila melanogaster performed by Yu.P. Altukhov et al. With rather simple assumptions, the model explains some mechanisms for the emergence and preservation of significant genetic differences between subpopulations (primary genetic divergence), accompanied by heterogeneity in allele frequencies and abundances within a homogeneous area. In this scenario, several large groups of genetically homogeneous subpopulations form and independently develop. Hybridization occurs at contact sites, and polymorphism is maintained through migration from genetically homogeneous nearby sites. It was found that only disruptive selection, directed against heterozygous individuals, can sustainably maintain such a spatial distribution. Under directional selection, divergence may occur for a short time as part of the transitional evolutionary process towards the best-adapted genotype. 
Because of the reduced adaptability of heterozygous (hybrid) individuals and low growth rates in these sites (hybrid zones), gene flow between adjacent sites with opposite genotypes (phenotypes) is significantly impeded. As a result, the hybrid zones can become effective geographical barriers that prevent the genetic flow between coupled subpopulations.
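The contrast between disruptive and directional selection on a ring can be illustrated with a much simpler model than the one described above. The sketch below is an assumption-laden simplification: it tracks only allele frequencies, treats all sites as equal in size, and omits the density-dependent population dynamics of the original model. All function names and parameter values are hypothetical.

```python
def select(q, w_AA, w_Aa, w_aa):
    """One generation of one-locus selection; q is the frequency of allele A."""
    p = 1.0 - q
    mean_w = w_AA * q * q + 2.0 * w_Aa * q * p + w_aa * p * p
    return q * (w_AA * q + w_Aa * p) / mean_w


def migrate(freqs, m):
    """Mix each site with its two ring neighbors (fraction m/2 from each).

    Simplifying assumption: all sites have equal population size.
    """
    n = len(freqs)
    return [
        (1.0 - m) * freqs[i] + 0.5 * m * (freqs[i - 1] + freqs[(i + 1) % n])
        for i in range(n)
    ]


def simulate(freqs, generations, m=0.01, w_AA=1.0, w_Aa=0.5, w_aa=1.0):
    """Alternate selection and migration for a number of generations."""
    for _ in range(generations):
        freqs = migrate([select(q, w_AA, w_Aa, w_aa) for q in freqs], m)
    return freqs


# Ten sites on a ring: five start rich in allele A, five start poor in it.
start = [0.9] * 5 + [0.1] * 5

# Selection against heterozygotes (w_Aa below both homozygote fitnesses).
disruptive = simulate(start, 200)

# Directional selection favoring allele A.
directional = simulate(start, 500, w_AA=1.0, w_Aa=0.9, w_aa=0.8)
```

With these illustrative values, the two groups of sites remain genetically distinct under selection against heterozygotes, whereas under directional selection every site moves toward fixation of the favored allele.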
{"title":"Computer modeling of spatial dynamics and primary genetic divergence for a population system in a ring areal.","authors":"M P Kulakov, O L Zhdanova, E Ya Frisman","doi":"10.18699/vjgb-25-115","DOIUrl":"https://doi.org/10.18699/vjgb-25-115","url":null,"abstract":"<p><p>One of the main goals of modern evolutionary biology is to understand the mechanisms that lead to the initial differentiation (primary divergence) of populations into groups with genetic traits. This divergence requires reproductive isolation, which prevents or hinders contact and the exchange of genetic material between populations. This study explores the potential for isolation based not on obvious geographical barriers, population distance, or ecological specialization, but rather on hereditary mechanisms, such as gene drift and flow and selection against heterozygous individuals. To this end, we propose and investigate a dynamic discrete-time model that describes the dynamics of frequencies and numbers in a system of limited populations coupled by migrations. We consider a panmictic population with Mendelian inheritance rules, one-locus selection, and density-dependent factors limiting population growth. Individuals freely mate and randomly move around a one-dimensional ring-shaped habitat. The model was verified using data from an experiment on the box population system of Drosophila melanogaster performed by Yu.P. Altukhov et al. With rather simple assumptions, the model explains some mechanisms for the emergence and preservation of significant genetic differences between subpopulations (primary genetic divergence), accompanied by heterogeneity in allele frequencies and abundances within a homogeneous area. In this scenario, several large groups of genetically homogeneous subpopulations form and independently develop. Hybridization occurs at contact sites, and polymorphism is maintained through migration from genetically homogeneous nearby sites. 
It was found that only disruptive selection, directed against heterozygous individuals, can sustainably maintain such a spatial distribution. Under directional selection, divergence may occur for a short time as part of the transitional evolutionary process towards the best-adapted genotype. Because of the reduced adaptability of heterozygous (hybrid) individuals and low growth rates in these sites (hybrid zones), gene flow between adjacent sites with opposite genotypes (phenotypes) is significantly impeded. As a result, the hybrid zones can become effective geographical barriers that prevent the genetic flow between coupled subpopulations.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1109-1121"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T S Golubeva, V A Cherenko, E A Filipenko, I V Zhirnov, A A Ivanov, A V Kochetov
RNA interference (RNAi) is a powerful tool for gene silencing. It has recently been used to design promising plant protection strategies against pests such as viruses, insects, etc. This generally requires modifying the plant genome to achieve in planta synthesis of the double-stranded RNA (dsRNA), which guides the cellular RNA interference machinery to silence the genes of interest. However, given Russian legislation, the approach in which dsRNA is synthesized by the plant itself remains unavailable for crop protection. The use of exogenously produced dsRNA appears to be a promising alternative, allowing researchers to avoid genetic modification of plants, making it possible to implement potential results in agriculture. Furthermore, exogenous dsRNAs are superior to chemical pesticides (fungicides, insecticides, etc.), which are widely used to control various plant diseases. The dsRNA acts through sequence-specific nucleic acid interactions, making it extremely selective and unlikely to harm off-target organisms. Thus, it seems promising to utilize RNAi technology for agricultural plant protection. In this case, questions arise regarding how to produce the required amounts of pathogen-specific exogenous dsRNA, and which delivery method will be optimal for providing sufficient protection. This work aims to utilize exogenous dsRNA to silence the Nicotiana benthamiana phytoene desaturase gene. Phytoene desaturase is a convenient model gene in gene silencing experiments, as its knockdown results in a distinct phenotypic manifestation, namely, leaf bleaching. The dsRNA synthesis for this work was performed in vivo in Escherichia coli cells, and the chosen delivery method was root treatment through watering, both techniques being as simple and accessible as possible. It is surmised that the proposed approach could be adapted for broader use of RNAi technologies in agricultural crop protection.
{"title":"Silencing of the Nicotiana benthamiana phytoendesaturase gene by root treatment of exogenous dsRNA.","authors":"T S Golubeva, V A Cherenko, E A Filipenko, I V Zhirnov, A A Ivanov, A V Kochetov","doi":"10.18699/vjgb-25-123","DOIUrl":"https://doi.org/10.18699/vjgb-25-123","url":null,"abstract":"<p><p>RNA interference (RNAi) is a powerful tool for gene silencing. It has recently been used to design promising plant protection strategies against pests such as viruses, insects, etc. This generally requires modifying the plant genome to achieve in planta synthesis of the double-stranded RNA (dsRNA), which guides the cellular RNA interference machinery to silence the genes of interest. However, given Russian legislation, the approach in which dsRNA is synthesized by the plant itself remains unavailable for crop protection. The use of exogenously produced dsRNA appears to be a promising alternative, allowing researchers to avoid genetic modification of plants, making it possible to implement potential results in agriculture. Furthermore, exogenous dsRNAs are superior to chemical pesticides (fungicides, insecticides, etc.), which are widely used to control various plant diseases. The dsRNA acts through sequence-specific nucleic acid interactions, making it extremely selective and unlikely to harm off-target organisms. Thus, it seems promising to utilize RNAi technology for agricultural plant protection. In this case, questions arise regarding how to produce the required amounts of pathogen-specific exogenous dsRNA, and which delivery method will be optimal for providing sufficient protection. This work aims to utilize exogenous dsRNA to silence the Nicotiana benthamiana phytoene desaturase gene. Phytoene desaturase is a convenient model gene in gene silencing experiments, as its knockdown results in a distinct phenotypic manifestation, namely, leaf bleaching. 
The dsRNA synthesis for this work was performed in vivo in Escherichia coli cells, and the chosen delivery method was root treatment through watering, both techniques being as simple and accessible as possible. It is surmised that the proposed approach could be adapted for broader use of RNAi technologies in agricultural crop protection.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 8","pages":"1169-1175"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12876927/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reviews existing approaches for reconstructing frame-based mathematical models of molecular genetic systems from the level of genetic synthesis to models of metabolic networks. A frame-based mathematical model is a model in which the following terms are specified: formal structure, type of mathematical model for a particular biochemical process, reactants and their roles. Typically, such models are generated automatically on the basis of description of biological processes in terms of domain-specific languages. For molecular genetic systems, these languages use constructions familiar to a wide range of biologists in the form of a list of biochemical reactions. They rely on the concepts of elementary subsystems, where complex models are assembled from small block units ("frames"). In this paper, we have shown an example with the generation of a classical repressilator model consisting of three genes that mutually inhibit each other's synthesis. We have given it in three different versions of the graphic standard, its characteristic mathematical interpretation and variants of its numerical calculation. We have shown that even at the level of frame models it is possible to identify qualitatively new behaviour of the model through the introduction of just one gene into the model structure. This change provides a way to control the modes of behaviour of the model through changing the concentrations of reactants. The frame-based approach opens the way to generate models of cells, tissues, organs, organisms and communities through frame-based model generation tools that specify structure, roles of modelled reactants using domain-specific languages and graphical methods of model specification.
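The idea of assembling a model from small reusable blocks can be sketched for the repressilator as follows. This is a minimal illustration under assumptions: it uses a simplified protein-only form of the repressilator with Hill-type repression and first-order decay, a plain Euler integrator, and illustrative parameter values. The function names and the tuple-based reaction list are hypothetical and do not correspond to any specific domain-specific language or tool discussed in the paper.

```python
def hill_repression(alpha, n):
    """Frame for repressed synthesis: rate = alpha / (1 + repressor**n)."""
    return lambda repressor: alpha / (1.0 + repressor ** n)


def build_model(reactions, decay=1.0):
    """Assemble an ODE right-hand side from a list of reaction frames.

    reactions: list of (product, repressor, rate_law) tuples.
    Every species also decays at rate `decay` (illustrative assumption).
    """
    def rhs(state):
        derivative = {name: -decay * value for name, value in state.items()}
        for product, repressor, rate_law in reactions:
            derivative[product] += rate_law(state[repressor])
        return derivative
    return rhs


def euler(rhs, state, dt, steps):
    """Integrate with the explicit Euler method; returns the list of states."""
    trajectory = [dict(state)]
    for _ in range(steps):
        derivative = rhs(state)
        state = {name: state[name] + dt * derivative[name] for name in state}
        trajectory.append(state)
    return trajectory


# Three genes in a cycle: C represses A, A represses B, B represses C.
synthesis = hill_repression(alpha=20.0, n=4)
repressilator = build_model([
    ("A", "C", synthesis),
    ("B", "A", synthesis),
    ("C", "B", synthesis),
])
traj = euler(repressilator, {"A": 1.0, "B": 1.5, "C": 2.0}, dt=0.01, steps=5000)
```

Adding another gene in this sketch amounts to appending one more tuple to the reaction list and one more entry to the initial state, which is the kind of structural change the frame-based approach is meant to make cheap.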
{"title":"Frame-based mathematical models - a tool for the study of molecular genetic systems.","authors":"F V Kazantsev, S A Lashin, Yu G Matushkin","doi":"10.18699/vjgb-25-135","DOIUrl":"https://doi.org/10.18699/vjgb-25-135","url":null,"abstract":"<p><p>This paper reviews existing approaches for reconstructing frame-based mathematical models of molecular genetic systems from the level of genetic synthesis to models of metabolic networks. A frame-based mathematical model is a model in which the following terms are specified: formal structure, type of mathematical model for a particular biochemical process, reactants and their roles. Typically, such models are generated automatically on the basis of description of biological processes in terms of domain-specific languages. For molecular genetic systems, these languages use constructions familiar to a wide range of biologists in the form of a list of biochemical reactions. They rely on the concepts of elementary subsystems, where complex models are assembled from small block units (\"frames\"). In this paper, we have shown an example with the generation of a classical repressilator model consisting of three genes that mutually inhibit each other's synthesis. We have given it in three different versions of the graphic standard, its characteristic mathematical interpretation and variants of its numerical calculation. We have shown that even at the level of frame models it is possible to identify qualitatively new behaviour of the model through the introduction of just one gene into the model structure. This change provides a way to control the modes of behaviour of the model through changing the concentrations of reactants. 
The frame-based approach opens the way to generate models of cells, tissues, organs, organisms and communities through frame-based model generation tools that specify structure, roles of modelled reactants using domain-specific languages and graphical methods of model specification.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"25 8","pages":"1288-1294"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12876924/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M F Sanamyan, Sh U Bobokhujayev, Sh S Abdukarimov, J S Uralov, A B Rustamov
The creation of chromosome substitution lines containing one pair of chromosomes from a related species is one method for introgression of alien genetic material. The frequency of substitutions in different chromosomes of the genome varies due to the selective transmission of alien chromosomes through the gametes of hybrids. The use of monosomic lines with identified univalent chromosomes and molecular genetic SSR markers at the seedling stage allowed rapid screening of the identity of the alien chromosome in backcross hybrids, significantly accelerating and facilitating the backcrossing process for the creation of new chromosome substitution cotton lines. As a result of studying the process of transmission of chromosome 2 of the At subgenome of the cotton plant G. barbadense L. during backcrossing of four original monosomic lines of G. hirsutum L. with monosomic backcross hybrids with substitution of chromosome 2 of the At subgenome, the following specific consequences of the introgression of this chromosome were revealed: decreased crossability, setting and germination of hybrid seeds; differences in the frequency and nature of transmission of chromosome 2 of the At subgenome of the cotton plant G. barbadense; regularity of chromosome behavior in meiosis; a high meiotic index; a significant decrease in pollen fertility in backcross monosomic hybrids BC1F1; specific morphobiological characteristics of monosomic backcrossed plants, such as delayed development of vegetative and generative organs; dwarfism; reduced foliage; and poor budding and flowering during the first year of vegetation. All of these factors negatively impact the study and backcrossing of monosomic hybrids and significantly complicate and delay the creation of chromosome-substituted forms concerning chromosome 2 of the At subgenome of cotton, G. barbadense. These specific changes likely occurred as a result of hybrid genome reorganization and introgression of alien chromatin. 
Furthermore, the effectiveness of using molecular genetic microsatellite (SSR) markers to monitor backcrossing processes and eliminate genetic material from the Pima 3-79 donor line of G. barbadense for the selection of genotypes with alien chromosome substitutions has been demonstrated.
{"title":"Study of the influence of introgression from chromosome 2 of the At subgenome of cotton Gossypium barbadense L. during backcrossing with the original lines of G. hirsutum L.","authors":"M F Sanamyan, Sh U Bobokhujayev, Sh S Abdukarimov, J S Uralov, A B Rustamov","doi":"10.18699/vjgb-25-125","DOIUrl":"https://doi.org/10.18699/vjgb-25-125","url":null,"abstract":"<p><p>The creation of chromosome substitution lines containing one pair of chromosomes from a related species is one method for introgression of alien genetic material. The frequency of substitutions in different chromosomes of the genome varies due to the selective transmission of alien chromosomes through the gametes of hybrids. The use of monosomic lines with identified univalent chromosomes and molecular genetic SSR markers at the seedling stage allowed rapid screening of the identity of the alien chromosome in backcross hybrids, significantly accelerating and facilitating the backcrossing process for the creation of new chromosome substitution cotton lines. As a result of studying the process of transmission of chromosome 2 of the At subgenome of the cotton plant G. barbadense L. during backcrossing of four original monosomic lines of G. hirsutum L. with monosomic backcross hybrids with substitution of chromosome 2 of the At subgenome, the following specific consequences of the introgression of this chromosome were revealed: decreased crossability, setting and germination of hybrid seeds; differences in the frequency and nature of transmission of chromosome 2 of the At subgenome of the cotton plant G. barbadense; regularity of chromosome behavior in meiosis; a high meiotic index; a significant decrease in pollen fertility in backcross monosomic hybrids BC1F1; specific morphobiological characteristics of monosomic backcrossed plants, such as delayed development of vegetative and generative organs; dwarfism; reduced foliage; and poor budding and flowering during the first year of vegetation. 
All of these factors negatively impact the study and backcrossing of monosomic hybrids and significantly complicate and delay the creation of chromosome-substituted forms concerning chromosome 2 of the At subgenome of cotton, G. barbadense. These specific changes likely occurred as a result of hybrid genome reorganization and introgression of alien chromatin. Furthermore, the effectiveness of using molecular genetic microsatellite (SSR) markers to monitor backcrossing processes and eliminate genetic material from the Pima 3-79 donor line of G. barbadense for the selection of genotypes with alien chromosome substitutions has been demonstrated.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 8","pages":"1184-1194"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12876928/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vision plays a key role in the lives of various organisms, enabling spatial orientation, foraging, predator avoidance and social interaction. In species with relatively simple visual systems, such as insects, effective behavioral strategies are achieved through high neural specialization, adaptation to specific environmental conditions, and the use of additional sensory systems such as olfaction or hearing. Animals with more complex vision and nervous systems, such as mammals, have greater cognitive abilities and flexibility, but this comes with increased demands on the brain's energy costs and computational resources. Modeling the features of such systems in a virtual environment could allow researchers to explore the fundamental principles of sensorimotor integration and the limits of cognitive complexity, as well as test hypotheses about the interaction between perception, memory and decision-making mechanisms. In this work, we implement and investigate a model of virtual organisms with a visual system operating in a three-dimensional physical environment using the Unity ML-Agents software, one of the most high-performance simulation platforms currently available. We propose a hierarchical control architecture that separates locomotion and navigation tasks between two modules: (1) visual perception and decision-making, and (2) coordinated control of limb movement for locomotion in the physical environment. A series of numerical experiments was conducted to examine the influence of visual system parameters (e.g., resolution of the "first-person" view), environmental configuration and agent architectural features on the efficiency and outcomes of reinforcement learning (using the PPO algorithm). The results demonstrate the existence of an optimal range of resolutions that provide a trade-off between computational complexity and success in accomplishing the task, while excessive dimensionality of sensory inputs or action space leads to slower learning. 
We performed system performance profiling and identified key bottlenecks in large-scale simulations. The discussion considers biological parallels, highlighting cases of high behavioral efficiency in insects with relatively low-resolution visual systems, and the potential of neuroevolutionary approaches for adapting agent architectures. The proposed approach and the results obtained are of potential interest to researchers working on biologically inspired artificial agents, evolutionary modeling, and the study of cognitive processes in artificial systems.
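The two-module architecture and the growth of the visual input with resolution can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the Unity ML-Agents API: `visual_input_size`, `HierarchicalAgent`, `toy_navigator` and `toy_locomotor` are hypothetical names, and the toy functions stand in for policies that would in practice be trained with PPO.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def visual_input_size(width: int, height: int, channels: int = 3) -> int:
    """Number of scalar inputs produced by one camera frame."""
    return width * height * channels


@dataclass
class HierarchicalAgent:
    # High-level module: camera frame -> discrete movement command (hypothetical interface).
    navigator: Callable[[Sequence[float]], int]
    # Low-level module: command + body state -> joint targets (hypothetical interface).
    locomotor: Callable[[int, Sequence[float]], List[float]]

    def act(self, frame: Sequence[float], body_state: Sequence[float]) -> List[float]:
        # The navigator never sees joint angles; the locomotor never sees pixels.
        command = self.navigator(frame)
        return self.locomotor(command, body_state)


def toy_navigator(frame: Sequence[float]) -> int:
    # Toy rule: command 0 if the first half of the flattened frame is at least
    # as bright as the second half, otherwise command 1.
    half = len(frame) // 2
    return 0 if sum(frame[:half]) >= sum(frame[half:]) else 1


def toy_locomotor(command: int, body_state: Sequence[float]) -> List[float]:
    # Toy rule: shift every joint target by -0.1 (command 0) or +0.1 (command 1).
    bias = -0.1 if command == 0 else 0.1
    return [angle + bias for angle in body_state]


if __name__ == "__main__":
    # Input dimensionality grows quadratically with the side of a square frame.
    for side in (8, 16, 32, 64, 128):
        print(side, visual_input_size(side, side))
    agent = HierarchicalAgent(toy_navigator, toy_locomotor)
    print(agent.act([0.0, 0.0, 1.0, 1.0], [0.0, 0.5]))
```

Doubling the side of a square RGB frame quadruples the number of inputs the perception module must process, which is one reason an intermediate resolution can be preferable to the highest available one.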
{"title":"Self-learning virtual organisms in a physics simulator: on the optimal resolution of their visual system, the architecture of the nervous system and the computational complexity of the problem.","authors":"M S Zenin, A P Devyaterikov, A Yu Palyanov","doi":"10.18699/vjgb-25-110","DOIUrl":"https://doi.org/10.18699/vjgb-25-110","url":null,"abstract":"<p><p>Vision plays a key role in the lives of various organisms, enabling spatial orientation, foraging, predator avoidance and social interaction. In species with relatively simple visual systems, such as insects, effective behavioral strategies are achieved through high neural specialization, adaptation to specific environmental conditions, and the use of additional sensory systems such as olfaction or hearing. Animals with more complex vision and nervous systems, such as mammals, have greater cognitive abilities and flexibility, but this comes with increased demands on the brain's energy costs and computational resources. Modeling the features of such systems in a virtual environment could allow researchers to explore the fundamental principles of sensorimotor integration and the limits of cognitive complexity, as well as test hypotheses about the interaction between perception, memory and decision-making mechanisms. In this work, we implement and investigate a model of virtual organisms with a visual system operating in a three-dimensional physical environment using the Unity ML-Agents software - one of the most high-performance simulation platforms currently available. We propose a hierarchical control architecture that separates locomotion and navigation tasks between two modules: (1) visual perception and decision-making, and (2) coordinated control of limb movement for locomotion in the physical environment. A series of numerical experiments was conducted to examine the influence of visual system parameters (e. 
g, resolution of the \"first-person\" view), environmental configuration and agent architectural features on the efficiency and outcomes of reinforcement learning (using the PPO algorithm). The results demonstrate the existence of an optimal range of resolutions that provide a trade-off between computational complexity and success in accomplishing the task, while excessive dimensionality of sensory inputs or action space leads to slower learning. We performed system performance profiling and identified key bottlenecks in large-scale simulations. The discussion considers biological parallels, highlighting cases of high behavioral efficiency in insects with relatively low-resolution visual systems, and the potential of neuroevolutionary approaches for adapting agent architectures. The proposed approach and the results obtained are of potential interest to researchers working on biologically inspired artificial agents, evolutionary modeling, and the study of cognitive processes in artificial systems.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1051-1061"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795856/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145970907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
De novo motif search is the main approach for determining the nucleotide specificity of DNA binding by transcription factors (TFs), the key regulators of gene transcription, based on data from massive genome-wide sequencing of their binding site regions in vivo, such as ChIP-seq. The number of known TF binding site (TFBS) motifs has increased several-fold in recent years. Due to the similarity in the structure of the DNA-binding domains of TFs, many structurally cognate TFs have similar and sometimes almost indistinguishable binding site motifs. The TFClass database classifies TFs by their DNA-binding domains: the top levels of the hierarchy (superclasses and classes of TFs) are defined by the structure of these domains, and the next levels (families and subfamilies of TFs) by alignments of the amino acid sequences of the domains. However, this classification does not take into account the similarity of TFBS motifs, whereas identification of valid TFs from massive sequencing data of TFBSs, such as ChIP-seq, requires working with TFBS motifs rather than TFs themselves. Therefore, in this study we extracted the TFBS motifs for human and the fruit fly Drosophila melanogaster from the HOCOMOCO and JASPAR databases, and considered the pairwise similarity of binding site motifs of cognate TFs according to their classification from the TFClass database. We have shown that the common tree of the TF hierarchy by the structure of DNA-binding domains can be split into separate branches representing non-overlapping sets of TFs. Within each branch, the majority of TF pairs have significantly similar binding site motifs. Each branch can include one or more sister elementary units of the hierarchy and all of their lower levels: one or more TFs of the same subfamily, or the whole subfamily, one or several subfamilies of the same family, an entire family, etc., up to the entire class. 
Analysis of the seven largest human and two largest Drosophila TF classes showed that the similarity of TFs in terms of TFBS motifs for different corresponding levels (classes, families) is noticeably different. Supplementing the hierarchical classification of TFs with branches combining significantly similar motifs of TFBSs can increase the efficiency of identifying involved TFs through enriched motifs detected by de novo motif search for massive sequencing data of TFBSs from the ChIP-seq technology.
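Pairwise motif similarity of the kind discussed above can be sketched in Python as follows. The abstract does not state which similarity measure was used, so this example substitutes a common generic choice as an assumption: the mean Pearson correlation of aligned frequency columns, maximized over ungapped offsets and both strand orientations. The function names and the `min_overlap` parameter are hypothetical.

```python
from math import sqrt


def column_pcc(a, b):
    """Pearson correlation between two 4-element columns ordered A, C, G, T."""
    ma, mb = sum(a) / 4, sum(b) / 4
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a uniform column carries no positional information
    return cov / sqrt(va * vb)


def reverse_complement(motif):
    # With columns ordered A, C, G, T, reversing a column swaps A<->T and C<->G.
    return [col[::-1] for col in reversed(motif)]


def best_offset_score(m1, m2, min_overlap):
    """Best mean column correlation over all ungapped alignments of m2 to m1."""
    best = -1.0
    # offset is the position in m1 at which column 0 of m2 is placed.
    for offset in range(-(len(m2) - min_overlap), len(m1) - min_overlap + 1):
        pairs = [(m1[i], m2[i - offset])
                 for i in range(len(m1)) if 0 <= i - offset < len(m2)]
        if len(pairs) < min_overlap:
            continue
        score = sum(column_pcc(a, b) for a, b in pairs) / len(pairs)
        best = max(best, score)
    return best


def motif_similarity(m1, m2, min_overlap=4):
    """Similarity in [-1, 1]; returns -1.0 if no alignment reaches min_overlap."""
    return max(best_offset_score(m1, m2, min_overlap),
               best_offset_score(m1, reverse_complement(m2), min_overlap))
```

Each motif is a list of columns, one per position, holding the frequencies of A, C, G and T. Deciding whether two motifs are "significantly similar" would additionally require a null distribution of scores, which this sketch does not provide.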
{"title":"Linking hierarchical classification of transcription factors by the structure of their DNA-binding domains to the variability of their binding site motifs.","authors":"V G Levitsky, T Yu Vatolina, V V Raditsa","doi":"10.18699/vjgb-25-99","DOIUrl":"https://doi.org/10.18699/vjgb-25-99","url":null,"abstract":"<p><p>De novo motif search is the main approach for determining the nucleotide specificity of binding of the key regulators of gene transcription, transcription factors (TFs), based on data from massive genome-wide sequencing of their binding site regions in vivo, such as ChIP-seq. The number of motifs of known TF binding sites (TFBSs) has increased several times in recent years. Due to the similarity in the structure of the DNA-binding domains of TFs, many structurally cognate TFs have similar and sometimes almost indistinguishable binding site motifs. The classification of TFs by the structure of the DNA-binding domains from the TFClass database defines the top levels of the hierarchy (superclasses and classes of TFs) by the structure of these domains, and the next levels (families and subfamilies of TFs) by the alignments of amino acid sequences of domains. However, this classification does not take into account the similarity of TFBS motifs, whereas identification of valid TFs from massive sequencing data of TFBSs, such as ChIP- seq, requires working with TFBS motifs rather than TFs themselves. Therefore, in this study we extracted from the Hocomoco and Jaspar databases the TFBS motifs for human and fruit fly Drosophila melanogaster, and considered the pairwise similarity of binding site motifs of cognate TFs according to their classification from the TFClass database. We have shown that the common tree of the TF hierarchy by the structure of DNA-binding domains can be split into separate branches representing non-overlapping sets of TFs. Within each branch, the majority of TF pairs have significantly similar binding site motifs. 
Each branch can include one or more sister elementary units of the hierarchy and all its/their lower levels: one or more TFs of the same subfamily, or the whole subfamily, one or several subfamilies of the same family, an entire family, etc., up to the entire class. Analysis of the seven largest human and two largest Drosophila TF classes showed that the similarity of TFs in terms of TFBS motifs for different corresponding levels (classes, families) is noticeably different. Supplementing the hierarchical classification of TFs with branches combining significantly similar motifs of TFBSs can increase the efficiency of identifying involved TFs through enriched motifs detected by de novo motif search for massive sequencing data of TFBSs from the ChIP-seq technology.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"925-939"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145971325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T V Ivanisenko, P S Demenkov, M A Kleshchev, V A Ivanisenko
In recent years, artificial intelligence methods based on the analysis of heterogeneous graphs of biomedical networks have become widely used for predicting molecular interactions. In particular, graph neural networks (GNNs) effectively identify missing edges in gene networks, such as protein-protein interaction, gene-disease, drug-target, and other networks, thereby enabling the prediction of new biological relationships. To reconstruct gene networks, cognitive systems for automatic text mining of scientific publications and databases are often employed. One such AI-driven platform, ANDSystem, is designed for automatic knowledge extraction of molecular interactions and, on this basis, the reconstruction of associative gene networks. The ANDSystem knowledge base contains information on more than 100 million interactions among diverse molecular genetic entities (genes, proteins, metabolites, drugs, etc.). The interactions span a wide range of types: regulatory relationships, physical interactions (protein-protein, protein-ligand), catalytic and chemical reactions, and associations among genes, phenotypes, diseases, and more. In the present study, we applied attention-based graph neural networks trained on the ANDSystem knowledge graph to predict new edges between proteins and ligands and to identify potential ligands for the SARS-CoV-2 ORF3a protein. The accessory protein ORF3a plays an important role in viral pathogenesis through ion-channel activity, induction of apoptosis, and the ability to modulate endolysosomal processes and the host innate immune response. Despite this broad functional spectrum, ORF3a has been explored far less as a pharmacological target than other viral proteins. Using a graph neural network, we predicted five small molecules of different origins (metabolites and a drug) that potentially interact with ORF3a: N-acetyl-D-glucosamine, 4-(benzoylamino)benzoic acid, austocystin D, bictegravir, and L-threonine. 
Molecular docking and MM/GBSA affinity estimation indicate the potential ability of these compounds to form complexes with ORF3a. Localization analysis showed that the binding sites of bictegravir and 4-(benzoylamino)benzoic acid lie in a solvent-exposed cytosolic surface pocket of the protein; L-threonine binds within the intersubunit cleft of the dimer; and austocystin D and N-acetyl-D-glucosamine are positioned at the boundary between the cytosolic surface and the transmembrane region. The accessibility of these binding sites may be reduced by the influence of the lipid bilayer. The binding energetics for bictegravir were more favorable than for 4-(benzoylamino)benzoic acid (docking score -7.37 kcal/mol; MM/GBSA ΔG -14.71 ± 3.12 kcal/mol), making bictegravir a promising candidate for repurposing as an ORF3a inhibitor.
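The idea of attention-based edge prediction can be illustrated with a deliberately simplified Python sketch. It is not the model used in the study: real graph attention networks apply learned weight matrices and nonlinearities, whereas here attention is a plain dot-product softmax over neighbour vectors and the edge score is a sigmoid of a dot product. All function names are hypothetical, and node vectors are assumed to be given.

```python
from math import exp


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(node_vec, neighbor_vecs):
    """Attention-weighted average of neighbour vectors.

    Neighbours whose vectors align with the node's own vector get more weight.
    A node without neighbours keeps its own vector.
    """
    if not neighbor_vecs:
        return list(node_vec)
    weights = softmax([dot(node_vec, n) for n in neighbor_vecs])
    return [sum(w * n[k] for w, n in zip(weights, neighbor_vecs))
            for k in range(len(node_vec))]


def edge_probability(u, v):
    """Score a candidate edge as the sigmoid of the embedding dot product."""
    return 1.0 / (1.0 + exp(-dot(u, v)))


def score_candidate(protein_vec, protein_neighbors, ligand_vec, ligand_neighbors):
    """Aggregate each node's neighbourhood, then score the protein-ligand edge."""
    p = attend(protein_vec, protein_neighbors)
    l = attend(ligand_vec, ligand_neighbors)
    return edge_probability(p, l)
```

Candidate ligands would then be ranked by `score_candidate`, and the top-ranked ones passed on to docking and MM/GBSA estimation as independent checks.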
{"title":"Prediction of interactions between the SARS-CoV-2 ORF3a protein and small-molecule ligands using the ANDSystem cognitive platform, graph neural networks, and molecular modeling.","authors":"T V Ivanisenko, P S Demenkov, M A Kleshchev, V A Ivanisenko","doi":"10.18699/vjgb-25-113","DOIUrl":"https://doi.org/10.18699/vjgb-25-113","url":null,"abstract":"<p><p>In recent years, artificial intelligence methods based on the analysis of heterogeneous graphs of biomedical networks have become widely used for predicting molecular interactions. In particular, graph neural networks (GNNs) effectively identify missing edges in gene networks - such as protein-protein interaction, gene-disease, drug-target, and other networks - thereby enabling the prediction of new biological relationships. To reconstruct gene networks, cognitive systems for automatic text mining of scientific publications and databases are often employed. One such AI-driven platform, ANDSystem, is designed for automatic knowledge extraction of molecular interactions and, on this basis, the reconstruction of associative gene networks. The ANDSystem knowledge base contains information on more than 100 million interactions among diverse molecular genetic entities (genes, proteins, metabolites, drugs, etc.). The interactions span a wide range of types: regulatory relationships, physical interactions (protein-protein, protein-ligand), catalytic and chemical reactions, and associations among genes, phenotypes, diseases, and more. In the present study, we applied attention-based graph neural networks trained on the ANDSystem knowledge graph to predict new edges between proteins and ligands and to identify potential ligands for the SARS-CoV-2 ORF3a protein. The accessory protein ORF3a plays an important role in viral pathogenesis through ion-channel activity, induction of apoptosis, and the ability to modulate endolysosomal processes and the host innate immune response. 
Despite this broad functional spectrum, ORF3a has been explored far less as a pharmacological target than other viral proteins. Using a graph neural network, we predicted five small molecules of different origins (metabolites and a drug) that potentially interact with ORF3a: N-acetyl-D-glucosamine, 4-(benzoylamino)benzoic acid, austocystin D, bictegravirum, and L-threonine. Molecular docking and MM/GBSA affinity estimation indicate the potential ability of these compounds to form complexes with ORF3a. Localization analysis showed that the binding sites of bictegravir and 4-(benzoylamino)benzoic acid lie in a cytosolic surface pocket of the protein that is solvent-exposed; L-threonine binds within the intersubunit cleft of the dimer; and austocystin D and N-acetyl-D-glucosamine are positioned at the boundary between the cytosolic surface and the transmembrane region. The accessibility of these binding sites may be reduced by the influence of the lipid bilayer. The binding energetics for bictegravirum were more favorable than for 4-(benzoylamino)benzoic acid (docking score -7.37 kcal/mol; MM/GBSA ΔG -14.71 ± 3.12 kcal/mol), making bictegravirum a promising candidate for repurposing as an ORF3a inhibitor.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1084-1096"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12799363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145991272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E L Mishchenko, I V Yatsyk, P S Demenkov, A V Adamovskaya, T V Ivanisenko, M A Kleshchev, V A Ivanisenko
Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized primarily by joint involvement with progressive destruction of cartilage and bone tissue. To date, RA remains an incurable disease that leads to a significant deterioration in quality of life and patient disability. Despite a wide arsenal of disease-modifying antirheumatic drugs, approximately 40% of patients show an insufficient response to standard treatment, highlighting the urgent need to identify new pharmacological targets. The aim of this study was to search for novel biological processes that could serve as promising targets for targeted RA therapy. To achieve this goal, we employed an approach based on the automated extraction of knowledge from scientific publications and biomedical databases using the ANDSystem software. This approach involved the reconstruction and subsequent analysis of two types of associative gene networks: a) gene networks describing genes and proteins associated with the development of RA, and b) gene networks describing genes and proteins involved in the functional responses to drugs used to treat the disease. The analysis of the reconstructed networks identified 11 biological processes that play a significant role in the pathogenesis of RA but are not yet direct targets of existing disease-modifying antirheumatic drugs. The most promising of these, described by Gene Ontology terms, include: a) the Toll-like receptor signaling pathway; b) neutrophil activation; c) regulation of osteoblast differentiation; d) regulation of osteoclast differentiation; e) the prostaglandin biosynthetic process; and f) the canonical Wnt signaling pathway. The identified biological processes and their key regulators represent promising targets for the development of new drugs capable of improving the efficacy of RA therapy, particularly in patients resistant to existing treatments. 
The developed approach can also be applied to the search for new targets for targeted therapy of other diseases.
{"title":"Searching for biological processes as targets for rheumatoid arthritis targeted therapy with ANDSystem, an integrated software and information platform.","authors":"E L Mishchenko, I V Yatsyk, P S Demenkov, A V Adamovskaya, T V Ivanisenko, M A Kleshchev, V A Ivanisenko","doi":"10.18699/vjgb-25-107","DOIUrl":"https://doi.org/10.18699/vjgb-25-107","url":null,"abstract":"<p><p>Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized primarily by joint involvement with progressive destruction of cartilage and bone tissue. To date, RA remains an incurable disease that leads to a significant deterioration in quality of life and patient disability. Despite a wide arsenal of disease-modifying antirheumatic drugs, approximately 40 % of patients show an insufficient response to standard treatment, highlighting the urgent need to identify new pharmacological targets. The aim of this study was to search for novel biological processes that could serve as promising targets for the targeted therapy of RA. To achieve this goal, we employed an approach based on the automated extraction of knowledge from scientific publications and biomedical databases using the ANDSystem software. This approach involved the reconstruction and subsequent analysis of two types of associative gene networks: a) gene networks describing genes and proteins associated with the development of RA, and b) gene networks describing genes and proteins involved in the functional responses to drugs used for the disease's therapy. The analysis of the reconstructed networks identified 11 biological processes that play a significant role in the pathogenesis of RA but are not yet direct targets of existing disease-modifying antirheumatic drugs. 
The most promising of these, described by Gene Ontology terms, include: a) the Toll-like receptor signaling pathway; b) neutrophil activation; c) regulation of osteoblast differentiation; d) regulation of osteoclast differentiation; e) the prostaglandin biosynthetic process, and f) the canonical Wnt signaling pathway. The identified biological processes and their key regulators represent promising targets for the development of new drugs capable of improving the efficacy of RA therapy, particularly in patients resistant to existing treatments. The developed approach can also be successfully applied to the search for new targeted therapy targets for other diseases.</p>","PeriodicalId":44339,"journal":{"name":"Vavilovskii Zhurnal Genetiki i Selektsii","volume":"29 7","pages":"1020-1030"},"PeriodicalIF":1.0,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12795834/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145970931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}