Data from genomics, proteomics, structural biology and cryo-electron microscopy are integrated into a structural illustration of a cross section through an entire JCVI-syn3.0 minimal cell. The illustration is designed with several goals: to inspire excitement in science, to depict the underlying scientific results accurately, and to be feasible in traditional media. Design choices to achieve these goals include reduction of visual complexity with simplified representations, use of orthographic projection to retain scale relationships, and an approach to color that highlights functional compartments of the cell. Given that this simple cell provides an attractive laboratory for exploring the central processes needed for life, several functional narratives are included in the illustration, including division of the cell and the first depiction of an entire cellular proteome. The illustration lays the foundation for 3D molecular modeling of this cell.
{"title":"Integrative illustration of a JCVI-syn3A minimal cell.","authors":"David S Goodsell","doi":"10.1515/jib-2022-0013","DOIUrl":"10.1515/jib-2022-0013","url":null,"abstract":"<p><p>Data from genomics, proteomics, structural biology and cryo-electron microscopy are integrated into a structural illustration of a cross section through an entire JCVI-syn3.0 minimal cell. The illustration is designed with several goals: to inspire excitement in science, to depict the underlying scientific results accurately, and to be feasible in traditional media. Design choices to achieve these goals include reduction of visual complexity with simplified representations, use of orthographic projection to retain scale relationships, and an approach to color that highlights functional compartments of the cell. Given that this simple cell provides an attractive laboratory for exploring the central processes needed for life, several functional narratives are included in the illustration, including division of the cell and the first depiction of an entire cellular proteome. The illustration lays the foundation for 3D molecular modeling of this cell.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9377704/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40395611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Biomedical illustration and visualization techniques provide a window into complex molecular worlds that are difficult to capture through experimental means alone. Biomedical illustrators frequently employ color to help tell a molecular story, e.g., to identify key molecules in a signaling pathway. Currently, color use for molecules is largely arbitrary and often chosen based on the client, cultural factors, or personal taste. The study of molecular dynamics is relatively young, and some stakeholders argue that color use guidelines would throttle the growth of the field. Instead, content authors have ample creative freedom to choose an aesthetic that, e.g., supports the story they want to tell. However, such creative freedom comes at a price. The color design process is challenging, particularly for those without a background in color theory. The result is a semantically inconsistent color space that reduces the interpretability and effectiveness of molecular visualizations as a whole. Our contribution in this paper is threefold. We first discuss some of the factors that contribute to this array of color palettes. Second, we provide a brief sampling of color palettes used in both industry and research sectors. Lastly, we suggest considerations for developing best practices around color palettes applied to molecular visualization.
{"title":"Considering best practices in color palettes for molecular visualizations.","authors":"Laura Garrison, Stefan Bruckner","doi":"10.1515/jib-2022-0016","DOIUrl":"https://doi.org/10.1515/jib-2022-0016","url":null,"abstract":"<p><p>Biomedical illustration and visualization techniques provide a window into complex molecular worlds that are difficult to capture through experimental means alone. Biomedical illustrators frequently employ color to help tell a molecular story, e.g., to identify key molecules in a signaling pathway. Currently, color use for molecules is largely arbitrary and often chosen based on the client, cultural factors, or personal taste. The study of molecular dynamics is relatively young, and some stakeholders argue that color use guidelines would throttle the growth of the field. Instead, content authors have ample creative freedom to choose an aesthetic that, e.g., supports the story they want to tell. However, such creative freedom comes at a price. The color design process is challenging, particularly for those without a background in color theory. The result is a semantically inconsistent color space that reduces the interpretability and effectiveness of molecular visualizations as a whole. Our contribution in this paper is threefold. We first discuss some of the factors that contribute to this array of color palettes. Second, we provide a brief sampling of color palettes used in both industry and research sectors. 
Lastly, we suggest considerations for developing best practices around color palettes applied to molecular visualization.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9377702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40192929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The prediction of adverse drug reactions (ADR) is an important step in the drug discovery and design process. Different drug properties have been employed for ADR prediction, but the predictive capability of drug properties and drug functions used in an integrated manner is yet to be explored. In the present work, a methodology based on a multi-label deep neural network and MLSMOTE is proposed for ADR prediction. The proposed methodology has been applied to SMILES strings of drugs, 17 molecular descriptors of drugs, and drug function data, both individually and in an integrated manner. The experimental results show that SMILES strings combined with drug functions outperformed the other types of data with regard to ADR prediction capability.
{"title":"Integrative analysis of chemical properties and functions of drugs for adverse drug reaction prediction based on multi-label deep neural network","authors":"Pranab Das, Yogita, V. Pal","doi":"10.1515/jib-2022-0007","DOIUrl":"https://doi.org/10.1515/jib-2022-0007","url":null,"abstract":"Abstract The prediction of adverse drug reactions (ADR) is an important step of drug discovery and design process. Different drug properties have been employed for ADR prediction but the prediction capability of drug properties and drug functions in integrated manner is yet to be explored. In the present work, a multi-label deep neural network and MLSMOTE based methodology has been proposed for ADR prediction. The proposed methodology has been applied on SMILES Strings data of drugs, 17 molecular descriptors data of drugs and drug functions data individually and in integrated manner for ADR prediction. The experimental results shows that the SMILES Strings + drug functions has outperformed other types of data with regards to ADR prediction capability.","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49603676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yashbir Singh, N. Subbarao, Abhinav Jaimini, Quincy A. Hathaway, Amina Kunovac, Bradley J. Erickson, V. Swarup, H. Singh
Abstract Breast cancer metastases are most commonly found in bone, an indication of poor prognosis. Pathway-based biomarker identification may help elucidate the cellular signature of breast cancer metastasis in bone, further characterizing the etiology and promoting new therapeutic approaches. We extracted gene expression profiles of mouse macrophages from the GEO dataset GSE152795 using the GEO2R webtool. The differentially expressed genes (DEGs) were filtered by log2 fold-change with a threshold of 1.5 (FDR < 0.05). The STRING database and Enrichr were used for GO-term analysis and for miRNA and TF analysis associated with the DEGs. AutoDock Vina was used to investigate the interaction of the anti-cancer drugs Actinomycin-D and Adriamycin. Sensitivity and specificity of the DEGs were assessed using receiver operating characteristic (ROC) analyses. A total of 61 DEGs, including 27 down-regulated and 34 up-regulated, were found to be significant in breast cancer bone metastasis. Major DEGs were associated with lipid metabolism and the immunological response of tumor tissue. The crucial DEGs Bcl3, ADGRG7, FABP4, VCAN, and IRF4 were regulated by the miRNAs miR-497, miR-574, and miR-138 and the TFs CCDN1, STAT6, and IRF8. Docking analysis showed that these genes possessed strong binding with the drugs. ROC analysis demonstrated that Bcl3 is specific to metastasis. The DEGs Bcl3, ADGRG7, FABP4, and IRF4, together with their regulating miRNAs and TFs, have a strong impact on the proliferation and metastasis of breast cancer in bone tissues. In conclusion, the present study revealed that DEGs are directly involved in breast tumor metastasis in bone tissues. The identified genes, miRNAs, and TFs are possible drug targets that may be used for therapeutics. However, further experimental validation is necessary.
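The filtering step described in the abstract above (log2 fold-change threshold of 1.5, FDR < 0.05) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `GeneResult` record, the `filter_degs` function, the gene names, and all numeric values are hypothetical, and applying the threshold symmetrically to the absolute log2 fold-change is an assumption.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str       # gene identifier (placeholder names are used below)
    log2_fc: float  # log2 fold-change between the two conditions
    fdr: float      # multiple-testing adjusted p-value


def filter_degs(results, lfc_threshold=1.5, fdr_cutoff=0.05):
    """Split results into up- and down-regulated gene lists.

    Assumption: the threshold applies to |log2 fold-change|, and a gene
    must also satisfy FDR < fdr_cutoff to be kept.
    """
    up, down = [], []
    for r in results:
        if r.fdr >= fdr_cutoff:
            continue  # not significant after multiple-testing correction
        if r.log2_fc >= lfc_threshold:
            up.append(r.gene)
        elif r.log2_fc <= -lfc_threshold:
            down.append(r.gene)
    return up, down


# Hypothetical input values, for illustration only.
example = [
    GeneResult("geneA", 2.1, 0.01),    # kept, up-regulated
    GeneResult("geneB", -1.8, 0.03),   # kept, down-regulated
    GeneResult("geneC", 2.5, 0.20),    # dropped: FDR too high
    GeneResult("geneD", 0.4, 0.001),   # dropped: effect too small
]
up, down = filter_degs(example)
print("up:", up, "down:", down)
```

In practice the input would come from a GEO2R result table; the column names there differ from the field names used in this sketch.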
{"title":"Genome-wide expression reveals potential biomarkers in breast cancer bone metastasis","authors":"Yashbir Singh, N. Subbarao, Abhinav Jaimini, Quincy A. Hathaway, Amina Kunovac, Bradley J. Erickson, V. Swarup, H. Singh","doi":"10.1515/jib-2021-0041","DOIUrl":"https://doi.org/10.1515/jib-2021-0041","url":null,"abstract":"Abstract Breast cancer metastases are most commonly found in bone, an indication of poor prognosis. Pathway-based biomarkers identification may help elucidate the cellular signature of breast cancer metastasis in bone, further characterizing the etiology and promoting new therapeutic approaches. We extracted gene expression profiles from mouse macrophages from the GEO dataset, GSE152795 using the GEO2R webtool. The differentially expressed genes (DEGs) were filtered by log2 fold-change with threshold 1.5 (FDR < 0.05). STRING database and Enrichr were used for GO-term analysis, miRNA and TF analysis associated with DEGs. Autodock Vienna was exploited to investigate interaction of anti-cancer drugs, Actinomycin-D and Adriamycin. Sensitivity and specificity of DEGs was assessed using receiver operating characteristic (ROC) analyses. A total of 61 DEGs, included 27 down-regulated and 34 up-regulated, were found to be significant in breast cancer bone metastasis. Major DEGs were associated with lipid metabolism and immunological response of tumor tissue. Crucial DEGs, Bcl3, ADGRG7, FABP4, VCAN, and IRF4 were regulated by miRNAs, miR-497, miR-574, miR-138 and TFs, CCDN1, STAT6, IRF8. Docking analysis showed that these genes possessed strong binding with the drugs. ROC analysis demonstrated Bcl3 is specific to metastasis. DEGs Bcl3, ADGRG7, FABP4, IRF4, their regulating miRNAs and TFs have strong impact on proliferation and metastasis of breast cancer in bone tissues. In conclusion, present study revealed that DEGs are directly involved in of breast tumor metastasis in bone tissues. 
Identified genes, miRNAs, and TFs can be possible drug targets that may be used for the therapeutics. However, further experimental validation is necessary.","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46059130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Identification of complex interactions between miRNAs and mRNAs in a regulatory network helps to better understand the underlying biological processes. Previously, identification of these interactions was based on sequence-based predicted target binding information. With the advancement of high-throughput omics technologies, miRNA and mRNA expression data for the same set of samples are now available. This helps in developing more efficient and flexible approaches that integrate miRNA and mRNA expression profiles with target binding information. Since these integrative approaches to detecting miRNA–mRNA regulatory modules (MRMs) are able to capture minute biological details, 26 such algorithms, methods, and tools for MRM identification are comprehensively reviewed in this article. The study covers the significant features underlying every method. The methods are classified into eight groups based on their mathematical approaches, to clarify how they work and how suitable they are for a given study. An algorithm can be selected based on the information available to the user and the biological question under investigation.
{"title":"A review on methods for predicting miRNA–mRNA regulatory modules","authors":"Madhumita Madhumita, Sushmita Paul","doi":"10.1515/jib-2020-0048","DOIUrl":"https://doi.org/10.1515/jib-2020-0048","url":null,"abstract":"Abstract Identification of complex interactions between miRNAs and mRNAs in a regulatory network helps better understand the underlying biological processes. Previously, identification of these interactions was based on sequence-based predicted target binding information. With the advancement in high-throughput omics technologies, miRNA and mRNA expression for the same set of samples are available. This helps develop more efficient and flexible approaches that work by integrating miRNA and mRNA expression profiles with target binding information. Since these integrative approaches of miRNA–mRNA regulatory modules (MRMs) detection is sufficiently able to capture the minute biological details, 26 such algorithms/methods/tools for MRMs identification are comprehensively reviewed in this article. The study covers the significant features underlying every method. Therefore, the methods are classified into eight groups based on mathematical approaches to understand their working and suitability for one’s study. An algorithm could be selected based on the available information with the users and the biological question under investigation.","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46670434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Spliced alignments are a key step in the construction of high-quality homology-based annotations of protein sequences. The exon/intron structure, which is computed as part of spliced alignment procedures, often conveys important information for distinguishing paralogous members of gene families. Here we present an exon-centric pipeline for spliced alignment that is intended in particular for applications that involve exon-by-exon comparisons of coding sequences. We show that the simple, blat-based approach has advantages over established tools, in particular for genes with very large introns and for applications to fragmented genome assemblies.
{"title":"ExceS-A: an exon-centric split aligner","authors":"Franziska Reinhardt, P. Stadler","doi":"10.1515/jib-2021-0040","DOIUrl":"https://doi.org/10.1515/jib-2021-0040","url":null,"abstract":"Abstract Spliced alignments are a key step in the construction of high-quality homology-based annotations of protein sequences. The exon/intron structure, which is computed as part of spliced alignment procedures, often conveys important information for the distinguishing paralogous members of gene families. Here we present an exon-centric pipeline for spliced alignment that is intended in particular for applications that involve exon-by-exon comparisons of coding sequences. We show that the simple, blat-based approach has advantages over established tools in particular for genes with very large introns and applications to fragmented genome assemblies.","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48295319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitate generating meaningful hypotheses for experimental validation.
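One way to make "significantly co-occur" concrete is a one-sided hypergeometric test on the number of interacting EP pairs that carry one motif in the enhancer and another in the promoter. The sketch below is only an assumption about how such a test could look; the abstract does not state which statistic EPmotifPair uses, and the function names, motif labels (`M1`, `M3`, ...), and toy data are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms vanish without extra bounds checks.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def cooccurrence_pvalue(ep_pairs, enhancer_motif, promoter_motif):
    """Count co-occurrences of a motif pair and return (count, p-value).

    ep_pairs: list of (enhancer_motif_set, promoter_motif_set) tuples,
    one per interacting enhancer-promoter pair.
    """
    N = len(ep_pairs)
    K = sum(enhancer_motif in e for e, _ in ep_pairs)
    n = sum(promoter_motif in p for _, p in ep_pairs)
    k = sum(enhancer_motif in e and promoter_motif in p for e, p in ep_pairs)
    return k, hypergeom_sf(k, N, K, n)


# Toy data with hypothetical motif labels, for illustration only.
toy_pairs = [
    ({"M1", "M2"}, {"M3"}),
    ({"M1"}, {"M3", "M4"}),
    ({"M2"}, {"M4"}),
    ({"M1"}, {"M3"}),
    ({"M2"}, {"M4"}),
    ({"M5"}, {"M4"}),
]
count, p_value = cooccurrence_pvalue(toy_pairs, "M1", "M3")
print(count, p_value)
```

A real analysis over many motif pairs would also need a multiple-testing correction, which this sketch leaves out.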
{"title":"A systematic study of motif pairs that may facilitate enhancer-promoter interactions.","authors":"Saidi Wang, Haiyan Hu, Xiaoman Li","doi":"10.1515/jib-2021-0038","DOIUrl":"https://doi.org/10.1515/jib-2021-0038","url":null,"abstract":"<p><p>Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitate generating meaningful hypotheses for experimental validation.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069648/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39897140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are typical examples of such datasets. These datasets make it possible to search for multi-parametric relations, since with plentiful data one is likely to find a sufficient number of subjects with the required parameter ensembles. Specifically, finding combinatorial biomarkers for a given condition also requires a very large dataset. For fast and automatic discovery of multi-parametric relations, association-rule finding tools have been used for more than two decades in the data-mining community. Here we present the SCARF webserver for generalized association rule mining. Association rules are of the form: a AND b AND … AND x → y, meaning that the presence of properties a AND b AND … AND x implies property y; our algorithm finds generalized association rules, since it also finds logical disjunctions (i.e., ORs) on the left-hand side, allowing the discovery of more complex rules in a more compressed form in the database. This feature also helps reduce the typically very large result tables of such studies, since a single rule with ORs on its left-hand side can cover dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated in mining the Alzheimer's database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300-307, 2017). Here we describe the webserver implementation of the algorithm.
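To illustrate what a generalized rule with ORs on the left-hand side looks like, the sketch below evaluates one such rule on a handful of records. It is not the SCARF algorithm, which searches for rules; it only scores a given rule. The representation of the left-hand side as an AND of OR-clauses, the names `lhs_holds` and `rule_stats`, and the toy records are assumptions, and missing data entries are not modeled.

```python
def lhs_holds(record, lhs):
    """True if every OR-clause in lhs has at least one property in record.

    record: set of properties present for one subject.
    lhs: list of clauses; each clause is a set of alternative properties,
         so [{"a", "b"}, {"c"}] means (a OR b) AND c.
    """
    return all(any(prop in record for prop in clause) for clause in lhs)


def rule_stats(records, lhs, rhs):
    """Return (support, confidence) of the rule lhs -> rhs."""
    covered = [r for r in records if lhs_holds(r, lhs)]
    hits = sum(rhs in r for r in covered)
    support = hits / len(records) if records else 0.0
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


# Toy records with placeholder property names, for illustration only.
records = [
    {"a", "c", "y"},
    {"b", "c", "y"},
    {"a", "c"},
    {"b"},
]
# (a OR b) AND c -> y replaces the two classical rules
# "a AND c -> y" and "b AND c -> y".
support, confidence = rule_stats(records, [{"a", "b"}, {"c"}], "y")
print(support, confidence)
```

The example shows the compression effect mentioned in the abstract: one generalized rule stands in for several classical conjunction-only rules.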
{"title":"SCARF: a biomedical association rule finding webserver.","authors":"Balázs Szalkai, Vince Grolmusz","doi":"10.1515/jib-2021-0035","DOIUrl":"https://doi.org/10.1515/jib-2021-0035","url":null,"abstract":"<p><p>The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are the typical examples of such datasets. These sets make possible the search for multi-parametric relations since from the plenty of the data one is likely to find a satisfying number of subjects with the required parameter ensembles. Specifically, finding combinatorial biomarkers for some given condition also needs a very large dataset to analyze. For fast and automatic multi-parametric relation discovery association-rule finding tools are used for more than two decades in the data-mining community. Here we present the SCARF webserver for <i>generalized</i> association rule mining. Association rules are of the form: <i>a</i> AND <i>b</i> AND … AND <i>x</i> → <i>y</i>, meaning that the presence of properties <i>a</i> AND <i>b</i> AND … AND <i>x</i> implies property <i>y</i>; our algorithm finds generalized association rules, since it also finds logical disjunctions (i.e., ORs) at the left-hand side, allowing the discovery of more complex rules in a more compressed form in the database. This feature also helps reducing the typically very large result-tables of such studies, since allowing ORs in the left-hand side of a single rule could include dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated in mining the Alzheimer's database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300-307, 2017). 
Here we describe the webserver implementation of the algorithm.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9135138/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39888837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2022-01-01 DOI: 10.1007/978-981-16-6795-4
{"title":"Integrative Bioinformatics: History and Future","authors":"","doi":"10.1007/978-981-16-6795-4","DOIUrl":"https://doi.org/10.1007/978-981-16-6795-4","url":null,"abstract":"","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83634855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arthur I Dergilev, Nina G Orlova, Oxana B Dobrovolskaya, Yuriy L Orlov
The development of high-throughput genomic sequencing coupled with chromatin immunoprecipitation technologies allows the binding sites of protein transcription factors (TFs) to be studied at the genome scale. The growing volume of data on experimentally determined binding sites raises qualitatively new problems for the analysis of gene expression regulation, the prediction of transcription factor target genes, and the reconstruction of regulatory gene networks. Genome regulation in plants remains insufficiently studied, even though plants have complex molecular mechanisms regulating gene expression and the response to environmental stresses. It is important to develop new software tools for analyzing the location of TF binding sites and their clustering in plant genomes, for visualization, and for the subsequent statistical estimates. This study applies the analysis of multiple TF binding profiles to three evolutionarily distant model plant organisms. The construction and analysis of non-random ChIP-seq binding clusters of different TFs in mammalian embryonic stem cells were discussed earlier using similar bioinformatics approaches. Such clusters of TF binding sites may indicate gene regulatory regions, enhancers, and gene transcription regulatory hubs. They can be used for the analysis of gene promoters as well as a basis for the reconstruction of transcription networks. We discuss the statistical estimates of the TF binding site clusters in the model plant genomes. The distributions of the number of different TFs per binding cluster follow the same power-law distribution for all the genomes studied. The binding clusters in the Arabidopsis thaliana genome are discussed here in detail.
{"title":"Statistical estimates of multiple transcription factors binding in the model plant genomes based on ChIP-seq data.","authors":"Arthur I Dergilev, Nina G Orlova, Oxana B Dobrovolskaya, Yuriy L Orlov","doi":"10.1515/jib-2020-0036","DOIUrl":"10.1515/jib-2020-0036","url":null,"abstract":"<p><p>The development of high-throughput genomic sequencing coupled with chromatin immunoprecipitation technologies allows studying the binding sites of the protein transcription factors (TF) in the genome scale. The growth of data volume on the experimentally determined binding sites raises qualitatively new problems for the analysis of gene expression regulation, prediction of transcription factors target genes, and regulatory gene networks reconstruction. Genome regulation remains an insufficiently studied though plants have complex molecular regulatory mechanisms of gene expression and response to environmental stresses. It is important to develop new software tools for the analysis of the TF binding sites location and their clustering in the plant genomes, visualization, and the following statistical estimates. This study presents application of the analysis of multiple TF binding profiles in three evolutionarily distant model plant organisms. The construction and analysis of non-random ChIP-seq binding clusters of the different TFs in mammalian embryonic stem cells were discussed earlier using similar bioinformatics approaches. Such clusters of TF binding sites may indicate the gene regulatory regions, enhancers and gene transcription regulatory hubs. It can be used for analysis of the gene promoters as well as a background for transcription networks reconstruction. We discuss the statistical estimates of the TF binding sites clusters in the model plant genomes. The distributions of the number of different TFs per binding cluster follow same power law distribution for all the genomes studied. 
The binding clusters in <i>Arabidopsis thaliana</i> genome were discussed here in detail.</p>","PeriodicalId":53625,"journal":{"name":"Journal of Integrative Bioinformatics","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2021-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9069649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39761184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}