Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and to contribute to their physical interactions. However, our knowledge of such TF pairs remains limited. To fill this gap, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in the enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched in motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitate the discovery of interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions and can help generate meaningful hypotheses for experimental validation.
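The core question, whether a motif pair occurs more often in interacting EP pairs than expected by chance, can be sketched as a one-sided Fisher's exact (hypergeometric) test. This is a minimal illustration under stated assumptions, not the EPmotifPair implementation: the input layout, the enhancer-to-promoter orientation of the pair, the use of non-interacting candidate pairs as background, and the motif labels M1 to M4 are all hypothetical, and a genome-wide scan over many motif pairs would also need multiple-testing correction.

```python
from math import comb


def hypergeom_upper_tail(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws).

    This equals the one-sided Fisher's exact p-value for a 2x2 table
    with fixed margins."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(total, draws)


def motif_pair_enrichment(ep_pairs, enhancer_motif, promoter_motif):
    """ep_pairs: list of (enhancer_motifs, promoter_motifs, interacting) tuples.

    Hypothetical layout: returns how many interacting pairs carry
    enhancer_motif in the enhancer AND promoter_motif in the promoter,
    plus the enrichment p-value against all candidate pairs."""
    carries = [enhancer_motif in enh and promoter_motif in pro
               for enh, pro, _ in ep_pairs]
    n_total = len(ep_pairs)
    n_carrying = sum(carries)
    n_interacting = sum(1 for _, _, inter in ep_pairs if inter)
    observed = sum(1 for c, (_, _, inter) in zip(carries, ep_pairs)
                   if c and inter)
    p_value = hypergeom_upper_tail(observed, n_total, n_carrying, n_interacting)
    return observed, p_value


# Toy data (hypothetical motifs): 4 interacting and 4 non-interacting EP pairs.
ep_pairs = [
    ({"M1"}, {"M2"}, True),
    ({"M1", "M3"}, {"M2"}, True),
    ({"M1"}, {"M2", "M3"}, True),
    ({"M3"}, {"M4"}, True),
    ({"M3"}, {"M2"}, False),
    ({"M1"}, {"M4"}, False),
    ({"M4"}, {"M3"}, False),
    ({"M3"}, {"M4"}, False),
]
observed, p = motif_pair_enrichment(ep_pairs, "M1", "M2")
print(observed, round(p, 4))  # 3 0.0714
```

With only eight pairs the p-value cannot reach conventional significance; the point of the toy example is the shape of the 2x2 comparison, not the numbers.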
The analysis of enormous datasets with missing data entries is a standard task in biological and medical data processing. Large-scale, multi-institution clinical studies are typical examples of such datasets. These datasets make it possible to search for multi-parametric relations, since in such an abundance of data one is likely to find a sufficient number of subjects with the required combinations of parameters. Finding combinatorial biomarkers for a given condition, in particular, also requires a very large dataset. For fast, automatic discovery of multi-parametric relations, association-rule mining tools have been used in the data-mining community for more than two decades. Here we present the SCARF webserver for generalized association rule mining. Association rules are of the form a AND b AND … AND x → y, meaning that the presence of properties a AND b AND … AND x implies property y. Our algorithm finds generalized association rules, since it also finds logical disjunctions (i.e., ORs) on the left-hand side, allowing more complex rules to be discovered and expressed in a more compact form. This feature also helps reduce the typically very large result tables of such studies, since a single rule with ORs on its left-hand side can subsume dozens of classical rules. The capabilities of the SCARF algorithm were demonstrated in mining the Alzheimer's database of the Coalition Against Major Diseases (CAMD) in our recent publication (Archives of Gerontology and Geriatrics Vol. 73, pp. 300-307, 2017). Here we describe the webserver implementation of the algorithm.
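To make the difference between classical and generalized rules concrete, the sketch below evaluates a rule whose left-hand side is an AND of clauses, each clause being an OR of alternatives. It only scores given rules with the textbook support and confidence definitions; it is not the SCARF search algorithm, and SCARF's own scoring may differ. The records and property names a, b, c, y are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Record = FrozenSet[str]
Clause = Set[str]  # alternatives joined by OR


def satisfies(record: Record, lhs: List[Clause]) -> bool:
    # The left-hand side is an AND of clauses; a clause holds
    # if ANY of its alternatives is present in the record.
    return all(any(item in record for item in clause) for clause in lhs)


def support_confidence(records: List[Record], lhs: List[Clause],
                       rhs: str) -> Tuple[float, float]:
    """Support = fraction of all records satisfying LHS and RHS;
    confidence = fraction of LHS-satisfying records that also have RHS."""
    covered = [r for r in records if satisfies(r, lhs)]
    hits = sum(1 for r in covered if rhs in r)
    support = hits / len(records)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


records = [frozenset(r) for r in (
    {"a", "b", "y"}, {"a", "c", "y"}, {"a", "b", "y"},
    {"a", "c"}, {"b", "y"}, {"a"},
)]

# Two classical rules ...
print(support_confidence(records, [{"a"}, {"b"}], "y"))       # a AND b -> y
print(support_confidence(records, [{"a"}, {"c"}], "y"))       # a AND c -> y
# ... and one generalized rule that covers both: a AND (b OR c) -> y
print(support_confidence(records, [{"a"}, {"b", "c"}], "y"))  # (0.5, 0.75)
```

The generalized rule replaces two result-table rows with one; note that its confidence is pooled over both branches, so a strong branch (a AND b) and a weak one (a AND c) are averaged together.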
The development of high-throughput genomic sequencing coupled with chromatin immunoprecipitation makes it possible to study transcription factor (TF) binding sites at genome scale. The growing volume of data on experimentally determined binding sites raises qualitatively new problems for the analysis of gene expression regulation, the prediction of TF target genes, and the reconstruction of gene regulatory networks. Genome regulation in plants remains insufficiently studied, although plants have complex molecular mechanisms regulating gene expression and responses to environmental stresses. It is therefore important to develop new software tools for analyzing the locations of TF binding sites and their clustering in plant genomes, for visualizing them, and for the subsequent statistical estimates. This study applies the analysis of multiple TF binding profiles to three evolutionarily distant model plant organisms. The construction and analysis of non-random ChIP-seq binding clusters of different TFs in mammalian embryonic stem cells, using similar bioinformatics approaches, have been discussed previously. Such clusters of TF binding sites may indicate gene regulatory regions, enhancers, and transcriptional regulatory hubs. They can be used to analyze gene promoters and as a basis for reconstructing transcription networks. We discuss statistical estimates for the TF binding site clusters in the model plant genomes. For all the genomes studied, the number of different TFs per binding cluster follows the same power-law distribution. The binding clusters in the Arabidopsis thaliana genome are discussed here in detail.
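Building binding clusters and checking the power-law claim can be sketched as below, assuming each ChIP-seq peak is reduced to a single summit position and neighboring peaks are merged by a fixed distance threshold. The gap threshold, the toy peaks, and the TF names TF1 to TF5 are hypothetical, and the study's own criterion for non-random clusters may differ. The log-log least-squares slope is only a rough exponent estimate; maximum-likelihood fitting is generally preferred for real data.

```python
import math
from collections import Counter


def cluster_peaks(peaks, max_gap):
    """Greedy single-linkage clustering of peak summits.

    peaks: iterable of (chrom, position, tf_name). A peak joins the current
    cluster if it lies on the same chromosome within max_gap bp of the
    previous peak; otherwise it starts a new cluster."""
    clusters, current, last = [], [], None
    for chrom, pos, tf in sorted(peaks):
        if last is not None and chrom == last[0] and pos - last[1] <= max_gap:
            current.append(tf)
        else:
            if current:
                clusters.append(current)
            current = [tf]
        last = (chrom, pos)
    if current:
        clusters.append(current)
    return clusters


def distinct_tf_distribution(clusters):
    """Map k -> number of clusters bound by exactly k different TFs."""
    return Counter(len(set(tfs)) for tfs in clusters)


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(k).

    For a power law count ~ k^(-alpha) the slope is about -alpha."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(distribution[k]) for k in distribution]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


peaks = [
    ("chr1", 100, "TF1"), ("chr1", 150, "TF2"), ("chr1", 180, "TF1"),
    ("chr1", 5000, "TF3"),
    ("chr2", 120, "TF1"), ("chr2", 130, "TF4"), ("chr2", 140, "TF5"),
]
clusters = cluster_peaks(peaks, max_gap=100)
print(clusters)
print(distinct_tf_distribution(clusters))

# An idealized power law with exponent 2 recovers a slope of -2.
print(round(loglog_slope({1: 64, 2: 16, 4: 4, 8: 1}), 6))  # -2.0
```

Counting distinct TFs (rather than peaks) per cluster matters: the first chr1 cluster has three peaks but only two different TFs.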
Aberrant expression of genes involved in methylation, including DNA methyltransferase 3 beta (DNMT3B), can cause hypermethylation of various tumor suppressor genes. In this regard, molecular factors such as microRNAs (miRNAs) can play a critical role in regulating these methyltransferase enzymes and, eventually, downstream genes such as growth arrest specific 7 (GAS7). Accordingly, in the present study we aimed to predict the regulatory effect of miRNAs on DNMT3B and GAS7 gene expression in a melanoma cell line. hsa-miR-203a-3p and hsa-miR-29a-3p were predicted and selected using bioinformatics software. Real-time PCR was used to investigate the regulatory effect of these molecules on DNMT3B and GAS7 expression. DNMT3B expression in the A375 cell line was significantly increased compared with the control (p = 0.0015). hsa-miR-203a-3p and hsa-miR-29a-3p showed a non-significant decrease in expression in the melanoma cell line compared with the control (p < 0.05). GAS7 expression in melanoma cells was significantly decreased compared with the control (p = 0.0323). Taken together, the decreased expression of hsa-miR-203a-3p and hsa-miR-29a-3p suggests the hypothesis that their aberrant expression causes DNMT3B dysfunction, possible methylation of the GAS7 gene, and ultimately its decreased expression. However, further studies are needed before a definitive conclusion can be drawn.
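The abstract does not state how relative expression was computed from the real-time PCR data. A common choice is the 2^(-ΔΔCt) (Livak) method, sketched below under that assumption. The Ct values are hypothetical, the reference (housekeeping) gene is left unspecified, and the method assumes amplification efficiencies close to 100% for both the target and the reference gene.

```python
from statistics import mean
from typing import Sequence


def relative_expression(
    ct_target_case: Sequence[float],
    ct_ref_case: Sequence[float],
    ct_target_control: Sequence[float],
    ct_ref_control: Sequence[float],
) -> float:
    """Fold change of a target gene (case vs. control) by the 2^(-ddCt) method.

    Each argument holds replicate Ct values; replicates are averaged first.
    Assumes ~100% amplification efficiency for target and reference genes."""
    # Normalize the target to the reference gene within each condition.
    d_ct_case = mean(ct_target_case) - mean(ct_ref_case)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # Compare the normalized values between conditions.
    dd_ct = d_ct_case - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical Ct values (not from the study); a lower Ct means more template.
up = relative_expression([24.0, 24.0], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0])
down = relative_expression([28.0, 28.0], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0])
print(up, down)  # 4.0 0.25
```

A fold change above 1 corresponds to upregulation (the direction reported for DNMT3B) and below 1 to downregulation (the direction reported for GAS7). The p-values quoted in the abstract would come from a separate statistical test that the abstract does not specify.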

