Clustering Some MicroRNAs Expressed in the Breast Tissue Using Shannon Information Theory and Comparing the Results With UPGMA, Neighbor-Joining, and Maximum-Likelihood Methods

Arezo Askari Rad, J. Fayazi, Houshang Dehghanzadeh
Research in Molecular Medicine, vol. 8, pp. 179-188. Published 2020-09-10. DOI: 10.32598/RMM.8.4.3 (https://doi.org/10.32598/RMM.8.4.3)
Citations: 1

Abstract

Background: Because milk and milk products play a vital role in human nutrition, dairy cattle farmers are working to increase milk production or change its composition. For this reason, research on the genes that play an important role in milk production and composition is of high value. Information theory is an interdisciplinary branch of mathematics that overlaps with communications engineering, biology, and medicine. It has been used in genetic and bioinformatic analyses, for example of biological structures and sequences.

Materials and methods: In this study, a total of 20 microRNAs affecting the breast tissue and mammary glands were extracted from the microRNA database. For each microRNA sequence, the first- to third-order entropy values were calculated and the Kullback-Leibler divergence was estimated. The Kullback-Leibler divergence matrix of the microRNAs was then used as the input for the clustering methods. All calculations were performed in R. The biological pathway of each target was predicted using the KEGG server.

Results: Comparing and analyzing all the resulting clusters divides the microRNAs into two main groups. At the first- and third-order entropies, the first group contains 18 microRNAs and the second group contains 2. At the second-order entropy, the first group contains 19 microRNAs and the second group only 1. The clustering topology changes as the entropy order changes from 1 to 3, with the most significant changes seen in the clustering resulting from the third-order entropy.

Conclusion: The proposed clustering method yielded a biological grouping of genes. There is good concordance between most of the microRNAs within one cluster and their biological pathways. The algorithm is applicable to clustering a range of genes, and even genomes, based on the entropy of their DNA sequences. Our method can help assign and predict the biological activity of genes that lack robust annotations because it relies only on the DNA sequence and length of the genes.
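
The Materials and methods step (order-k entropy per sequence, then a Kullback-Leibler divergence matrix as clustering input) can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, and the authors worked in R, so everything here is an assumption made for illustration: order-k entropy is taken as the Shannon entropy of overlapping k-mer frequencies, a pseudocount avoids zero probabilities, the divergence is symmetrized so it can serve as a distance, and the sequences are toy strings, not the 20 microRNAs from the study.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGU"  # RNA alphabet; an assumption for this sketch


def kmer_distribution(seq, k, pseudocount=1.0):
    """Smoothed probability of every possible k-mer, using overlapping windows."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def shannon_entropy(dist):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be nonzero where p is."""
    return sum(p[m] * log2(p[m] / q[m]) for m in p if p[m] > 0)


def divergence_matrix(seqs, k):
    """Symmetrized KL divergence between the k-mer distributions of all pairs."""
    dists = [kmer_distribution(s, k) for s in seqs]
    n = len(dists)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.5 * (kl_divergence(dists[i], dists[j])
                       + kl_divergence(dists[j], dists[i]))
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy sequences, not taken from the paper or any database.
toy_sequences = [
    "ACGUACGUACGUACGUACGU",
    "ACGUACGUACGAACGUACGU",
    "AAAAAAAAAACCCCCCCCCC",
]

for order in (1, 2, 3):
    entropies = [shannon_entropy(kmer_distribution(s, order)) for s in toy_sequences]
    print(order, [round(h, 3) for h in entropies])

toy_matrix = divergence_matrix(toy_sequences, k=2)
```

A matrix such as `toy_matrix` could then be passed to any distance-based clustering routine (average linkage corresponds to UPGMA). The symmetrization is a design choice of this sketch, since plain Kullback-Leibler divergence is not symmetric and the abstract does not say how the authors handled that.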