Robust and rigorous identification of tissue-specific genes by statistically extending tau score.

Biodata Mining | IF 4 | Zone 3 (Biology) | Q1, MATHEMATICAL & COMPUTATIONAL BIOLOGY | Pub Date: 2022-12-09 | DOI: 10.1186/s13040-022-00315-9
Hatice Büşra Lüleci, Alper Yılmaz
Citations: 2

Abstract

Objectives: In this study, we aimed to identify tissue-specific genes for various human tissues/organs more robustly and rigorously by extending the tau score algorithm.

Introduction: Tissue-specific genes are genes whose function and expression are largely restricted to one or a few tissues. Identifying them is essential for understanding multicellular biological processes such as tissue-specific molecular regulation, tissue development, physiology, and the pathogenesis of tissue-associated diseases.

Materials and methods: Gene expression data from five large RNA sequencing (RNA-seq) projects, spanning 96 different human tissues, were retrieved from ArrayExpress and ExpressionAtlas. Genes were first categorized using filters together with the tau score as a specificity index. After tau was calculated for each gene in each dataset separately, the statistical distance from the maximum expression level was estimated using a new procedure. Integrating tau with this statistical distance estimate, an approach called extended tau, determined whether a gene is specifically expressed in one tissue or in several. The tissue-specific genes obtained for the 96 human tissues were functionally annotated, and comparisons were carried out to demonstrate the effectiveness of the extended tau method.
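
For readers unfamiliar with the specificity index named above, the following is a minimal sketch of the original tau score in its commonly cited form: each tissue's expression is divided by the gene's maximum expression, and the resulting deficits (1 minus the ratio) are summed and divided by n - 1, where n is the number of tissues. The function name `tau_score` is chosen here for illustration, and returning NaN for a gene with zero expression everywhere is an assumption of this sketch. The abstract does not describe any normalization or log transformation, so none is applied.

```python
import math


def tau_score(expression):
    """Tau specificity index for one gene across n tissues.

    0 means uniform expression across tissues; 1 means expression
    in a single tissue only.
    """
    x = [float(v) for v in expression]
    if len(x) < 2:
        raise ValueError("need expression values for at least two tissues")
    if any(v < 0 for v in x):
        raise ValueError("expression values must be non-negative")
    x_max = max(x)
    if x_max == 0:
        # Assumption of this sketch: tau is undefined for unexpressed genes.
        return math.nan
    # Sum of (1 - normalized expression), scaled by the number of tissues - 1.
    return sum(1.0 - v / x_max for v in x) / (len(x) - 1)


print(tau_score([10, 0, 0, 0]))  # expressed in one tissue only
print(tau_score([5, 5, 5, 5]))   # uniformly expressed
print(tau_score([8, 4, 0]))      # intermediate specificity
```

A score near 1 flags a gene as specific, but the score alone does not say which tissues, or how many, drive it.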

Results and discussion: Genes were categorized by expression level, and tissue-specific genes were identified for a large number of tissues/organs. Unlike the original tau score, which can assign tissue specificity to only a single tissue, the extended tau approach successfully assigned genes to multiple tissues.
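
The contrast between single-tissue and multi-tissue assignment can be illustrated with a toy example. The abstract does not specify how the statistical distance from the maximum is estimated, so the rule below is a hypothetical stand-in and not the authors' procedure: a tissue is kept if its expression lies within a chosen number of population standard deviations of the gene's maximum. The names `assign_tissues`, `tau_cutoff`, and `max_sd_distance`, their default values, and the expression values are all assumptions made for this sketch.

```python
import math
import statistics


def tau_score(values):
    """Tau specificity index (0 = uniform, 1 = single-tissue expression)."""
    x_max = max(values)
    if x_max == 0:
        return math.nan
    return sum(1.0 - v / x_max for v in values) / (len(values) - 1)


def assign_tissues(expression_by_tissue, tau_cutoff=0.85, max_sd_distance=1.0):
    """Hypothetical multi-tissue rule, NOT the published procedure.

    A gene with tau >= tau_cutoff is assigned to every tissue whose
    expression is within max_sd_distance standard deviations of the maximum.
    """
    tissues = list(expression_by_tissue)
    values = [float(expression_by_tissue[t]) for t in tissues]
    tau = tau_score(values)
    if math.isnan(tau) or tau < tau_cutoff:
        return []  # not considered tissue-specific
    sd = statistics.pstdev(values)
    x_max = max(values)
    return [t for t, v in zip(tissues, values) if x_max - v <= max_sd_distance * sd]


# Made-up expression values for one gene across eight tissues.
gene = {"liver": 100, "kidney": 95, "brain": 0, "heart": 0,
        "lung": 0, "skin": 0, "colon": 0, "testis": 0}

print(max(gene, key=gene.get))  # single-tissue view: tissue with maximum expression
print(assign_tissues(gene))     # multi-tissue view under the hypothetical rule
```

With these made-up values, taking only the tissue of maximum expression reports liver, while the distance-based rule also retains kidney, which is the kind of multi-tissue assignment the extended approach is described as enabling.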
