Data mining method of single-cell omics data to evaluate a pure tissue environmental effect on gene expression level
Daigo Okada, Jianshen Zhu, Kan Shota, Yuuki Nishimura, Kazuya Haraguchi
arXiv - QuanBio - Genomics, published 2024-06-11. DOI: arxiv-2406.06969 (https://doi.org/arxiv-2406.06969)
Abstract
While single-cell RNA-seq enables investigation of the effect of cell type on
the transcriptome, the pure tissue environmental effect has not been well
investigated. Because tissues and cell types co-occur in biased combinations in
the body, it has been difficult to evaluate the pure tissue environmental effect
by omics data mining. When evaluating the effects of discrete variables such as
cell type, tissue, and other categorical variables, it is important to prevent
statistical confounding among them. We propose a novel method to
enumerate suitable analysis units of variables for estimating the effects of
tissue environment by extending the maximal biclique enumeration problem for
bipartite graphs to $k$-partite hypergraphs. We applied the proposed method to
a large mouse single-cell transcriptome dataset, Tabula Muris Senis, to
evaluate pure tissue environmental effects on gene expression. Data mining
using the proposed method revealed pure tissue environmental effects on gene
expression and their age-related changes among adipose sub-tissues. The method
proposed in this study facilitates the evaluation of the effects of discrete
variables in exploratory data mining of large-scale genomics datasets.
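
As a rough illustration of the base problem that the method extends, the sketch below enumerates maximal bicliques in a small bipartite graph of tissues and cell types by brute force. It is not the authors' algorithm and does not cover the extension to $k$-partite hypergraphs. The function name `maximal_bicliques` and the toy tissue and cell-type labels are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def maximal_bicliques(edges):
    """Brute-force enumeration of maximal bicliques in a bipartite graph.

    `edges` is an iterable of (left, right) pairs, e.g. (tissue, cell type)
    combinations that are actually observed in a dataset. Runs in time
    exponential in the number of left vertices, so toy inputs only.
    """
    # Adjacency: left vertex -> set of right vertices it is observed with.
    adj = {}
    for left, right in edges:
        adj.setdefault(left, set()).add(right)
    lefts = sorted(adj)

    found = set()
    for size in range(1, len(lefts) + 1):
        for subset in combinations(lefts, size):
            # Right vertices shared by every left vertex in the subset.
            common = set.intersection(*(adj[v] for v in subset))
            if not common:
                continue
            # Close the left side: every left vertex adjacent to all of `common`.
            closed = frozenset(v for v in lefts if common <= adj[v])
            found.add((closed, frozenset(common)))
    return found


# Hypothetical toy data: which cell types are observed in which tissues.
EDGES = [
    ("fat", "B cell"), ("fat", "T cell"),
    ("lung", "B cell"), ("lung", "T cell"), ("lung", "NK cell"),
    ("spleen", "B cell"),
]

if __name__ == "__main__":
    for tissues, cell_types in maximal_bicliques(EDGES):
        # Keep only units with at least two tissues and two cell types.
        if len(tissues) >= 2 and len(cell_types) >= 2:
            print(sorted(tissues), sorted(cell_types))
```

Each biclique returned is a set of tissues and a set of cell types in which every tissue/cell-type combination is observed, so within such a unit the two variables are fully crossed and a tissue comparison is not confounded by which cell types happen to be present.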