Patterns of Unwanted Biological and Technical Expression Variation Among 49 Human Tissues

Laboratory Investigation, Volume 104, Issue 6, Article 102069 (IF 5.1; CAS Zone 2, Medicine; JCR Q1, Medicine, Research & Experimental). Published 2024-04-24. DOI: 10.1016/j.labinv.2024.102069
Tim O. Nieuwenhuis, Hunter H. Giles, Jeremy V.A. Arking, Arun H. Patil, Wen Shi, Matthew N. McCall, Marc K. Halushka

Abstract

Tissue gene expression studies are impacted by biological and technical sources of variation, which can be broadly classified into wanted and unwanted variation. The latter, if not addressed, results in misleading biological conclusions. Methods have been proposed to reduce unwanted variation, such as normalization and batch correction. A more accurate understanding of all causes of variation could significantly improve the ability of these methods to remove unwanted variation while retaining variation corresponding to the biological question of interest. We used 17,282 samples from 49 human tissues in the Genotype-Tissue Expression data set (v8) to investigate patterns and causes of expression variation. Transcript expression was transformed to z-scores, and only the most variable 2% of transcripts were evaluated and clustered based on coexpression patterns. Clustered gene sets were assigned to different biological or technical causes based on histologic appearances and metadata elements. We identified 522 variable transcript clusters (median: 11 per tissue) among the samples. Of these, 63% were confidently explained, 16% were likely explained, 7% were low confidence explanations, and 14% had no clear cause. Histologic analysis annotated 46 clusters. Other common causes of variability included sex, sequencing contamination, immunoglobulin diversity, and compositional tissue differences. Less common biological causes included death interval (Hardy score), disease status, and age. Technical causes included blood draw timing and harvesting differences. Many of the causes of variation in bulk tissue expression were identifiable in the Tabula Sapiens data set of single-cell expression. This is among the largest explorations of the underlying sources of tissue expression variation. It uncovered expected and unexpected causes of variable gene expression and demonstrated the utility of matched histologic specimens. 
It further demonstrated the value of acquiring meaningful tissue harvesting metadata elements to use for improved normalization, batch correction, and analysis of both bulk and single-cell RNA-seq data.
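The selection and clustering steps described in the abstract (z-scoring, keeping the most variable 2% of transcripts, grouping them by coexpression) can be illustrated with a minimal sketch. It is not the authors' pipeline, and several details are assumptions because the abstract leaves them unspecified: expression is log2(x + 1) transformed before variability is ranked, variability is measured as variance on that log scale, coexpression is Pearson correlation, and clusters are the connected components of a graph linking transcripts with correlation of at least a hypothetical threshold (`min_corr = 0.7`), used here as a simple stand-in for whatever clustering method the study applied. The function names `top_variable_zscores` and `coexpression_clusters` are hypothetical.

```python
import numpy as np


def top_variable_zscores(expr: np.ndarray, top_frac: float = 0.02):
    """Return indices of the most variable transcripts and their z-scores.

    expr: transcripts x samples matrix of non-negative expression values.
    Assumption: variability is ranked by variance of log2(x + 1).
    """
    logged = np.log2(expr + 1.0)
    var = logged.var(axis=1)
    n_keep = max(1, int(round(top_frac * expr.shape[0])))
    keep = np.argsort(var)[::-1][:n_keep]   # top `top_frac` transcripts by variance
    sub = logged[keep]
    mu = sub.mean(axis=1, keepdims=True)
    sd = sub.std(axis=1, keepdims=True)     # population SD (ddof=0)
    sd[sd == 0] = 1.0                       # guard against constant transcripts
    return keep, (sub - mu) / sd


def coexpression_clusters(z: np.ndarray, min_corr: float = 0.7) -> np.ndarray:
    """Label transcripts by connected components of the graph corr >= min_corr.

    Hypothetical stand-in for the study's clustering step.
    """
    n_samples = z.shape[1]
    # With ddof=0 z-scores, the mean of the row products is exactly Pearson r.
    corr = (z @ z.T) / n_samples
    n = corr.shape[0]
    labels = np.full(n, -1, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        # Depth-first traversal collects every transcript reachable from `seed`.
        stack = [seed]
        labels[seed] = current
        while stack:
            i = stack.pop()
            for j in np.nonzero(corr[i] >= min_corr)[0]:
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

On real data, each resulting cluster would then be inspected against histology and sample metadata (for example sex or sequencing batch) to assign a likely cause, which is the annotation step the abstract describes.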
