STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics

arXiv - QuanBio - Genomics | Pub Date: 2024-06-10 | DOI: arxiv-2406.06393

Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li

Abstract

Recent advances in multi-modal algorithms have driven, and been driven by, the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text provides only a high-level summary that may not adequately describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing both cancerous and healthy regions, while the accompanying text states only that the image is a cancer slide, lacking the detail needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which capture gene expression at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is divided into smaller sub-image tiles, and each tile is paired with a gene expression vector of 15,000-30,000 dimensions. With 4,293,195 pairs of sub-tile images and gene expression profiles, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and for innovative applications in computational pathology and beyond.
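The tile-to-expression pairing described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the names `TileExpressionPair` and `make_pairs`, the square `tile_size` crop centred on each spot, the `(spot_id, x, y)` pixel-coordinate convention, and the choice to skip spots whose tile would cross the image border are all hypothetical assumptions made for illustration. Because the abstract reports 15,000-30,000 gene dimensions, the gene panel is carried per slide rather than assumed fixed.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TileExpressionPair:
    """One sub-tile image paired with the expression vector of its spot (hypothetical type)."""
    spot_id: str
    tile: List[list]           # cropped pixel rows, indexed tile[y][x]
    expression: List[float]    # one value per gene in the slide's gene panel
    gene_names: Sequence[str]  # gene panel; its length varies from slide to slide


def make_pairs(
    image: Sequence[Sequence],
    spots: Sequence[Tuple[str, int, int]],
    expression: Sequence[Sequence[float]],
    gene_names: Sequence[str],
    tile_size: int,
) -> List[TileExpressionPair]:
    """Crop a tile_size x tile_size square around each spot and pair it with that spot's row.

    Assumptions (hypothetical, not taken from the paper):
      - image is indexed image[y][x]
      - spots holds (spot_id, x_center, y_center) in pixel coordinates
      - expression has one row per spot, in the same order as spots
      - spots whose tile would extend past the image border are skipped
    """
    if len(spots) != len(expression):
        raise ValueError("expected exactly one expression row per spot")
    height, width = len(image), len(image[0])
    half = tile_size // 2
    pairs = []
    for (spot_id, x, y), row in zip(spots, expression):
        if len(row) != len(gene_names):
            raise ValueError(f"spot {spot_id}: expected {len(gene_names)} genes, got {len(row)}")
        x0, y0 = x - half, y - half
        x1, y1 = x0 + tile_size, y0 + tile_size
        if x0 < 0 or y0 < 0 or x1 > width or y1 > height:
            continue  # tile would cross the slide edge, so this spot is dropped
        tile = [list(r[x0:x1]) for r in image[y0:y1]]
        pairs.append(TileExpressionPair(spot_id, tile, list(row), gene_names))
    return pairs


if __name__ == "__main__":
    # Toy 8x8 "slide" whose pixel value encodes its position as 10*y + x.
    image = [[10 * y + x for x in range(8)] for y in range(8)]
    spots = [("s1", 2, 2), ("s2", 5, 5), ("s3", 7, 7)]
    expression = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [5.0, 5.0, 5.0]]
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene panel
    pairs = make_pairs(image, spots, expression, genes, tile_size=4)
    print([p.spot_id for p in pairs])  # → ['s1', 's2']  (s3 lies too close to the edge)
```

In this toy run, spot `s3` is dropped because a 4-pixel tile centred at (7, 7) would extend past the 8-pixel border. Real slides would instead use full-resolution image arrays and each dataset's own spot coordinates and gene panel.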
