STimage-1K4M: A histopathology image-gene expression dataset for spatial transcriptomics
Jiawen Chen, Muqing Zhou, Wenrong Wu, Jinwei Zhang, Yun Li, Didong Li
arXiv:2406.06393 (arXiv - QuanBio - Genomics), 2024-06-10
Recent advances in multi-modal algorithms have driven and been driven by the
increasing availability of large image-text datasets, leading to significant
strides in various fields, including computational pathology. However, in most
existing medical image-text datasets, the text typically provides high-level
summaries that may not sufficiently describe sub-tile regions within a large
pathology image. For example, an image might cover an extensive tissue area
containing cancerous and healthy regions, but the accompanying text might only
specify that this image is a cancer slide, lacking the nuanced details needed
for in-depth analysis. In this study, we introduce STimage-1K4M, a novel
dataset designed to bridge this gap by providing genomic features for sub-tile
images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics
data, which captures gene expression information at the level of individual
spatial spots within a pathology image. Specifically, each image in the dataset
is broken down into smaller sub-image tiles, each paired with a gene expression
vector of 15,000-30,000 dimensions. With 4,293,195 pairs of sub-tile
images and gene expressions, STimage-1K4M offers unprecedented granularity,
paving the way for a wide range of advanced research in multi-modal data
analysis and innovative applications in computational pathology and beyond.
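
The tile-to-expression pairing described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the square tile shape, the 32-pixel tile size, the border-skipping rule, and the toy random data are all assumptions made for illustration. Each spatial spot is assumed to come with a pixel coordinate in the slide image and one row of a spots-by-genes expression matrix.

```python
import numpy as np


def extract_spot_tiles(image, spot_centers, tile_size):
    """Crop a square tile centered on each spot (hypothetical helper).

    Spots whose tile would extend past the image border are skipped.
    Returns the tiles and the indices of the spots that were kept.
    """
    half = tile_size // 2
    height, width = image.shape[:2]
    tiles, kept = [], []
    for idx, (row, col) in enumerate(spot_centers):
        top, left = row - half, col - half
        if top < 0 or left < 0 or top + tile_size > height or left + tile_size > width:
            continue  # tile would fall outside the image
        tiles.append(image[top:top + tile_size, left:left + tile_size])
        kept.append(idx)
    return tiles, kept


def pair_tiles_with_expression(image, spot_centers, expression, tile_size=32):
    """Return a list of (tile, expression_vector) pairs, one per kept spot."""
    tiles, kept = extract_spot_tiles(image, spot_centers, tile_size)
    return [(tile, expression[i]) for tile, i in zip(tiles, kept)]


# Toy data only: a random RGB "slide", four spot centers given as (row, col),
# and a 4 x 50 count matrix standing in for the much wider real gene vectors.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
spot_centers = [(20, 20), (64, 64), (2, 100), (100, 110)]
expression = rng.poisson(1.0, size=(4, 50))

pairs = pair_tiles_with_expression(image, spot_centers, expression, tile_size=32)
for tile, genes in pairs:
    print(tile.shape, genes.shape)
```

In this toy run the third spot lies too close to the top border and is dropped, so three pairs remain. In the real dataset each expression vector would have on the order of 15,000-30,000 entries rather than 50.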