Pub Date : 2025-02-15 DOI: 10.1038/s41597-025-04606-8
Yang Guo, Zhaoshan Zhong, Nannan Zhang, Minxiao Wang, Chaolun Li
Lucinidae, renowned as the most diverse chemosymbiotic invertebrate group, function as sulfide cleaners in coastal ecosystems and are thus ecologically important. Despite their significance, genomic studies on these organisms have been limited. Here, we present the chromosome-level genome assembly of Indoaustriella scarlatoi, an intertidal lucinid clam. Using short reads, long reads, and Hi-C sequencing, we assembled a 1.58 Gb genome comprising 690 contigs with a contig N50 length of 9.00 Mb, which were anchored to 17 chromosomes. BUSCO analysis indicates a genome completeness of 95.4%. Transposable elements account for 56.02% of the genome, with long terminal repeat retrotransposons (LTR, 42.66%) being the most abundant. We identified 34,469 protein-coding genes, 74.43% of which were functionally annotated. This high-quality genome assembly serves as a valuable resource for further studies on the evolutionary and ecological aspects of chemosymbiotic bivalves.
Chromosome-level genome assembly of the intertidal lucinid clam Indoaustriella scarlatoi. Scientific Data, vol. 12, no. 1, art. 275 (2025-02-15). https://doi.org/10.1038/s41597-025-04606-8
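The contig N50 quoted in the abstract above (9.00 Mb) is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions and do not come from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration: sort the lengths from longest
    to shortest, accumulate them, and return the length at which the
    running total first reaches half of the total assembly size.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (made-up contig lengths, not real data):
# total = 20, half = 10; 6 + 5 = 11 >= 10, so the N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembled sequence, which is why the abstract reports BUSCO completeness alongside it.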
Pub Date : 2025-02-15 DOI: 10.1038/s41597-025-04597-6
Anabel García-Heredia, Luna Guerra-Núñez, Paula Martín-Climent, Estefanía Rojas, Raúl López-Domínguez, Clara Alcántara-Domínguez, Cristina Alenda, Luis M Valor
Access to public omics datasets is of paramount importance in brain cancer research because it allows biomarkers and therapeutic targets to be proposed and validated in gliomas, especially in glioblastomas, the most prevalent and aggressive type. Taking advantage of current advances in next-generation sequencing and DNA methylation profiling, we have created datasets from approximately 150 formalin-fixed paraffin-embedded (FFPE) tumours. These datasets enable, for the first time, integrative transcriptional and epigenetic studies in a context that accounts for the degradation and fixation-derived chemical alterations of the most widespread archiving format in hospitals, and they provide a cohort independent of current public databases for further validation of putative novel biomarkers. Alongside the well-known glioblastomas, astrocytomas and oligodendrogliomas, we have also included, for comparison, a few examples of rare tumours that are often neglected in brain cancer research. Taken together, we provide a valuable tool to explore combined gene expression and DNA methylation patterns in the study of gliomas and glioneuronal tumours.
Transcriptomics and epigenomics datasets of primary brain cancers in formalin-fixed paraffin embedded format. Scientific Data, vol. 12, no. 1, art. 273 (2025-02-15). https://doi.org/10.1038/s41597-025-04597-6
Pub Date : 2025-02-15 DOI: 10.1038/s41597-025-04554-3
Mulham Fawakherji, Jeffrey Blay, Matilda Anokye, Leila Hashemi-Beni, Jennifer Dorton
Rapid and accurate assessment of flood extent is important for effective disaster response, mitigation planning, and resource allocation. Traditional flood mapping methods face challenges in scalability and transferability. Deep learning, particularly convolutional neural networks (CNNs), has changed flood mapping by learning intricate spatial patterns and semantic features directly from raw data, but it requires high-quality training datasets. DeepFlood is introduced to address this requirement. It is a novel dataset comprising high-resolution manned and unmanned aerial imagery and Synthetic Aperture Radar (SAR) imagery, enriched with detailed labels including inundated vegetation, one of the most challenging areas for flood mapping. DeepFlood enables multi-modal flood mapping approaches and mitigates limitations in existing datasets by providing comprehensive annotations and diverse landscape coverage. We evaluate several semantic segmentation architectures on DeepFlood, demonstrating its usability and efficacy in post-disaster flood mapping scenarios.
DeepFlood for Inundated Vegetation High-Resolution Dataset for Accurate Flood Mapping and Segmentation. Scientific Data, vol. 12, no. 1, art. 271 (2025-02-15). https://doi.org/10.1038/s41597-025-04554-3
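The DeepFlood abstract states that several semantic segmentation architectures were evaluated but does not name the metric used. As an illustration only, the sketch below computes intersection over union (IoU), a common segmentation metric, on flattened label maps. The metric choice, the function names `class_iou` and `mean_iou`, and the integer class ids are assumptions, not details from the paper.

```python
def class_iou(predicted, reference, label):
    """IoU for one class over two equal-length sequences of pixel labels.

    Returns None when the class appears in neither sequence, since the
    ratio is undefined in that case.
    """
    if len(predicted) != len(reference):
        raise ValueError("label maps must have the same number of pixels")
    intersection = 0
    union = 0
    for pred_label, ref_label in zip(predicted, reference):
        in_pred = pred_label == label
        in_ref = ref_label == label
        if in_pred and in_ref:
            intersection += 1
        if in_pred or in_ref:
            union += 1
    return intersection / union if union else None


def mean_iou(predicted, reference, labels):
    """Unweighted mean of per-class IoU, skipping classes that never occur."""
    scores = [class_iou(predicted, reference, label) for label in labels]
    scores = [score for score in scores if score is not None]
    if not scores:
        raise ValueError("none of the requested classes occur in the data")
    return sum(scores) / len(scores)


# Hypothetical class ids: 0 = dry land, 1 = open water, 2 = inundated vegetation.
predicted = [0, 1, 1, 2]
reference = [0, 1, 2, 2]
print(class_iou(predicted, reference, 2))
print(mean_iou(predicted, reference, [0, 1, 2]))
```

Reporting IoU per class rather than only the mean is useful for a dataset like this one, because a hard class such as inundated vegetation can score poorly while the average still looks acceptable.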
Rosa hugonis is widely distributed in the Hengduan Mountains, Qinling Mountains, and northern China. It is an important candidate species for ecological restoration, given its good adaptability. Here, we present the first high-quality chromosome-level assembly of R. hugonis based on HiFi reads and Hi-C data. The assembled sequences were anchored onto seven pseudochromosomes. The genome size of R. hugonis is 337.92 Mb, with a contig N50 length of 26.84 Mb. We annotated 36,218 protein-coding genes in R. hugonis. In summary, the high-quality genome sequence of R. hugonis provides a genetic roadmap for the study of its genetics and species relationships. This will facilitate future comparative genomic studies across more species within Rosa.
A chromosomal-scale reference genome for Rosa hugonis. Zhenlong Liang, Jia Miao, Hengning Deng, Ruifang Jiao, Liangying Li, Shiqi Li, Zhongyu Tang, Jian Ru, Xinfen Gao. Scientific Data, vol. 12, no. 1, art. 272 (2025-02-15). https://doi.org/10.1038/s41597-025-04526-7
Pub Date : 2025-02-15 DOI: 10.1038/s41597-025-04537-4
Anna Sofia Lippolis, Giorgia Lodi, Andrea Giovanni Nuzzolese
Global sustainability challenges have recently led to an increasing interest in the management of water and health resources. The availability of effective, meaningful and open data is therefore crucial to address those issues in the broader context of the United Nations Sustainable Development Goal of clean water and sanitation. In this paper, we present the Water Health Open Knowledge Graph (WHOW-KG) along with its design methodology and an analysis of its impact. Developed in the context of the EU-funded WHOW (Water Health Open Knowledge) project, the WHOW-KG is a semantic knowledge graph that models data on water consumption, pollution, extreme weather events, infectious disease rates and drug distribution. It aims to support a wide range of applications, from knowledge discovery to decision-making, making it a valuable resource for researchers, policymakers, and practitioners in the water and health domains. The WHOW-KG consists of a network of five ontologies and related linked open data, modelled according to those ontologies. As a fully distributed system, it is sustainable over time, can handle large datasets, and allows data providers full control, establishing it as a vital European asset in the fields of water consumption and pollution.
The Water Health Open Knowledge Graph. Scientific Data, vol. 12, no. 1, art. 274 (2025-02-15). https://doi.org/10.1038/s41597-025-04537-4
The rapid development of perovskite solar devices has led to a rising number of publications over the past decade. As a result, a project aiming to compile all published device data was initiated in 2022. However, because the project relies on manual data collection, one of its hurdles is encouraging members of the perovskite community to spend time and effort entering new device data. Adequate participation is necessary for the project's sustainability but is challenging to achieve. In response, we propose using natural language processing algorithms to extract various attributes of perovskite solar devices from journal articles. When data collection is performed by programs instead of humans, the lack of community participation can be overcome. For each device, the identifying device information, intrinsic device data, extrinsic cell definition, and the details of the fabrication procedure were extracted. A total of 30 attributes from 3164 journal articles were compiled, with an average accuracy of 0.899. The dataset and source code are made publicly available.
Auto-generating a database on the fabrication details of perovskite solar devices. Agnes Valencia, Fei Liu, Xiangyang Zhang, Xiangkun Bo, Weilu Li, Walid A Daoud. Scientific Data, vol. 12, no. 1, art. 270 (2025-02-14). https://doi.org/10.1038/s41597-025-04566-z
Pub Date : 2025-02-14 DOI: 10.1038/s41597-025-04532-9
Chenhui Shen, Guofeng Yang, Min Tang, Xiaofei Li, Li Zhu, Wei Li, Lin Jin, Pan Deng, Huanhuan Zhang, Qing Zhai, Gang Wu, Xiaohong Yan
Mylabris sibirica is a hypermetamorphic insect that primarily feeds on oilseed rape during the adult stage. However, the limited availability of genomic resources hinders our understanding of gene function, medical use, and ecological adaptation in M. sibirica. Here, a high-quality chromosome-level genome of M. sibirica was generated using PacBio, Illumina, and Hi-C technologies. The genome size was 138.45 Mb, with a scaffold N50 of 13.84 Mb, and 99.85% (138.25 Mb) of the assembly was anchored onto 10 pseudo-chromosomes. BUSCO analysis showed that this genome assembly had a high completeness of 100% (n = 1,367), containing 1,358 (99.4%) single-copy BUSCOs and 8 (0.6%) duplicated BUSCOs. In addition, a total of 11,687 protein-coding genes and 35.46% (49.10 Mb) repetitive elements were identified. The high-quality genome assembly offers valuable genomic resources for exploring gene function, medical use, and ecology.
A chromosome-level genome assembly of Mylabris sibirica Fischer von Waldheim, 1823 (Coleoptera, Meloidae). Scientific Data, vol. 12, no. 1, art. 269 (2025-02-14). https://doi.org/10.1038/s41597-025-04532-9
Pub Date : 2025-02-14 DOI: 10.1038/s41597-025-04522-x
Mathilde Resell, Hanne-Line Rabben, Animesh Sharma, Lars Hagen, Linh Hoang, Nan T Skogaker, Anne Aarvik, Eirik Knudsen Bjåstad, Magnus K Svensson, Manoj Amrutkar, Caroline S Verbeke, Surinder K Batra, Gunnar Qvigstad, Timothy C Wang, Anil Rustgi, Duan Chen, Chun-Mei Zhao
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies, with a five-year survival rate of 10-15% due to late-stage diagnosis and limited efficacy of existing treatments. This study used proteomics-based systems modelling to generate multimodal datasets from various research models, including PDAC cells, spheroids, organoids, and tissues derived from murine and human samples. Identical mass spectrometry-based proteomics was applied across the different models. The preparation and validation of the research models and the proteomics are described in detail. The datasets we present here contribute to the data collection on PDAC and will be useful for systems modelling, data mining, knowledge discovery in databases, and bioinformatics of individual models. Further data analysis may lead to the generation of research hypotheses, predictions of targets for diagnosis and treatment, and the identification of relationships between data variables.
Proteomics profiling of research models for studying pancreatic ductal adenocarcinoma. Scientific Data, vol. 12, no. 1, art. 266 (2025-02-14). https://doi.org/10.1038/s41597-025-04522-x
Pub Date : 2025-02-14 DOI: 10.1038/s41597-025-04528-5
Alice Tian, Sangbae Kim, Hasna Baidouri, Jin Li, Xuesen Cheng, Janice Vranka, Yumei Li, Rui Chen, VijayKrishna Raghunathan
The trabecular meshwork within the outflow apparatus is critical in maintaining intraocular pressure homeostasis. In vitro studies employing primary cell cultures of the human trabecular meshwork (hTM) have conventionally served as surrogates for investigating the pathobiology of TM dysfunction. Despite their abundant use, translating outcomes from in vitro studies to ex vivo and/or in vivo studies remains a challenge. Given the cell heterogeneity, single-cell RNA sequencing that compares primary hTM cell cultures with hTM tissue may provide important insights into cellular identity and translatability; such an approach has not been reported before. In this study, we assembled a total of 14 primary hTM in vitro samples across passages 1-4, including 4 samples from individuals diagnosed with glaucoma. This dataset offers a comprehensive transcriptomic resource of primary hTM in vitro scRNA-seq data to study global changes in gene expression in comparison to cells in tissue in situ. We have performed extensive preprocessing and quality control, allowing the research community to access and utilize this public resource.
Divergence in cellular markers observed in single-cell transcriptomics datasets between cultured primary trabecular meshwork cells and tissues. Scientific Data, vol. 12, no. 1, art. 264 (2025-02-14). https://doi.org/10.1038/s41597-025-04528-5
Pub Date : 2025-02-14 DOI: 10.1038/s41597-024-04135-w
Franz Pablo Antezana Lopez, Guanhua Zhou, Guifei Jing, Kai Zhang, Liangfu Chen, Lin Chen, Yumin Tan
Accurate global carbon dioxide (CO2) distribution data with high spatial and temporal resolution are essential for understanding CO2 dynamics and impacts on climate change. This study tackles the challenge of data gaps in satellite observations of greenhouse gases, caused by orbital and observational limitations. We reconstructed a comprehensive dataset of column-averaged CO2 (XCO2) concentrations by integrating re-analyzed data from the Copernicus Atmosphere Monitoring Service (CAMS) with observations from the GOSAT and OCO-3 satellites. Using two advanced data reconstruction methods, Data Interpolating Empirical Orthogonal Functions (DINEOF) and Convolutional Auto-Encoder (DINCAE), we imputed missing data while preserving spatial and temporal consistency. The combined approach achieved high accuracy, with Pearson correlation values between 0.94 and 0.95 against TCCON measurements, and we also report root mean square error (RMSE) to further assess model performance. Our results indicate that these techniques generate a daily, high-resolution, gap-free XCO2 dataset, enabling improved CO2 monitoring, climate modeling, and policy development.
Global Daily Column Average CO2 at 0.1° × 0.1° Spatial Resolution Integrating OCO-3, GOSAT, CAMS with EOF and Deep Learning. Scientific Data, vol. 12, no. 1, art. 268 (2025-02-14). https://doi.org/10.1038/s41597-024-04135-w
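The XCO2 abstract validates the reconstructed field against TCCON measurements using the Pearson correlation and the root mean square error (RMSE). The sketch below shows how those two statistics are defined for paired samples. It is not the authors' code: the function names `pearson_r` and `rmse` are hypothetical, and the numbers in the usage example are invented values in ppm, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [value - mean_x for value in x]
    dev_y = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dev_x, dev_y))
    var_x = sum(a * a for a in dev_x)
    var_y = sum(b * b for b in dev_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(a - b) ** 2 for a, b in zip(x, y)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example values (ppm), for illustration only.
reconstructed = [415.2, 416.0, 417.1, 418.3]
ground_based = [415.0, 416.4, 417.0, 418.9]
print(pearson_r(reconstructed, ground_based))
print(rmse(reconstructed, ground_based))
```

The two statistics answer different questions: correlation measures how well the reconstructed values track the variation in the reference, while RMSE also penalises a constant offset, so a product can correlate strongly and still be biased.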