Image databases are central to empirical aesthetics, enabling tests of how image statistics relate to observers' appreciation. However, many existing databases have two key limitations: (1) they conflate low-level visual features with high-level semantic content, making it difficult to separate visual from cognitive influences on aesthetic judgments; and (2) they are imbalanced, overrepresenting highly appreciated images. To address these issues, we present the Minimum Semantic Content (MSC) database, a large, systematically curated resource for computational aesthetics. It comprises 10,426 natural scenes with reduced, homogenized semantic content, minimizing cognitive and emotional confounds. Each received 100 individual aesthetic ratings from naïve observers, drawn from a pool of approximately 10,000 participants, via crowdsourcing. The database includes both "beautified" and "uglified" versions, generated with a manipulation technique that promotes uniform coverage across the aesthetic spectrum. This broader distribution mitigates bias and overfitting in models. Validation also shows improved robustness in computational models overall. This database enables researchers to study how perceptual features shape aesthetic judgments, using stimuli with very limited semantic and contextual confounds.
Title: The Minimum Semantic Content (MSC) Dataset: A Large, Balanced Resource for Computational Aesthetics Research. Authors: Olivier Penacchio, Arslan Javed, Bogdan Raducanu, Xavier Otazu, C Alejandro Parraga. Journal: Scientific Data. Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06816-0
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06562-3
Salvador Herrando-Pérez, Kieren J Mitchell, John R Southon, Chris S M Turney, Thomas W Stafford
Radiocarbon dates from megafaunal remains provide insights into climatic and anthropogenic factors shaping past ecosystems. Chronologies have advanced through rigorous chemical purification (pretreatment) of fossil vertebrate collagen for accelerator mass spectrometry (AMS) radiocarbon dating. We present MEGA14C, a comprehensive dataset of late Quaternary AMS radiocarbon dates for Holarctic large-bodied mammals, based on collagen purified by ultrafiltration (92% of records), XAD-2 purification (7%) and hydroxyproline isolation (1%). MEGA14C includes 11,715 dates spanning 8 orders, 23 families, 78 genera, 133 species and 18 subspecies, 27% from extinct taxa, and dominated by Equus, Bos, Mammuthus, Rangifer, Bison, Ursus, Cervus, Canis, Coelodonta and Sus. Where available, geolocation, genetic and isotopic data are provided. Pretreatment is critical for accurate and reproducible radiocarbon measurements, yet 44% of published dates lack this information. We addressed this gap through over 10,000 personal communications (out of >100,000 emails) with researchers and AMS laboratories among the parties involved in fossil dating. This unique dataset supports (pre)historical research and provides a foundation for future expansion and/or integration into a global radiocarbon repository.
Title: A dataset of radiocarbon dates from Holarctic mammal collagen purified with high-quality chemistry. Journal: Scientific Data.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06661-1
Liuqing Wei, Thomas Talhelm, Jiong Zhu, Alexander Scott English, An Huang
We created a collectivism index to measure regional differences in China. The index uses eight Census indicators that reflect family living arrangements, marriage stability, innovation, and independence. The Census provides large, nationally representative samples, which ensures high-fidelity measurement and fine-grained geographic resolution from provinces down to prefectures (N = 356). Because the data span 1982 to 2020, researchers can also track change over time. This dataset is useful for exploring causes of societal differences, outcomes of collectivism, and cultural shifts in longitudinal data.
Title: A Collectivism Index for Investigating Cultural Variation in China across Regions and Time. Journal: Scientific Data.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06875-3
Bas Turpijn, Fedor Baart, Lóránt Tavasszy, Mark van Koningsveld
To support a modal shift toward sustainable freight solutions, such as inland waterway transport (IWT), researchers and practitioners require long-term historical data on IWT freight flows. However, such comprehensive time series have been unavailable until now. This study addresses this gap by presenting a harmonized dataset encompassing 50 years (1970-2023) of IWT freight data across Europe, with a focus on the Rhine-Alpine Corridor. The dataset includes transport volumes (in tonnes) and transport performance (in ton-kilometers), classified according to NST-R, NST2007, and CCR nomenclatures. To ensure data continuity and completeness, processing techniques, including imputation and optical character recognition, were applied. The dataset offers valuable insights for researchers, policymakers, and transport planners aiming to understand and enhance the role of IWT in Europe's freight transport landscape.
Title: 50-Years Inland Waterway Freight Data in the Rhine-Alpine Corridor. Journal: Scientific Data.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06866-4
Tao Wang, Xinle Zhang, Mulin Shan, Mingyuan Deng, Jiaheng Wang, Huanjun Liu, Hao Li, Jinyu Sun
Imagery from the American satellite reconnaissance program (Keyhole) is a significant data source for geoscience research because of its high resolution and early temporal coverage. However, the lack of a spatial and temporal description of its uneven distribution can hinder researchers from selecting or accessing appropriate Keyhole images. Here we introduce a global grid-based dataset that organizes declassified U.S. Keyhole imagery (1960-1984) for direct reuse, built on a global equal-area sinusoidal grid. This dataset standardizes scene metadata and provides indicators designed to inform study design and data integration: coverage count (how often a place was imaged), unique acquisition dates (temporal sampling richness), first/last observation year (temporal bounds), observation span (duration), peak observation year and a three-year window (temporal concentration), resolution class (C1-C3), temporal-coverage class across five five-year intervals, and resolution-coverage class (A-G) for multi-scale availability. This dataset enables users to quickly locate usable scenes, assess temporal suitability, combine historical images with modern satellite data, and determine which non-free images to purchase if free images are unsuitable for their research.
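As an illustration of how the per-cell temporal indicators listed in this abstract could be derived, the following is a minimal sketch and not the authors' pipeline. The function name `cell_indicators`, the input format (ISO date strings for one grid cell), and the definition of span as last year minus first year are all assumptions made for this example.

```python
from collections import Counter
from datetime import date


def cell_indicators(acquisition_dates):
    """Summarise the imaging history of one grid cell (hypothetical helper).

    acquisition_dates: ISO date strings, one per scene covering the cell.
    """
    dates = [date.fromisoformat(d) for d in acquisition_dates]
    years = [d.year for d in dates]
    first_year, last_year = min(years), max(years)
    # Year with the most scenes; ties resolve to the earliest year.
    per_year = Counter(years)
    peak_year = min(per_year, key=lambda y: (-per_year[y], y))
    return {
        "coverage_count": len(dates),       # how often the cell was imaged
        "unique_dates": len(set(dates)),    # temporal sampling richness
        "first_year": first_year,           # temporal bounds
        "last_year": last_year,
        "span_years": last_year - first_year,  # assumed definition of span
        "peak_year": peak_year,             # temporal concentration
    }
```

Calling `cell_indicators` with the scene dates of a single cell returns a dictionary of indicators; an empty list raises `ValueError`, which seems a reasonable signal for a cell that was never imaged.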
Title: Global 0.05° Grid-Based Dataset of Keyhole Imagery with Spatio-Temporal Indicators (1960-1984). Journal: Scientific Data.
Neolissochilus pnar, identified as the world's largest cave fish, belongs to the family Cyprinidae and is endemic to one of India's biodiversity hotspots, specifically the limestone caves of Meghalaya. The species differs notably from its close relative, Neolissochilus hexastichus, primarily in its lack of pigmentation and the absence or reduction of eyes. While juvenile N. pnar may have small or reduced eyes, adults exhibit an absence of external ocular features. Genome sequence resources for this species would therefore be an effective tool for bioprospecting and for mining novel genes responsible for these traits. In this study, the genome was sequenced with long-read technology (PacBio), yielding a high-quality draft assembly of 1.56 Gb in 1,423 contigs, with an N50 of 18.990 Mb and 99% genome completeness (BUSCO). The genome assembly contains 44.30% repetitive elements, 1,416,376 SSRs, and 37,559 functionally annotated genes. Single-copy ortholog (SOG) analysis placed N. pnar in the same cluster as the other cave-dwelling cyprinids used in the study. The extensive genomic information generated in the present study will be a useful resource for understanding the evolutionary significance of, and the genes governing, traits including body colour and eye development in Mahseer species.
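The contig N50 reported in this abstract is a standard assembly contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The sketch below shows that calculation under the usual definition; the function name `n50_l50` and the toy contig lengths are illustrative only and are not taken from the study.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50: length of the contig at which the cumulative sum of
    length-sorted contigs first reaches half of the total.
    L50: number of contigs needed to reach that point.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with made-up contig lengths.
n50, l50 = n50_l50([2, 2, 2, 3, 3, 4, 8, 8])
```

In the toy example the total length is 32, so the two longest contigs (8 + 8 = 16) already reach half of it.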
Title: Genome sequencing and assembly of Neolissochilus pnar, the largest cavefish species of Mahseer. Authors: Vindhya Mohindra, Labrechai Mog Chowdhury, Dran Khlur Baiaineh Mukhim, Kangkan Sarma, Deisakee Pyrbot Warbah, Dandadhar Sarma, Joykrushna Jena. Journal: Scientific Data. Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06842-y
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06789-0
Ditao Niu, Juan Zhou, Bingbing Guo
Carbon dioxide (CO2) emissions from concrete have grown rapidly, ranking second after the power sector. Current emission factors often overlook regional heterogeneity. To bridge this knowledge gap, this study takes Shandong Province, a typical region in China, as a case study. Considering differences in geography, history, culture, and economic development, Shandong is divided into five subregions: Eastern, Western, Southern, Northern, and Central Shandong. This study developed a fundamental carbon footprint dataset of concrete by collecting 993 mix proportions of strength grades C25-C60 from field surveys over the past five years. Statistical analysis showed that raw material dosages followed normal distributions (Kolmogorov-Smirnov test, p > 0.05), while transportation distances and electricity consumption followed lognormal distributions. Based on these statistical characteristics, a Monte Carlo simulation with 10,000 iterations was conducted to establish a stochastic model for carbon emissions accounting. Model performance was validated against survey data, achieving a mean absolute percentage error (MAPE) of 1.89% and a coefficient of determination (R²) of 0.9904. Sensitivity analysis identified cement dosage as the key driver of emissions.
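To make the stochastic approach described in this abstract concrete, here is a minimal Monte Carlo sketch that draws a normally distributed material dosage and lognormally distributed transport distance and electricity use, then sums their emissions. It is not the authors' model: every parameter value, emission factor, and name below is a hypothetical placeholder chosen only so the example runs.

```python
import math
import random
import statistics

# All values are hypothetical placeholders, not figures from the dataset.
CEMENT_MEAN, CEMENT_SD = 350.0, 30.0    # cement dosage, kg per m3 (normal)
CEMENT_EF = 0.8                         # kg CO2 per kg cement
DIST_MEDIAN_KM, DIST_SIGMA = 50.0, 0.4  # transport distance (lognormal)
MASS_T = 2.4                            # tonnes of material moved per m3
TRANSPORT_EF = 0.1                      # kg CO2 per tonne-km
POWER_MEDIAN_KWH, POWER_SIGMA = 2.0, 0.3  # electricity per m3 (lognormal)
GRID_EF = 0.6                           # kg CO2 per kWh


def simulate_emissions(n_iter=10_000, seed=0):
    """Return n_iter simulated emissions in kg CO2 per m3 of concrete."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = []
    for _ in range(n_iter):
        # Normal draw for dosage, clipped at zero to stay physical.
        cement = max(0.0, rng.gauss(CEMENT_MEAN, CEMENT_SD))
        # lognormvariate takes the mean and sigma of the underlying normal.
        distance = rng.lognormvariate(math.log(DIST_MEDIAN_KM), DIST_SIGMA)
        power = rng.lognormvariate(math.log(POWER_MEDIAN_KWH), POWER_SIGMA)
        total = (cement * CEMENT_EF
                 + distance * MASS_T * TRANSPORT_EF
                 + power * GRID_EF)
        results.append(total)
    return results


if __name__ == "__main__":
    samples = simulate_emissions()
    cuts = statistics.quantiles(samples, n=40)  # 2.5% steps
    print("mean:", statistics.mean(samples))
    print("95% interval:", cuts[0], cuts[-1])
```

With placeholder inputs like these, the cement term dominates the total, which is at least consistent in spirit with the abstract's finding that cement dosage is the key driver; a real analysis would fit each distribution to the survey data first.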
Title: Carbon footprint dataset of concrete based on field surveys at commercial mixing plants in Shandong, China. Journal: Scientific Data.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06596-7
Abdellah Idrissi Azami, Stacy Pirro, Nihal Habib, Turgay Unver, M Gonzalo Claros, Juan de Dios Alché, Sofia Sehli, Zainab El Ouafi, Douae El Ghoubali, Dalila Bousta, Najib Al Idrissi, Fatima Gaboun, Abderrazak Rfaki, Abdelkhalek Legsyer, Chakib Nejjari, Saaid Amzazi, Lahcen Belyamani, Abdelhamid El Mousadik, Hassan Ghazal
We present a comprehensive annotation dataset for the scaffold-level nuclear genome assembly of the argan tree (Argania spinosa (L.) Skeels, Sapotaceae). Using Illumina whole-genome shotgun reads that we previously generated for the "Argan Amghar" individual and deposited under BioProject PRJNA294096, together with the corresponding GenBank assembly GCA_003260245.2, we re-assembled and curated a 690 Mbp draft with scaffold N50 of 25 Mbp and L50 of 11 large macro-scaffolds. Ab initio gene prediction with AUGUSTUS and GeneMark-ES, integrated by EVidenceModeler, produced 51,078 protein-coding genes and 2,081 non-coding RNA genes, while repeat annotation covers 53.0% of the assembly. Functional annotation combined eggNOG-mapper, InterProScan and BLASTp searches against UniProtKB/Swiss-Prot to assign curated functions, domains and Gene Ontology terms to 32,785 genes and to support 25,484 proteins with UniProt evidence. BUSCO analyses indicate high completeness of the assembly gene space and completeness of the predicted proteome (74.6%). All primary data products, including a unified GFF3 file and the predicted proteome FASTA, are openly available via NCBI and Zenodo ( https://doi.org/10.5281/zenodo.17901083 ).
Title: Comprehensive re-assembly and annotation dataset for the argan tree (Argania spinosa L., Sapotaceae) genome. Journal: Scientific Data. Pages: 267.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06841-z
Augusto C Lima, Helen E Dulfer, Anna L C Hughes, Martin Margold, Iestyn Barr, Benjamin J C Laabs, Suzette G A Flantua
Mountain regions experienced repeated glacial expansions and retreats during the Quaternary, shaping landscapes, ecosystems, and regional climates. While numerous reconstructions exist for individual mountain glaciers, global geodatabases remain scarce and rarely updated to reflect the latest observations. Here, we present GLACIMONTIS, a global geodatabase of maximum recorded areal extents of mountain glaciers at local Last Glacial Maximum, spanning 57-14 kyr BP. Our synthesis integrates reconstructions from 209 studies across 271 mountain ranges worldwide, compiling 15,014 individual glacier reconstructions, including 8,809 reconstructions compiled for the first time in a global geodatabase. Our work updates knowledge in 135 mountain ranges and highlights research gaps in 71 others. GLACIMONTIS represents the most comprehensive and up-to-date synthesis of mountain glacier areal extent at the global and local Last Glacial Maximum, providing spatial boundaries for refining climate-glacier modeling and delineating paleoecological reconstructions, and a framework for identifying regional research gaps. GLACIMONTIS advances Quaternary science by enhancing access to paleoglacier reconstructions and fostering interdisciplinary research in and across mountains worldwide.
Title: Mountain glacier extents at the Last Glacial Maximum. Journal: Scientific Data.
Pub Date: 2026-02-17. DOI: 10.1038/s41597-026-06801-7
Han Zhang, Qi Li, Dake Chen, Xiaodong Shang, Di Tian, Tongya Liu, Min He, Jiacheng Hong, Guofei Wei, Jian Liu
This work presents a dataset from a triangle-shaped moored array comprising three buoys and two moorings, with synchronous atmospheric and oceanic data collected in the northern South China Sea during 2016. The moored array was deployed from late July to early August and recovered in October. The atmospheric data were observed by meteorological sensors and automatic meteorological stations ~2.5 m above the sea surface at the buoys. The oceanic data consist of temperature and salinity measurements from conductivity, temperature, and depth (CTD) recorders or temperature sensors, together with currents observed by acoustic Doppler current profilers (ADCPs) and current meters. The data reveal air-sea interactions and oceanic processes in the upper and deep ocean. Multiscale processes were recorded, such as air-sea fluxes, tides, internal waves, and low-frequency flows and variations. Potential applications include analyzing the phenomena and mechanisms of air-sea interactions and ocean dynamics, as well as validating and improving numerical model simulations, data reanalysis, and data assimilation.
Title: Atmospheric and oceanic data from a triangle-shaped moored array in the northern South China Sea during 2016. Journal: Scientific Data.