Pub Date: 2024-10-01 DOI: 10.1038/s41597-024-03675-5
Stefanie Meliss, Cristina Pascua-Martin, Jeremy I Skipper, Kou Murayama
Videos of magic tricks offer many opportunities to study the human mind. They violate the expectations of the viewer, causing prediction errors, misdirect attention, and elicit epistemic emotions. Herein we describe and share the Magic, Memory, and Curiosity (MMC) Dataset, in which 50 participants watched 36 magic tricks filmed and edited specifically for functional magnetic resonance imaging (fMRI) experiments. The MMC Dataset includes a contextual incentive manipulation, curiosity ratings for the magic tricks, and incidental memory performance tested a week later. We additionally measured individual differences in working memory and constructs relevant to motivated learning. fMRI data were acquired before, during, and after learning. We show that both behavioural and fMRI data are of high quality, as indicated by basic validation analyses, i.e., variance decomposition for the behavioural data and intersubject correlation and seed-based functional connectivity for the fMRI data. The richness and complexity of the MMC Dataset will allow researchers to explore dynamic cognitive and motivational processes from various angles during task and rest.
{"title":"The magic, memory, and curiosity fMRI dataset of people viewing magic tricks.","doi":"10.1038/s41597-024-03675-5","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11445505/pdf/"}
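The MMC abstract names intersubject correlation as one of its fMRI validation checks. As a rough illustration only, the sketch below computes a leave-one-out variant on toy time series: each participant's signal is correlated with the average signal of everyone else. The leave-one-out choice, the function names, and the toy numbers are assumptions made for this example; the abstract does not say which variant the authors used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def leave_one_out_isc(series):
    """Correlate each subject's time series with the mean of all others.

    `series` is a list of equal-length lists, one per subject
    (hypothetical layout chosen for this sketch).
    """
    n_subjects = len(series)
    scores = []
    for i, own in enumerate(series):
        others = [s for j, s in enumerate(series) if j != i]
        # Average the remaining subjects time point by time point.
        mean_others = [sum(vals) / (n_subjects - 1) for vals in zip(*others)]
        scores.append(pearson(own, mean_others))
    return scores


# Toy data: three made-up subjects, four time points each.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 5]]
print(leave_one_out_isc(toy))
```

In practice this would be run per voxel or region on preprocessed data; a constant signal would make the denominator zero, so real pipelines guard against that case.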
Pub Date: 2024-09-30 DOI: 10.1038/s41597-024-03879-9
Velomalala Solo Andrianjafindrainibe, Nicole Andrianirina, Florent Bédécarrats, Isabelle Droy, Jean-Luc Dubois, Jeanne de Montalembert, Bako Nirina Rabevohitra, Rolland Rafidimanana, Patrick Rasolofo, Raphaël Ratovoarinony, Lalasoa Anjarafara Onivola Ratsaramiarina, Jean Dieudonné Ravelonandro, Voahirana Razanamavo, Mireille Razafindrakoto, Bezaka Rivolala, François Roubaud, Camille Saint-Macary
A Rural Observatory System (ROS) was established in Madagascar to address the lack of socioeconomic data on rural areas. It collected, analyzed, and disseminated data to help formulate and evaluate development policies. From 1995 to 2015, the ROS surveyed a total of 26 areas. The ROS methodology involved annual household panel surveys using consistent questionnaires supplemented by modules covering new themes. Qualitative community surveys were used to understand local features and dynamics. The site selection combined quantitative and qualitative insights to reflect the diversity of Madagascar's rural challenges. Quality control was comprehensive, with measures such as limiting the number of daily surveyor interviews and daily field supervision. By making this data available for 21 consecutive years, along with documentation, metadata, and code with analysis examples, we aim to facilitate their discovery, assessment, and understanding by researchers, policymakers, and social organizations. To our knowledge, this is the only available data for an in-depth analysis of the situation and trends in the rural areas of Madagascar.
{"title":"Madagascar rural observatory surveys, a longitudinal dataset on household living conditions 1995-2015.","doi":"10.1038/s41597-024-03879-9","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11442863/pdf/"}
Pub Date: 2024-09-30 DOI: 10.1038/s41597-024-03853-5
Yulu Yan, Ke Zhao, Longwei Yang, Nan Liu, Yufei Xu, Junyi Gai, Guangnan Xing
The soybean hawkmoth Clanis bilineata tsingtauica Mell (Lepidoptera, Sphingidae; CBT), one of the main leaf-chewing pests of soybeans, has recently gained popularity as an edible insect in China due to its high nutritional value. However, no high-quality genome of CBT has been available, which greatly limits further research. In the present study, we assembled a high-quality chromosome-level genome of CBT using PacBio HiFi reads and Hi-C technologies for the first time. The size of the assembled genome is 477.45 Mb with a contig N50 length of 17.43 Mb. After Hi-C scaffolding, the contigs were anchored to 29 chromosomes with a mapping rate of 99.61%. The Benchmarking Universal Single-Copy Orthologues (BUSCO) completeness value is 99.49%. The genome contains 252.16 Mb of repeat elements and 14,214 protein-coding genes. In addition, chromosomal synteny analysis showed that the genome of CBT has strong synteny with that of Manduca sexta. In conclusion, this high-quality genome provides an important resource for future studies of CBT and contributes to the development of integrated pest management strategies.
{"title":"Chromosome-level genome assembly and annotation of Clanis bilineata tsingtauica Mell (Lepidoptera: Sphingidae).","doi":"10.1038/s41597-024-03853-5","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11443141/pdf/"}
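Several of the genome abstracts in this listing report a contig or scaffold N50. For readers unfamiliar with the statistic, the following is a minimal sketch of the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are hypothetical and are not taken from any of the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Toy example with made-up lengths (arbitrary units).
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))  # prints 6
```

With the toy input the total is 30, so the threshold is 15; the two longest sequences (10 and 6) sum to 16, which makes 6 the N50.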
Pub Date: 2024-09-28 DOI: 10.1038/s41597-024-03904-x
Denis Grouzdev, Emmanuelle Pales Espinosa, Stephen Tettelbach, Sarah Farhat, Arnaud Tanguy, Isabelle Boutet, Nadège Guiglielmoni, Jean-François Flot, Harrison Tobi, Bassem Allam
The bay scallop, Argopecten irradians, is a species of major commercial, cultural, and ecological importance. It is endemic to the eastern coast of the United States, but has also been introduced to China, where it supports a significant aquaculture industry. Here, we provide an annotated chromosome-level reference genome assembly for the bay scallop, assembled using PacBio and Hi-C data. The total genome size is 845.9 Mb, distributed over 1,503 scaffolds with a scaffold N50 of 44.3 Mb. The majority (92.9%) of the assembled genome is contained within the 16 largest scaffolds, corresponding to the 16 chromosomes confirmed by Hi-C analysis. The assembly also includes the complete mitochondrial genome. Approximately 36.2% of the genome consists of repetitive elements. The BUSCO analysis showed a completeness of 96.2%. We identified 33,772 protein-coding genes. This genome assembly will be a valuable resource for future research on evolutionary dynamics and adaptive mechanisms, and will support genome-assisted breeding, contributing to the conservation and management of this iconic species in the face of environmental and pathogenic challenges.
{"title":"Chromosome-level genome assembly of the bay scallop Argopecten irradians.","doi":"10.1038/s41597-024-03904-x","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11439060/pdf/"}
Pub Date: 2024-09-28 DOI: 10.1038/s41597-024-03847-3
Tibor Furtenbacher, Roland Tóbiás, Jonathan Tennyson, Robert R Gamache, Attila G Császár
The rovibrational spectrum of the water molecule is the crown jewel of high-resolution molecular spectroscopy. While its significance in numerous scientific and engineering applications and the challenges behind its interpretation are well known, the extensive experimental analysis performed for this molecule, from the microwave to the ultraviolet, is admirable. To determine empirical energy levels for H₂¹⁶O, this study utilizes an improved version of the MARVEL (Measured Active Rotational-Vibrational Energy Levels) scheme, which now takes into account multiplet constraints and first-principles energy-level splittings. This analysis delivers 19 027 empirical energy values, with individual uncertainties and confidence intervals, utilizing 309 290 transition wavenumbers collected from 189 (mostly experimental) data sources. Relying on these empirical, as well as some computed, energies and first-principles intensities, an extensive composite line list, named CW2024, has been assembled. The CW2024 dataset is compared to lines in the canonical HITRAN 2020 spectroscopic database, providing guidance for future experimental investigations.
{"title":"The W2024 database of the water isotopologue H₂¹⁶O.","doi":"10.1038/s41597-024-03847-3","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11439062/pdf/"}
In nature, diploids and tetraploids are two common types of polyploid evolution. Misgurnus anguillicaudatus (mud loach) is a remarkable fish species that exhibits both diploid and tetraploid forms. However, reconstructing the four haplotypes of its autotetraploid genome remains unresolved. Here, we generated the first haplotype-resolved, chromosome-level genome of autotetraploid M. anguillicaudatus with a size of 4.76 Gb, contig N50 of 6.78 Mb, and scaffold N50 of 44.11 Mb. We identified approximately 2.9 Gb (61.03% of the genome) of repetitive sequences and predicted 91,485 protein-coding genes. Moreover, allelic gene expression levels indicated the absence of significantly dominant haplotypes within the autotetraploid loach genome. This genome will provide a valuable biological model for unraveling the mechanisms of polyploid formation, evolution, and adaptation to environmental changes, and will benefit aquaculture applications and biodiversity conservation.
{"title":"Chromosome-scale and haplotype-resolved genome assembly of the autotetraploid Misgurnus anguillicaudatus.","authors":"Bing Sun, Qingshan Li, Yihui Mei, Yunbang Zhang, Yuxuan Zheng, Yuwei Huang, Xinxin Xiao, Jianwei Zhang, Gao Jian, Xiaojuan Cao","doi":"10.1038/s41597-024-03891-z","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11438953/pdf/"}
Pub Date: 2024-09-28 DOI: 10.1038/s41597-024-03886-w
Yang Jeong Park, Sung Eun Jerng, Sungroh Yoon, Ju Li
The advent of artificial intelligence (AI) has enabled a comprehensive exploration of materials for various applications. However, AI models often prioritize frequently encountered material examples in the scientific literature, limiting the selection of suitable candidates based on inherent physical and chemical attributes. To address this imbalance, we generated a dataset consisting of 1,453,493 natural language-material narratives from OQMD, Materials Project, JARVIS, and AFLOW2 databases based on ab initio calculation results that are more evenly distributed across the periodic table. The generated text narratives were then scored by both human experts and GPT-4, based on three rubrics: technical accuracy, language and structure, and relevance and depth of content, showing similar scores but with human-scored depth of content being the most lagging. The integration of multimodal data sources and large language models holds immense potential for AI frameworks to aid the exploration and discovery of solid-state materials for specific applications of interest.
{"title":"1.5 million materials narratives generated by chatbots.","doi":"10.1038/s41597-024-03886-w","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11439064/pdf/"}
Pub Date: 2024-09-27 DOI: 10.1038/s41597-024-03669-3
Cristina Di Muri, Martina Pulieri, Davide Raho, Alexandra N Muresan, Andrea Tarallo, Jessica Titocci, Enrica Nestola, Alberto Basset, Sabrina Mazzoni, Ilaria Rosati
The integration and reuse of digital research products can only be ensured through the adoption of machine-actionable (meta)data standards enriched with semantic artefacts. This study compiles 540 semantic artefacts in environmental sciences to: i. examine their coverage in scientific domains and topics; ii. assess key aspects of their FAIRness; and iii. evaluate management and governance concerns. The analyses showed that the majority of semantic artefacts concern the terrestrial biosphere domain, and that a small portion of the total failed to meet the FAIR principles. For example, 5.5% of semantic artefacts were not available in semantic catalogues, 8% were not built with standard model languages and formats, 24.6% were published without usage licences, and 22.4% were published without version information or with divergent versions across the catalogues in which they were available. This investigation discusses common semantic practices, outlines existing gaps, and suggests potential solutions to address semantic interoperability challenges in some of the resources originally designed to guarantee it.
{"title":"Assessing semantic interoperability in environmental sciences: variety of approaches and semantic artefacts.","doi":"10.1038/s41597-024-03669-3","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437166/pdf/"}
Pub Date: 2024-09-27 DOI: 10.1038/s41597-024-03877-x
Malarvizhi Arulraj, Veljko Petković, Susan Wen, Ralph R Ferraro, Huan Meng
Satellite-based Quantitative Precipitation Estimates (QPE) are indirect estimates of precipitation rates and as such are often prone to errors, so their uncertainties need to be characterized before they are used in application-specific studies. Moreover, multiple satellite-based QPE products are offered through different agencies, each with its own specifications, formats, and requirements, which makes the products' uncertainties harder to understand. This manuscript presents a standardized validation system named NPreciSe - NOAA Satellite-based Precipitation Validation System, which assesses the performance of satellite-based precipitation products in near real-time over the continental United States. NPreciSe is coupled with a user-interactive web platform and built using the open-source language Python. It is structured to help (1) end-users determine the best satellite QPE for their specific application, and (2) algorithm developers identify systematic biases in QPE retrievals. This manuscript presents the capabilities of NPreciSe, discusses the methodology adopted in developing the standardized validation system, and introduces the web portal.
{"title":"NPreciSe - An Automated Satellite Precipitation Product Assessment Tool.","doi":"10.1038/s41597-024-03877-x","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437106/pdf/"}
Pub Date: 2024-09-27 DOI: 10.1038/s41597-024-03884-y
Babak Ghassemi, Emma Izquierdo-Verdiguier, Astrid Verhegghen, Momchil Yordanov, Guido Lemoine, Álvaro Moreno Martínez, Davide De Marchi, Marijn van der Velde, Francesco Vuolo, Raphaël d'Andrimont
To provide the information needed for a detailed monitoring of crop types across the European Union (EU), we present an advanced 10-metre resolution map for the EU and Ukraine with 19 crop types for 2022, updating the 2018 version. Using Earth Observation (EO) and in-situ data from Eurostat's Land Use and Coverage Area Frame Survey (LUCAS) 2022, the methodology included 134,684 LUCAS Copernicus polygons, Sentinel-1 and Sentinel-2 satellite imagery, land surface temperature and a digital elevation model. Based on this data, two classification layers were developed using a Random Forest machine learning approach: a primary map and a gap-filling map to address cloud-covered gaps. The combined maps, covering 27 EU countries, show an overall accuracy of 79.3% for seven major land cover classes and 70.6% for all 19 crop types. The trained model was used to derive the 2022 map for Ukraine, demonstrating its robustness even in regions without labelled samples for model training.
{"title":"European Union crop map 2022: Earth observation's 10-meter dive into Europe's crop tapestry.","doi":"10.1038/s41597-024-03884-y","journal":{"name":"Scientific Data"},"PeriodicalIF":5.8,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11436679/pdf/"}
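The crop map abstract reports overall accuracy figures (79.3% and 70.6%). As a reminder of what that metric measures, here is a minimal sketch that computes overall accuracy from a confusion matrix as the share of validation samples on the diagonal. The matrix layout (rows as reference classes, columns as predicted classes), the function name, and the toy counts are assumptions for illustration; they are not the authors' validation data or procedure.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix.

    `confusion[i][j]` counts samples of reference class i that were
    predicted as class j (layout assumed for this sketch).
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no samples")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Toy two-class matrix with made-up counts.
toy_matrix = [
    [8, 2],  # reference class 0: 8 correct, 2 confused with class 1
    [1, 9],  # reference class 1: 1 confused with class 0, 9 correct
]
print(overall_accuracy(toy_matrix))  # prints 0.85
```

Overall accuracy can hide weak performance on rare classes, which is why map validations often report per-class user's and producer's accuracies alongside it.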