Pub Date : 2024-11-06 DOI: 10.1038/s41597-024-04001-9
Yu Zou, Jingqiang Fu, Yuan Liang, Xuan Luo, Minghui Shen, Miaoqin Huang, Yexin Chen, Weiwei You, Caihuan Ke
The ivory shell Babylonia areolata is an economically important marine benthic gastropod known for its rapid growth and high nutritional value. B. areolata is distributed in Southeast Asia and the southeast coastal areas of China. In this study, we constructed a high-quality genome for B. areolata using PacBio, Illumina, and Hi-C sequencing technologies. The genome assembly comprised 35 chromosomal sequences with a total length of 1.65 Gb. The scaffold and contig N50 lengths were 53.17 Mb and 2.64 Mb, respectively, with repeat sequences constituting 64.46% of the genome. Furthermore, 26,130 protein-coding genes and 96.75% of the genome's BUSCOs were identified. This inaugural report of a B. areolata genome provides crucial foundational information for further investigations into the biology, genomics, and genetic improvement of economic traits of this species.
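The scaffold and contig N50 values quoted above follow the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly length. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; N50 is the length of
    the sequence at which the cumulative sum first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy contig lengths (arbitrary units), not from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70: 80 + 70 = 150, half of the 300 total
```

Applied to real scaffold or contig lengths in base pairs, the same function would reproduce figures of the kind reported in the abstract.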
{"title":"Chromosome-level genome assembly of the ivory shell Babylonia areolata.","authors":"Yu Zou, Jingqiang Fu, Yuan Liang, Xuan Luo, Minghui Shen, Miaoqin Huang, Yexin Chen, Weiwei You, Caihuan Ke","doi":"10.1038/s41597-024-04001-9","DOIUrl":"10.1038/s41597-024-04001-9","url":null,"abstract":"<p><p>The ivory shell Babylonia areolata is an economically important marine benthic gastropod known for its rapid growth and high nutritional value. B. areolata is distributed in Southeast Asia and the southeast coastal areas of China. In this study, we constructed a high-quality genome for B. areolata using PacBio, Illumina, and Hi-C sequencing technologies. The genome assembly comprised 35 chromosomal sequences with a total length of 1.65 Gb. The scaffold and contig N50 lengths were 53.17 Mb and 2.64 Mb, respectively, with repeat sequences constituting 64.46% of the genome. Furthermore, 26,130 protein-coding genes and 96.75% of the genome's BUSCOs were identified. This inaugural report of a B. areolata genome provides crucial foundational information for further investigations into the biology, genomics, and genetic improvement of economic traits of this species.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1201"},"PeriodicalIF":5.8,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542075/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142591269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-05 DOI: 10.1038/s41597-024-04050-0
Thomas E Blanford, David P Williams, J Daniel Park, Brian T Reinhardt, Kyle S Dalton, Shawn F Johnson, Daniel C Brown
This paper describes a synthetic aperture sonar (SAS) dataset collected in air, consisting of four types of targets in four environments of differing complexity. The in-air, laboratory-based experiments produced data with a level of fidelity and ground-truth accuracy that is not easily attainable in data collected underwater. The range of complexity, high data fidelity, and accurate ground truth provide a rich dataset with acoustic features on multiple scales. It can be used to develop new signal-processing and image-reconstruction algorithms, as well as machine learning models for object detection and classification. It may also find application in model verification and validation for acoustic simulators. The dataset consists of raw acoustic time-series returns, associated environmental conditions, hardware configuration, and array motion, as well as the reconstructed imagery.
{"title":"An in-air synthetic aperture sonar dataset of target scattering in environments of varying complexity.","authors":"Thomas E Blanford, David P Williams, J Daniel Park, Brian T Reinhardt, Kyle S Dalton, Shawn F Johnson, Daniel C Brown","doi":"10.1038/s41597-024-04050-0","DOIUrl":"10.1038/s41597-024-04050-0","url":null,"abstract":"<p><p>This paper describes a synthetic aperture sonar (SAS) dataset collected in-air consisting of four types of targets in four environments of different complexity. The in-air laboratory based experiments produced data with a level of fidelity and ground truth accuracy that is not easily attainable in data collected underwater. The range of complexity, high level of data fidelity, and accurate ground truth provides a rich dataset with acoustic features on multiple scales. It can be used to develop new signal-processing and image reconstruction algorithms, as well as machine learning models for object detection and classification. It may also find application in model verification and validation for acoustic simulators. The dataset consists of raw acoustic time series returns, associated environmental conditions, hardware configuration, array motion, as well as the reconstructed imagery.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1196"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538465/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-05 DOI: 10.1038/s41597-024-04034-0
Komlavi Akpoti, Naga Manohar Velpuri, Naoki Mizukami, Stefanie Kagone, Mansoor Leh, Kirubel Mekonnen, Afua Owusu, Primrose Tinonetsana, Michael Phiri, Lahiru Madushanka, Tharindu Perera, Paranamana Thilina Prabhath, Gabriel E L Parrish, Gabriel B Senay, Abdulkarim Seid
VegDischarge v1 is a natural river discharge dataset covering more than 64,000 river segments in Africa for the period 2001-2021. It was produced by coupling the agro-hydrologic VegET model with the mizuRoute routing model. Using remote sensing data and a hydrological modeling system, the 1-km runoff field simulated by VegET was routed with mizuRoute. Performance metrics show strong model reliability, with R² of 0.5-0.9, NSE of 0.6-0.9, and KGE of 0.5-0.8 at the continental scale. The total average annual discharge for Africa is quantified at 3271.4 km³·year⁻¹, with the following contributions to oceanic basins: 1000.0 km³·year⁻¹ to the North Atlantic, primarily from the Senegal, Gambia, Volta, and Niger Rivers; 1327.2 km³·year⁻¹ to the South Atlantic, largely from the Congo River; 214.7 km³·year⁻¹ to the Mediterranean Sea, predominantly from the Nile River; and 729.4 km³·year⁻¹ to the Indian Ocean, with inputs from rivers such as the Zambezi. The dataset helps stakeholders and researchers understand water availability and its temporal and spatial variations, which affect water-related infrastructure planning, sustainable resource allocation, and the development of climate resilience strategies.
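For readers unfamiliar with the skill scores quoted above, the sketch below shows how the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE) are commonly computed from paired observed and simulated discharge series. The abstract does not state which KGE variant was used; this sketch assumes the original 2009 formulation, and the function names and sample values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared error
    to the variance of the observations about their mean."""
    mean_obs = _mean(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - squared_error / obs_spread


def kge(observed, simulated):
    """Kling-Gupta efficiency (assumed 2009 form), combining correlation r,
    variability ratio alpha, and bias ratio beta."""
    n = len(observed)
    mean_obs, mean_sim = _mean(observed), _mean(simulated)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in simulated) / n)
    cov = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated)) / n
    r = cov / (std_obs * std_sim)
    alpha = std_sim / std_obs
    beta = mean_sim / mean_obs
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical monthly discharge values, not taken from the dataset.
obs = [120.0, 150.0, 310.0, 480.0, 260.0, 140.0]
sim = [110.0, 170.0, 290.0, 450.0, 280.0, 150.0]
print(round(nse(obs, sim), 3), round(kge(obs, sim), 3))
```

Both scores equal 1 for a perfect simulation. An NSE of 0 means the simulation is no better than predicting the observed mean.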
{"title":"Advancing water security in Africa with new high-resolution discharge data.","authors":"Komlavi Akpoti, Naga Manohar Velpuri, Naoki Mizukami, Stefanie Kagone, Mansoor Leh, Kirubel Mekonnen, Afua Owusu, Primrose Tinonetsana, Michael Phiri, Lahiru Madushanka, Tharindu Perera, Paranamana Thilina Prabhath, Gabriel E L Parrish, Gabriel B Senay, Abdulkarim Seid","doi":"10.1038/s41597-024-04034-0","DOIUrl":"10.1038/s41597-024-04034-0","url":null,"abstract":"<p><p>VegDischarge v1, which covers over 64,000 river segments in Africa, is a natural river discharge dataset produced by coupled modeling; the agro-hydrologic VegET model and the mizuRoute routing model for the period 2001-2021. Using remote sensing data and hydrological modeling system, the 1-km runoff field simulated by VegET, was routed with mizuRoute. Performance metrics show strong model reliability, with R² of 0.5-0.9, NSE of 0.6-0.9, and KGE of 0.5-0.8 at the continental scale. The total average annual discharge for Africa is quantified at 3271.4 km³·year<sup>-1</sup>, with contributions to oceanic basins: 1000.0 km³·year<sup>-1</sup> to the North Atlantic, primarily from the Senegal, Gambia, Volta, and Niger Rivers; 1327.2 km³·year<sup>-1</sup> to the South Atlantic, largely from the Congo River; 214.7 km³·year<sup>-1</sup> to the Mediterranean Sea, predominantly from the Nile River; and 729.4 km³·year<sup>-1</sup> to the Indian Ocean, with inputs from rivers such as the Zambezi. 
The dataset is valuable for stakeholders and researchers to understand water availability, its temporal and spatial variations that affect water-related infrastructure planning, sustainable resource allocation, and the development of climate resilience strategies.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1195"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538507/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-05 DOI: 10.1038/s41597-024-04045-x
Daju Wang, Qiongyan Peng, Xiangqian Li, Wen Zhang, Xiaosheng Xia, Zhangcai Qin, Peiyang Ren, Shunlin Liang, Wenping Yuan
Grazing is a significant anthropogenic disturbance to grasslands, impacting their function and composition, and affecting carbon budgets and greenhouse gas emissions. However, accurate evaluations of grazing impacts are limited by the absence of long-term high-resolution grazing intensity data (i.e., the number of livestock per unit area). This study used census livestock data and a satellite-based vegetation index to develop the first Long-term High-resolution Grazing Intensity (LHGI) dataset of grassland in seven pastoral provinces in western China from 1980 to 2022. The LHGI dataset effectively captured spatial variations in grazing intensity, with validation at 73 sites showing a coefficient of determination (R²) of 0.78. The county-level validation showed an average R² value of 0.73 ± 0.03 from 1980 to 2022. This dataset serves as a vital resource for estimating grassland carbon cycling and livestock system CH₄ emissions, as well as contributing to grassland management.
{"title":"A long-term high-resolution dataset of grasslands grazing intensity in China.","authors":"Daju Wang, Qiongyan Peng, Xiangqian Li, Wen Zhang, Xiaosheng Xia, Zhangcai Qin, Peiyang Ren, Shunlin Liang, Wenping Yuan","doi":"10.1038/s41597-024-04045-x","DOIUrl":"10.1038/s41597-024-04045-x","url":null,"abstract":"<p><p>Grazing is a significant anthropogenic disturbance to grasslands, impacting their function and composition, and affecting carbon budgets and greenhouse gas emissions. However, accurate evaluations of grazing impacts are limited by the absence of long-term high-resolution grazing intensity data (i.e., the number of livestock per unit area). This study utilized census livestock data and a satellite-based vegetation index to develop the first Long-term High-resolution Grazing Intensity (LHGI) dataset of grassland in seven pastoral provinces in western China from 1980 to 2022. The LHGI dataset effectively captured spatial variations in grazing intensity, with validation at 73 sites showing a correlation coefficient (R<sup>2</sup>) of 0.78. The county-level validation showed an averaged R<sup>2</sup> values of 0.73 ± 0.03 from 1980 to 2022. This dataset serves as a vital resource for estimating grassland carbon cycling and livestock system CH<sub>4</sub> emissions, as well as contributing to grassland management.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1194"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538541/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142581804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-05 DOI: 10.1038/s41597-024-04055-9
Karel Hynek, Jan Luxemburk, Jaroslav Pešek, Tomáš Čejka, Pavel Šiška
{"title":"Author Correction: CESNET-TLS-Year22: A year-spanning TLS network traffic dataset from backbone lines.","authors":"Karel Hynek, Jan Luxemburk, Jaroslav Pešek, Tomáš Čejka, Pavel Šiška","doi":"10.1038/s41597-024-04055-9","DOIUrl":"10.1038/s41597-024-04055-9","url":null,"abstract":"","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1199"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538410/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-05 DOI: 10.1038/s41597-024-04048-8
Ruth H Thurstan, Hannah McCormick, Joanne Preston, Elizabeth C Ashton, Floris P Bennema, Ana Bratoš Cetinić, Janet H Brown, Tom C Cameron, Fiz da Costa, David W Donnan, Christine Ewers, Tomaso Fortibuoni, Eve Galimany, Otello Giovanardi, Romain Grancher, Daniele Grech, Maria Hayden-Hughes, Luke Helmer, K Thomas Jensen, José A Juanes, Janie Latchford, Alec B M Moore, Dimitrios K Moutopoulos, Pernille Nielsen, Henning von Nordheim, Bárbara Ondiviela, Corina Peter, Bernadette Pogoda, Bo Poulsen, Stéphane Pouvreau, Cordula Scherer, Aad C Smaal, David Smyth, Åsa Strand, John A Theodorou, Philine S E Zu Ermgassen
Ocean ecosystems have been subjected to anthropogenic influences for centuries, but the scale of past ecosystem changes is often unknown. For centuries, the European flat oyster (Ostrea edulis), an ecosystem engineer providing biogenic reef habitats, was a culturally and economically significant source of food and trade. These reef habitats are now functionally extinct, and almost no memory remains of where or at what scales this ecosystem once existed, or of its past form. The described datasets present qualitative and quantitative extracts from written records published between 1524 and 2022. These show: (1) locations of past flat oyster fisheries and/or oyster reef habitat described across the species' biogeographical range, with associated levels of confidence; (2) reported extent of past oyster reef habitats; and (3) species associated with these habitats. These datasets will help inform accelerating flat oyster restoration activities, establish reference models for anchoring adaptive management of restoration action, and contribute to global efforts to recover records on the hidden history of anthropogenic-driven ocean ecosystem degradation.
{"title":"Historical dataset details the distribution, extent and form of lost Ostrea edulis reef ecosystems.","authors":"Ruth H Thurstan, Hannah McCormick, Joanne Preston, Elizabeth C Ashton, Floris P Bennema, Ana Bratoš Cetinić, Janet H Brown, Tom C Cameron, Fiz da Costa, David W Donnan, Christine Ewers, Tomaso Fortibuoni, Eve Galimany, Otello Giovanardi, Romain Grancher, Daniele Grech, Maria Hayden-Hughes, Luke Helmer, K Thomas Jensen, José A Juanes, Janie Latchford, Alec B M Moore, Dimitrios K Moutopoulos, Pernille Nielsen, Henning von Nordheim, Bárbara Ondiviela, Corina Peter, Bernadette Pogoda, Bo Poulsen, Stéphane Pouvreau, Cordula Scherer, Aad C Smaal, David Smyth, Åsa Strand, John A Theodorou, Philine S E Zu Ermgassen","doi":"10.1038/s41597-024-04048-8","DOIUrl":"10.1038/s41597-024-04048-8","url":null,"abstract":"<p><p>Ocean ecosystems have been subjected to anthropogenic influences for centuries, but the scale of past ecosystem changes is often unknown. For centuries, the European flat oyster (Ostrea edulis), an ecosystem engineer providing biogenic reef habitats, was a culturally and economically significant source of food and trade. These reef habitats are now functionally extinct, and almost no memory of where or at what scales this ecosystem once existed, or its past form, remains. The described datasets present qualitative and quantitative extracts from written records published between 1524 and 2022. These show: (1) locations of past flat oyster fisheries and/or oyster reef habitat described across its biogeographical range, with associated levels of confidence; (2) reported extent of past oyster reef habitats, and; (3) species associated with these habitats. 
These datasets will be of use to inform accelerating flat oyster restoration activities, to establish reference models for anchoring adaptive management of restoration action, and in contributing to global efforts to recover records on the hidden history of anthropogenic-driven ocean ecosystem degradation.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1198"},"PeriodicalIF":5.8,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11538340/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142582932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-04 DOI: 10.1038/s41597-024-04033-1
Julan Kim, Yoonsik Kim, Jeongwoen Shin, Yeong-Kuk Kim, Doo Ho Lee, Jong-Won Park, Dain Lee, Hyun-Chul Kim, Jeong-Ho Lee, Seung Hwan Lee, Jun Kim
The olive flounder, Paralichthys olivaceus, also known as the Korean halibut, is an economically important flatfish in East Asian countries. Here, we provide four fully phased genome assemblies of two different olive flounder individuals using high-fidelity long-read sequencing and their parental short-read sequencing data. We obtained 42-44 Gb of ~15-kb and ~Q30 high-fidelity long reads, and their assembly quality values were ~53. We annotated ~30 K genes, ~170-Mb repetitive sequences, and ~3 M 5-methylcytosine positions for each genome assembly, and established a graph-based draft pan-genome of the olive flounder. We identified 5 M single-nucleotide variants and 100 K structural variants with their genotype information, where ~13% of the variants were possibly fixed in the two Korean individuals. Based on our chromosome-level genome assembly, we also explored chromosome evolution in the order Pleuronectiformes, as reported earlier. Our high-quality genomic resources will contribute to future genomic selection for accelerating the breeding process of the olive flounder.
{"title":"Fully phased genome assemblies and graph-based genetic variants of the olive flounder, Paralichthys olivaceus.","authors":"Julan Kim, Yoonsik Kim, Jeongwoen Shin, Yeong-Kuk Kim, Doo Ho Lee, Jong-Won Park, Dain Lee, Hyun-Chul Kim, Jeong-Ho Lee, Seung Hwan Lee, Jun Kim","doi":"10.1038/s41597-024-04033-1","DOIUrl":"10.1038/s41597-024-04033-1","url":null,"abstract":"<p><p>The olive flounder, Paralichthys olivaceus, also known as the Korean halibut, is an economically important flatfish in East Asian countries. Here, we provided four fully phased genome assemblies of two different olive flounder individuals using high-fidelity long-read sequencing and their parental short-read sequencing data. We obtained 42-44 Gb of ~15-kb and ~Q30 high-fidelity long reads, and their assembly quality values were ~53. We annotated ~30 K genes, ~170-Mb repetitive sequences, and ~3 M 5-methylcytosine positions for each genome assembly, and established a graph-based draft pan-genome of the olive flounder. We identified 5 M single-nucleotide variants and 100 K structural variants with their genotype information, where ~13% of the variants were possibly fixed in the two Korean individuals. Based on our chromosome-level genome assembly, we also explored chromosome evolution in the Pleuronectiformes family, as reported earlier. 
Our high-quality genomic resources will contribute to future genomic selection for accelerating the breeding process of the olive flounder.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1193"},"PeriodicalIF":5.8,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11535246/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142576855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-02 DOI: 10.1038/s41597-024-03951-4
Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, Edson Borin
Human activity recognition (HAR) using smartphone inertial sensors, like accelerometers and gyroscopes, enhances smartphones' adaptability and user experience. Data distribution from these sensors is affected by several factors including sensor hardware, software, device placement, user demographics, terrain, and more. Most datasets focus on providing variability in user and (sometimes) device placement, limiting domain adaptation and generalization studies. Consequently, models trained on one dataset often perform poorly on others. Despite many publicly available HAR datasets, cross-dataset generalization remains challenging due to data format incompatibilities, such as differences in measurement units, sampling rates, and label encoding. Hence, we introduce the DAGHAR benchmark, a curated collection of datasets for domain adaptation and generalization studies in smartphone-based HAR. We standardized six datasets in terms of accelerometer units, sampling rate, gravity component, activity labels, user partitioning, and time window size, removing trivial biases while preserving intrinsic differences. This enables controlled evaluation of model generalization capabilities. Additionally, we provide baseline performance metrics from state-of-the-art machine learning models, crucial for comprehensive evaluations of generalization in HAR tasks.
{"title":"A benchmark for domain adaptation and generalization in smartphone-based human activity recognition.","authors":"Otávio Napoli, Dami Duarte, Patrick Alves, Darlinne Hubert Palo Soto, Henrique Evangelista de Oliveira, Anderson Rocha, Levy Boccato, Edson Borin","doi":"10.1038/s41597-024-03951-4","DOIUrl":"10.1038/s41597-024-03951-4","url":null,"abstract":"<p><p>Human activity recognition (HAR) using smartphone inertial sensors, like accelerometers and gyroscopes, enhances smartphones' adaptability and user experience. Data distribution from these sensors is affected by several factors including sensor hardware, software, device placement, user demographics, terrain, and more. Most datasets focus on providing variability in user and (sometimes) device placement, limiting domain adaptation and generalization studies. Consequently, models trained on one dataset often perform poorly on others. Despite many publicly available HAR datasets, cross-dataset generalization remains challenging due to data format incompatibilities, such as differences in measurement units, sampling rates, and label encoding. Hence, we introduce the DAGHAR benchmark, a curated collection of datasets for domain adaptation and generalization studies in smartphone-based HAR. We standardized six datasets in terms of accelerometer units, sampling rate, gravity component, activity labels, user partitioning, and time window size, removing trivial biases while preserving intrinsic differences. This enables controlled evaluation of model generalization capabilities. 
Additionally, we provide baseline performance metrics from state-of-the-art machine learning models, crucial for comprehensive evaluations of generalization in HAR tasks.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1192"},"PeriodicalIF":5.8,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531562/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142564888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-02 DOI: 10.1038/s41597-024-04023-3
Shekhar Sharan Goyal, Rohini Kumar, Udit Bhatia
Nitrogen (N) is essential for agricultural productivity, yet its surplus poses significant environmental risks. Currently, over half of applied nitrogen is lost, wasting resources and contributing to increased greenhouse gas emissions and biodiversity loss. Excess nitrogen persists in the environment, contaminating soil and water bodies for decades. Detailed historical quantification of N surplus in India remains limited, despite national- and global-scale assessments. Our study develops a district-level dataset of annual agricultural N surplus from 1966 to 2017, integrating 12 different estimates to address uncertainties arising from multiple data sources and methodological choices across major elements of the N surplus. This dataset supports flexible spatial aggregation, aiding policymakers in implementing effective nitrogen management strategies in India. In addition, we verified our estimates by comparing them with previous studies. This work underscores the importance of setting realistic nitrogen management targets that account for inherent uncertainties, paving the way for sustainable agricultural practices in India, reducing environmental impacts, and boosting productivity.
{"title":"Assessing temporal dynamics of nitrogen surplus in Indian agriculture: district scale data from 1966 to 2017.","authors":"Shekhar Sharan Goyal, Rohini Kumar, Udit Bhatia","doi":"10.1038/s41597-024-04023-3","DOIUrl":"10.1038/s41597-024-04023-3","url":null,"abstract":"<p><p>Nitrogen (N) is essential for agricultural productivity, yet its surplus poses significant environmental risks. Currently, over half of applied nitrogen is lost, resulting in resource wastage, contributing to increased greenhouse gas emissions and biodiversity loss. Excess nitrogen persists in the environment, contaminating soil and water bodies for decades. Quantifying detailed historical N-surplus estimation in India remains limited, despite national and global-scaled assessments. Our study develops a district-level dataset of annual agricultural N-surplus from 1966-2017, integrating 12 different estimates to address uncertainties arising from multiple data sources and methodological choices across major elements of the N surplus. This dataset supports flexible spatial aggregation, aiding policymakers in implementing effective nitrogen management strategies in India. In addition, we verified our estimates by comparing them with previous studies. 
This work underscores the importance of setting realistic nitrogen management targets that account for inherent uncertainties, paving the way for sustainable agricultural practices in India, reducing environmental impacts, and boosting productivity.</p>","PeriodicalId":21597,"journal":{"name":"Scientific Data","volume":"11 1","pages":"1191"},"PeriodicalIF":5.8,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11531528/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142563863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}