Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04507-w
Anne Reichmuth, Oldrich Rakovec, Friedrich Boeing, Sebastian Müller, Luis Samaniego, Andreas Marx, Hanna Komischke, Andreas Schmidt, Daniel Doktor
Ongoing ecological research is concerned with analysing climate-induced changes in species distribution. Projecting such changes requires high-quality bioclimatic variables for both historical and future climatic periods. Many global bioclimatic datasets exist, but a consistent dataset with identical model variables for historic and projected periods is rare. We present 26 bioclimatic variables calculated from a large ensemble of 70 bias-adjusted GCM-RCM simulations for 1971-2098. Both the historic and the projection periods were calculated using the same models to ensure consistency between the periods. The variables are validated against E-OBS observations, from which we calculated the same bioclimatic variables. For the projection periods we chose 20-year ranges between 2021 and 2098. We offer two versions: (1) variables separated into RCP 2.6, 4.5 and 8.5, including percentiles among the realisations and within the RCPs; and (2) variables for each realisation separately. We then extracted the temporal 5th, 50th and 95th percentiles per period as representative values.
Title: BioVars - A bioclimatic dataset for Europe based on a large regional climate ensemble for periods in 1971-2098. Scientific Data 12(1):217. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799319/pdf/
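The percentile extraction mentioned at the end of the BioVars abstract can be illustrated with a small sketch. The function names, the sample values and the linear-interpolation rule are assumptions made for illustration; the abstract does not say which percentile estimator the authors used.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty list, using linear interpolation.

    The interpolation rule is an assumption; other estimators exist.
    """
    ordered = sorted(values)
    if len(ordered) == 1:
        return float(ordered[0])
    pos = (len(ordered) - 1) * q / 100.0   # fractional rank of the percentile
    lo = int(pos)                          # index just below the rank
    hi = min(lo + 1, len(ordered) - 1)     # index just above the rank
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def summarise_period(yearly_values):
    """Return the 5th, 50th and 95th percentiles of one period's values."""
    return {q: percentile(yearly_values, q) for q in (5, 50, 95)}


# Hypothetical values of one variable within a single period.
example_values = list(range(21))
summary = summarise_period(example_values)
print(summary)
```

In this sketch each period is reduced to three numbers, which matches the abstract's description of the percentiles as representative values for a period.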
Recent studies have demonstrated that integrating AI into colonoscopy procedures significantly improves the adenoma detection rate (ADR) and reduces the adenoma miss rate (AMR). However, few studies address the critical issue of endoscopist-AI collaboration in real-world settings. Eye-tracking data collection is considered a promising approach to uncovering how endoscopists and AI interact and influence each other during colonoscopy procedures. A common limitation of existing studies is their reliance on retrospective video clips, which fail to capture the dynamic demands of real-time colonoscopy, where endoscopists must simultaneously navigate the colonoscope and identify lesions on the screen. To address this gap, we established a dataset to analyze changes in endoscopists' eye movements during the colonoscopy withdrawal phase. Eye-tracking data was collected from graduate students, nurses, senior endoscopists, and novice endoscopists while they reviewed retrospectively recorded colonoscopy withdrawal videos, both with and without computer-aided detection (CADe) assistance. Furthermore, 80 real-time video segments were prospectively collected during endoscopists' actual colonoscopy withdrawal procedures, comprising 43 segments with CADe assistance and 37 segments without assistance (normal control).
Title: Eye-tracking dataset of endoscopist-AI teaming during colonoscopy: Retrospective and real-time acquisition. Authors: Yan Zhu, Rui-Jie Yang, Pei-Yao Fu, Zhen Zhang, Yi-Zhe Zhang, Quan-Lin Li, Shuo Wang, Ping-Hong Zhou. Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04535-6. Scientific Data 12(1):212. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799166/pdf/
Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04561-4
Ye Jin Ha, Seong-Hwan Park, Seon-Kyu Kim, Ka Hee Tak, Jeong-Hwan Kim, Chan Wook Kim, Yong Sik Yoon, Seon-Young Kim, Jong Lyul Lee
Pseudomyxoma peritonei (PMP), a rare condition characterized by mucinous ascites in the peritoneal cavity, often leads to a poor prognosis. However, omics profiling of this disease remains significantly underexplored. Here, we present single-cell transcriptomic profiling of five PMP cases to identify cell type-specific gene features associated with PMP pathogenesis. Additionally, we provide bulk RNA-seq datasets from two independent cohorts: 19 fresh frozen tissue samples (12 PMPs) and 34 formalin-fixed paraffin-embedded (FFPE) samples (25 PMPs). We also offer protein expression data from a tissue microarray (TMA) analysis of 90 samples (45 PMPs). Our single-cell and bulk transcriptomic profiles, along with TMA verifications, reveal the cellular diversity of PMP, highlighting the coexistence of epithelial and mesenchymal characteristics within PMP cells. These datasets enhance our understanding of PMP pathogenesis and provide a valuable resource for uncovering the intricate molecular landscape of PMP, with the potential to improve clinical utility through further research.
Title: Molecular characterization of Pseudomyxoma peritonei with single-cell and bulk RNA sequencing. Scientific Data 12(1):213. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799422/pdf/
Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04551-6
Maria Teresa Brunetti, Stefano Luigi Gariano, Massimo Melillo, Mauro Rossi, Silvia Peruccacci
With the increasing use of data-driven landslide prediction models, including those based on artificial intelligence, accurate information on the occurrence of landslides and a rigorous reconstruction of their triggering rainfall conditions are crucial. To this end, an enhanced rainfall-induced landslide catalogue, e-ITALICA, is presented here. e-ITALICA contains spatial and temporal information on 6312 rainfall-induced landslides that occurred in Italy between 1996 and 2021 (already listed in the previous ITALICA catalogue published in 2023), with the addition of their rainfall triggering conditions in terms of rainfall duration D (h) and cumulative event rainfall E (mm). The triggering conditions are calculated from hourly rainfall measurements at 4033 rain gauges using a rigorous and reproducible method. Topographic and land cover information is also provided. e-ITALICA can be used to analyse rainfall conditions capable of triggering landslides, to calibrate and validate physically based landslide prediction models, and to define empirical rainfall thresholds from local to national scales in Italy, thus contributing to landslide risk reduction.
Title: An enhanced rainfall-induced landslide catalogue in Italy. Scientific Data 12(1):216. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799417/pdf/
Malus niedzwetzkyana (MN) is the mother of Rosybloom hybrids, ornamental crabapples with red and purple flowers. We present a high-quality chromosome-scale genome for MN with a size of 672.64 Mb, anchored to 17 chromosomes and with a high BUSCO completeness score of 98.6%, reaching the 'gold standard' level. Moreover, our assembly has captured 28 telomeres. A total of 43,813 protein-coding genes were annotated in the MN genome. This high-quality assembly provides a valuable opportunity to enhance our understanding of the genetic basis of flower colour and other ornamental traits in crabapples, thereby advancing the field of genetics and breeding.
Title: Chromosome-level genome assembly of Malus niedzwetzkyana, the mother of Rosybloom crabapple. Authors: Ruizhen Wang, Jian Quan, Boyang Liu, Yu Wei, Hengxing Liu, Ran He, Ling Guo, Leiming Dong. Pub Date : 2025-02-05 DOI: 10.1038/s41597-024-04221-z. Scientific Data 12(1):211. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799236/pdf/
Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04508-9
Elma Dervić, Katharina Ledebur, Stefan Thurner, Peter Klimek
Comorbidity networks have become a valuable tool to support data-driven biomedical research. Yet studies are often severely hindered by the limited availability of the necessary comprehensive data, largely because of the sensitivity of health care information. This study presents a population-wide comorbidity network dataset derived from 45 million hospital stays of 8.9 million patients over 17 years in Austria. We present co-occurrence networks of hospital diagnoses, stratified by age, sex, and observation period into a total of 96 different subgroups. For each of these groups we report a range of association measures (e.g., count data and odds ratios) for all pairs of diagnoses. The dataset enables researchers to create their own tailor-made comorbidity networks from real patient data, which can be used as a starting point for quantitative and machine learning methods. This data platform is intended to lead to deeper insights into a wide range of epidemiological, public health, and biomedical research questions.
Title: Comorbidity Networks From Population-Wide Health Data: Aggregated Data of 8.9M Hospital Patients (1997-2014). Scientific Data 12(1):215. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799221/pdf/
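The comorbidity abstract names odds ratios as one of the association measures reported for pairs of diagnoses. The sketch below shows the textbook odds ratio computed from a 2x2 contingency table of patient counts. The function name, argument names and example counts are hypothetical, and the dataset may apply corrections or stratification that this sketch does not reproduce.

```python
def odds_ratio(n_both, n_a, n_b, n_total):
    """Odds ratio for the co-occurrence of diagnoses A and B.

    n_both  : patients with both diagnoses
    n_a     : patients with diagnosis A (including those who also have B)
    n_b     : patients with diagnosis B (including those who also have A)
    n_total : all patients in the subgroup
    Returns None when the ratio is undefined (a zero in the denominator).
    """
    a = n_both                        # A and B
    b = n_a - n_both                  # A only
    c = n_b - n_both                  # B only
    d = n_total - n_a - n_b + n_both  # neither A nor B
    if b == 0 or c == 0:
        return None
    return (a * d) / (b * c)


# Hypothetical counts for one subgroup (not taken from the dataset).
example_or = odds_ratio(n_both=20, n_a=100, n_b=50, n_total=1000)
print(example_or)
```

An odds ratio above 1 indicates that the two diagnoses occur together more often than expected if they were independent, which is the kind of signal typically used to decide whether to draw an edge in a comorbidity network.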
Pub Date : 2025-02-05 DOI: 10.1038/s41597-025-04524-9
Bing Song, Fernando D K Tria, Josip Skejo
Cellulose is a carbon source widespread in nature. However, its highly complex structure makes it difficult for any organism to obtain carbon from it. Only a few taxonomic groups are known to decompose cellulose. They do so by producing cellulases, the various enzymes that break beta-glycosidic bonds in cellulose. Cellulases were identified in 1,735 metagenomes from 225 bioprojects. The set of 12,837 metagenome-derived cellulases encompasses three catalytic functions: exoglucanases (CBH, 1,042), endoglucanases (EG, 5,685), and beta-glucosidases (βG, 6,110). All three enzymatic functions are thought to be necessary to drive the cascade of reactions that makes cellulose available as glucose. These metagenome-derived cellulases were clustered into protein families for each EC category individually, resulting in a total of 136 clusters, with the majority observed for EG (97 clusters), followed by βG (19 clusters) and CBH (19 clusters). These clusters provide a useful cellulase dataset for future research on cellulase utilization.
Title: Prokaryotic cellulase gene clusters derived from 2,305 metagenomes. Scientific Data 12(1):218. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11799192/pdf/
Pub Date : 2025-02-04 DOI: 10.1038/s41597-025-04486-y
Byeong-Hee Kim, Young-Oh Kim, Jonghun Kam
Long-term streamflow data at hyper-resolution (less than 1 km) are essential for hydroclimatic extreme and ecological assessment, but such data are not available for river basins that have experienced rapid socioeconomic growth. Here, we use the Variable Infiltration Capacity-River Routing Model (VIC-RRM) to reconstruct naturalized daily streamflow at 90-meter resolution for the Geum River, one of South Korea's major rivers, over 1951-2020. VIC-RRM demonstrates high temporal consistency, with a correlation coefficient exceeding 0.6 for observed streamflow seasonality at over 60% of the 90 gauge stations along the Geum River. However, 36% of the stations show low modified Kling-Gupta Efficiency (0.2-0.4), primarily due to uncertainties in runoff data and human disturbances such as irrigation and reservoir storage. Our simulated naturalized data reveal decadal variability in the 1990s and an increase in day-to-day variability of the Geum River in the 2010s compared with the 1970s. This dataset provides physically consistent naturalized streamflow data that can serve as a reference for evaluating climate change-driven changes in streamflow for the Geum River.
Title: Hyper-resolution naturalized streamflow data for Geum River in South Korea (1951-2020). Scientific Data 12(1):210. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794663/pdf/
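The streamflow abstract evaluates the model with a modified Kling-Gupta Efficiency but does not state the formula. The sketch below uses the commonly cited modified form, KGE' = 1 - sqrt((r - 1)^2 + (beta - 1)^2 + (gamma - 1)^2), where r is the linear correlation, beta the ratio of means and gamma the ratio of coefficients of variation. That the authors used exactly this variant is an assumption, and the function name and sample series are hypothetical.

```python
import math


def modified_kge(sim, obs):
    """Modified Kling-Gupta Efficiency of simulated vs. observed series.

    Assumes equal-length series with non-zero means and non-zero variance.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    # Population standard deviations of both series.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    # Pearson correlation between simulation and observation.
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)
    beta = mean_s / mean_o                        # bias ratio
    gamma = (sd_s / mean_s) / (sd_o / mean_o)     # variability (CV) ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Hypothetical streamflow values, not taken from the dataset.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect = modified_kge(observed, observed)
doubled = modified_kge([2.0 * x for x in observed], observed)
print(perfect, doubled)
```

A perfect simulation scores 1. In the second call the simulated flow is exactly twice the observed flow, so the correlation and the variability ratio are ideal and only the bias term lowers the score.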
Pub Date : 2025-02-04 DOI: 10.1038/s41597-025-04539-2
Steven R Evett, Gary W Marek, Paul D Colaizzi, Karen S Copeland, Brice B Ruthardt, Terry A Howell
A collection of datasets describing six years of experiments on maize (Zea mays, L.) (corn) is presented (1989, 1990, 1994, 2013, 2016, and 2018). Four weighing lysimeters were used to determine crop evapotranspiration (ET). In-soil and above-ground microclimate and ET data are presented at a 15-minute interval, as are weather data for all days of the year. Data analysis determined ET, precipitation, irrigation, and dew and frost accumulation on a 15-minute basis from lysimeter mass data. Soil water content data from calibrated neutron probe readings are presented on a periodic basis. Crop planting, harvest, fertilization, pest control, and other agronomic information are presented in agronomic calendars by day of year. Crop growth data are presented on a periodic basis throughout the growing season, as are final crop biomass and yield data. The data are suitable for analysis of effects of irrigation and other agronomic decisions on crop yield and water productivity in the Southern High Plains region of the USA, for model calibration and testing, and for model improvement.
Title: The Bushland, Texas, maize evapotranspiration, growth, and yield dataset Collection. Scientific Data 12(1):209. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794518/pdf/
Pub Date : 2025-02-04 DOI: 10.1038/s41597-025-04519-6
Shutong Du, Zhitong Liu, Bingyu Pan
To promote the development of scientific fitness research and practice, we propose the Chinese Knowledge Graph Dataset in the Field of Scientific Fitness (FitKG-CN). This knowledge graph contains over 10,000 fitness-related terms, categorized into eight main groups: body parts, items of exercise, fitness movement, equipment and tools, exercise goals, anatomical structures, nutrients, and technical terms. FitKG-CN is built from authoritative data sources that undergo rigorous preprocessing, including noise removal, format standardization, and normalization of entities and relationships. The data are manually annotated on a professional platform and ultimately stored in a Neo4j graph database for visualization. Additionally, we trained a Chinese SpERT model on the manually annotated data to enhance the automation of data processing. The experimental results show that the model achieved an F1 score of 94.05% in entity recognition tasks and 82.00% in relation extraction tasks, validating the effectiveness of the model and improving the scalability of the dataset.
Title: A Chinese Knowledge Graph Dataset in the Field of Scientific Fitness. Scientific Data 12(1):205. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11794866/pdf/
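The FitKG-CN abstract reports model quality as F1 scores. For readers unfamiliar with the metric, the sketch below computes precision, recall and F1 from counts of true positives, false positives and false negatives. The function name and the counts are hypothetical, and the abstract does not state whether the reported scores are micro- or macro-averaged.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from prediction counts.

    tp : predicted items that match the gold annotation
    fp : predicted items with no matching gold annotation
    fn : gold annotations the model failed to predict
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; this closed form
    # is equivalent and avoids dividing by precision + recall.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, not taken from the paper.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(p, r, f1)
```

Because F1 penalises both missed and spurious predictions, it is stricter than accuracy for extraction tasks in which most candidate spans are negatives.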