Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04494-y
Yue Li, Qunshan Zhao, Mingshu Wang
Traffic flow data has been used in various disciplines, including geography, transportation, urban planning, and public health. However, existing datasets often have limitations such as low spatiotemporal resolution and inconsistent quality, owing to data collection methods and the need for an adequate data cleaning process. This paper introduces a long-term traffic flow dataset at an intra-city scale with high spatiotemporal granularity. The dataset covers the Glasgow City Council area for four consecutive years spanning the COVID-19 pandemic, from October 2019 to September 2023, providing comprehensive temporal and spatial coverage. Such detailed information facilitates diverse applications, including traffic dynamics analysis, traffic management, infrastructure planning, and urban environment improvement. It also provides a valuable resource for understanding how traffic flow changed during a once-in-a-lifetime pandemic event.
Title: High-resolution traffic flow data from the urban traffic control system in Glasgow. Scientific Data 12(1):253 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821839/pdf/
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04581-0
José Antonio López-Pastor, Alejandro Gil-Martinez, Antonio Hernández-Mateos, Astrid Algaba-Brazález, José Luis Gómez Tornero
Location systems based on Bluetooth Low Energy (BLE) fingerprinting using RSSI (Received Signal Strength Indicator) have been widely used to implement indoor real-time location systems (IRTLS). Numerous databases hold BLE RSSI information collected in multiple scenarios, with measurements at various time intervals. However, all these databases collect the RSSI of the three advertising channels of the BLE protocol without considering the channel over which each packet is transmitted, an approach known as Unified Channel Fingerprinting (UCFP). This paper describes and makes available to the scientific community, for the first time, a dataset using Separate Channel Fingerprinting (SCFP) and Frequency Scanned Leaky Wave Antennas (FSLWA). The dataset is composed of calibration and test data collected by two different sub-systems: one using four monopole antennas and the other using two FSLWAs. Both systems employ four BLE dongles and cover an indoor area of 35 m². The data were collected sequentially over 94 days, with obstacles included in the environment, to test the robustness of SCFP with FSLWA against traditional UCFP.
Title: Bluetooth Low Energy Dataset Using Separate-Channel Fingerprinting Techniques and Frequency Scanned Antennas. Scientific Data 12(1):255 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821836/pdf/
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04527-6
Qinqin Kong, Matthew Huber
Increasing heat stress with climate change will threaten human health and cause broad social and economic impacts. The evaluation of such impacts depends on a reliable dataset of heat stress projections. Here we present a global dataset of future projections of dry-bulb, wet-bulb and wet-bulb globe temperature at global warming levels of 1-4°C relative to the preindustrial era, using output from 16 CMIP6 global climate models (GCMs). The dataset was bias-corrected against ERA5 reanalysis by adding the GCM-simulated climate change signal onto the ERA5 baseline (1950-1976) at 3-hourly frequency. The resulting datasets are provided at fine spatial (0.25° × 0.25°) and temporal (3-hourly) resolution. We validate the bias-correction approach and demonstrate that it substantially improves the GCMs' ability to reproduce both the annual average and the entire range of quantiles for all metrics within an ERA5 reference climate state. We expect the dataset to benefit future work on estimating projected changes in both mean and extreme heat stress and on assessing the consequent health and socio-economic impacts.
Title: A global high-resolution and bias-corrected dataset of CMIP6 projected heat stress metrics. Scientific Data 12(1):246 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821900/pdf/
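The bias-correction step named in the abstract above (adding the GCM-simulated climate change signal onto an ERA5 baseline) can be illustrated with a minimal sketch. The function name `delta_bias_correct`, the per-time-step differencing, and the toy temperatures are assumptions made for illustration only; the paper's actual procedure may differ in detail.

```python
def delta_bias_correct(era5_baseline, gcm_baseline, gcm_future):
    """Add the GCM change signal (future minus baseline) to ERA5 values.

    All three inputs are equal-length sequences of temperatures in °C
    for matching time steps (hypothetical toy inputs, not real data).
    """
    if not (len(era5_baseline) == len(gcm_baseline) == len(gcm_future)):
        raise ValueError("inputs must have the same length")
    return [
        obs + (future - hist)
        for obs, hist, future in zip(era5_baseline, gcm_baseline, gcm_future)
    ]


# Toy example: the model warms by 2 °C and 3 °C at the two time steps,
# and that change is added on top of the reanalysis baseline.
corrected = delta_bias_correct(
    era5_baseline=[20.0, 25.0],
    gcm_baseline=[18.0, 24.0],
    gcm_future=[20.0, 27.0],
)
print(corrected)
```

Because only the model's change signal is used, a constant offset between the GCM and ERA5 in the baseline period does not carry over into the corrected series.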
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04583-y
Zhiyin Zhou, Yu Ma, Jie Zhang, Muhammad Firdaus, Michael Y Roleda, Delin Duan
Kappaphycus striatus is a carrageenan-producing red alga found primarily in tropical and subtropical coastal regions, mainly in the Philippines, Indonesia, and Malaysia, among other locations. Here, using PacBio HiFi and Hi-C sequencing data, we produced a high-quality chromosome-level genome assembly with a total size of 211.46 Mb, a contig N50 of 5.04 Mb, and a scaffold N50 of 5.39 Mb. After Hi-C assembly and manual adjustment of the heatmap, 199.42 Mb of genomic sequence was anchored to 33 presumed chromosomes, accounting for 94.31% of the entire genome. A total of 14,596 protein-coding genes and 1,673 non-coding RNAs were identified, and 100.96 Mb of repetitive sequences account for 47.73% of the assembled genome. Our chromosome-level genome assembly provides a valuable reference for future nursery cultivation and breeding of K. striatus, and will be useful for functional genomics and evolutionary studies of eukaryotes.
Title: Chromosome-level assembly and gene annotation of Kappaphycus striatus genome. Scientific Data 12(1):249 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822179/pdf/
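For readers unfamiliar with the N50 statistic quoted in the abstract above, the following sketch shows the standard definition applied to a list of sequence lengths. The function name `n50` and the toy lengths are hypothetical; only the 199.42 Mb and 211.46 Mb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), not taken from the real assembly.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))

# Share of the assembly anchored to chromosomes, from the abstract's figures.
anchored_pct = round(199.42 / 211.46 * 100, 2)
print(anchored_pct)
```

The second calculation reproduces the anchored fraction reported in the abstract from the two sizes it gives.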
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04529-4
Tillmann Ohm, Andres Karjus, Mikhail V Tamm, Maximilian Schich
The notion of visual similarity is essential for computer vision, and for applications and studies revolving around vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALAD), and an example, fruit-SALAD, with 10,000 images of fruit depictions. This combined semantic category and style benchmark comprises 100 instances each of 10 easy-to-recognize fruit categories, across 10 easily distinguishable styles. Leveraging a systematic pipeline of generative image synthesis, this visually diverse yet balanced benchmark demonstrates salient differences in semantic category and style similarity weights across various computational models, including machine learning models, feature extraction algorithms, and complexity measures, as well as conceptual models for reference. This meticulously designed dataset offers a controlled and balanced platform for the comparative analysis of similarity perception. The SALAD framework allows the comparison of how these models perform semantic category and style recognition tasks to go beyond the level of anecdotal knowledge, making it robustly quantifiable and qualitatively interpretable.
Title: fruit-SALAD: A Style Aligned Artwork Dataset to reveal similarity perception in image embeddings. Scientific Data 12(1):254 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821872/pdf/
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04575-y
Zhihui Wang, Xiaogang Shi, Shentang Dou, Miaomiao Cheng, Lulu Miao
Continuous time series of land cover are critical for attributing runoff, sediment and carbon changes on the Chinese Loess Plateau (CLP). However, current land cover products with annual temporal resolution lack spatial identification accuracy, particularly in capturing authentic changes of cropland, forest and grassland. To address these issues, a 30 m annual land cover dataset (YRCC_LPLC) was produced by the Yellow River Conservancy Commission for the CLP from 1990 to 2022. Different levels of land cover were classified using different combinations of spectral, monthly and annual temporal, and topographic features with a Random Forest classifier. YRCC_LPLC achieves an overall accuracy of 85.16%, higher than that of other land cover products (45.64%-73.38%). YRCC_LPLC captures not only the explicit spatial variation but also the direction and timing of land cover change, especially the most critical conversion of cropland into forest and grassland induced by the implementation of the Grain to Green Program on the CLP.
Title: The 30 m land cover dataset for capturing land cover changes induced by ecological restoration from 1990 to 2022 on the Chinese Loess Plateau. Scientific Data 12(1):252 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11822044/pdf/
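The overall accuracy figure quoted in the abstract above is conventionally computed from a confusion matrix as the share of correctly classified validation samples. The sketch below shows that calculation under stated assumptions: the function name `overall_accuracy` and the three-class toy matrix are hypothetical and are not taken from the paper's validation data.

```python
def overall_accuracy(confusion):
    """Overall accuracy = correctly classified samples / all samples.

    `confusion[i][j]` counts samples of reference class i that were
    mapped as class j, so correct samples lie on the diagonal.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no samples")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Toy matrix for three classes (e.g. cropland, forest, grassland).
toy_confusion = [
    [50, 5, 5],
    [4, 30, 6],
    [3, 7, 40],
]
print(overall_accuracy(toy_confusion))
```

Overall accuracy summarises all classes in a single number, so it can hide weak performance on rare classes; per-class measures are usually reported alongside it.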
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04506-x
Tânia Carvalho, Luís Antunes, Cristina Costa Santos, Nuno Moniz
The Covid-19 pandemic has affected the world at multiple levels. Data sharing was pivotal for advancing research to understand the underlying causes and implement effective containment strategies. In response, many countries have facilitated access to daily cases to support research initiatives, fostering collaboration between organisations and making such data available to the public through open data platforms. Despite the several advantages of data sharing, one of the major concerns before releasing health data is its impact on individuals' privacy. Such a sharing process should adhere to state-of-the-art methods in Data Protection by Design and by Default. In this paper, we use a Covid-19 data set from Portugal's second-largest hospital to show how it is feasible to ensure data privacy while improving the quality and maintaining the utility of the data. Our goal is to demonstrate how knowledge exchange in multidisciplinary teams of healthcare practitioners, data privacy, and data science experts is crucial to co-developing strategies that ensure high utility in de-identified data.
Title: Empowering open data sharing for social good: a privacy-aware approach. Scientific Data 12(1):248 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11821814/pdf/
Pub Date: 2025-02-12; DOI: 10.1038/s41597-025-04488-w
Joe Bracegirdle, John A Elix, Udayangani Mawalagedera, Yit-Heng Chooi, Cécile Gueidan
The history of lichen compound identification has long relied on techniques such as spot tests and TLC, which have been surpassed in sensitivity and accuracy by modern metabolomic techniques such as high-resolution MS/MS. In 2019, Olivier-Jimenez et al. released the Lichen DataBase (LDB), a library containing the Q-TOF MS/MS spectra of 251 metabolites on the MetaboLights and GNPS platforms, that has been widely used for the identification of lichen-derived unknowns. To increase the compound coverage, we have generated the Orbitrap MS/MS spectra of a further 534 lichen-derived compounds from the metabolite library of Jack Elix, housed at the CANB herbarium (Canberra, Australia). This included 399 unique metabolites that are not in the LDB, bringing the total number combined to 650. Technical validation was achieved by investigating the compounds in three Australian lichen extracts using the Library Search and Molecular Networking tools on the GNPS platform. This update provides a much larger database for lichen compound identification, which we envisage will allow refining the lichen chemotaxonomy framework and contribute to compound discovery.
Title: An expanded database of high-resolution MS/MS spectra for lichen-derived natural products. Scientific Data 12(1):244 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11814408/pdf/
Pub Date: 2025-02-11; DOI: 10.1038/s41597-025-04562-3
Noelia Vallez, Gloria Bueno, Oscar Deniz, Miguel Angel Rienda, Carlos Pastor
This dataset comprises 38 breast ultrasound scans from patients, encompassing a total of 683 images. The scans were conducted using a Siemens ACUSON S2000™ Ultrasound System from 2022 to 2023. The dataset was created specifically for segmenting breast lesions, with the goal of identifying the area and contour of each lesion and classifying it as either benign or malignant. The images fall into three categories based on their findings: 419 are normal, 174 are benign, and 90 are malignant. The ground truth is given as RGB segmentation masks in individual files, with black indicating normal breast tissue and green and red indicating benign and malignant lesions, respectively. This dataset enables researchers to construct and evaluate machine learning models for distinguishing between benign and malignant tumours in authentic breast ultrasound images. The segmentation annotations provided by expert radiologists enable accurate model training and evaluation, making this dataset a valuable asset in the fields of computer vision and public health.
Title: BUS-UCLM: Breast ultrasound lesion segmentation dataset. Scientific Data 12(1):242 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11814256/pdf/
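The mask colour convention described in the abstract above (black for normal tissue, green for benign, red for malignant) can be turned into per-class pixel counts as sketched below. The exact RGB triples, the `MASK_CLASSES` mapping, and the function name `count_mask_classes` are assumptions for illustration; the real mask files may use other shades and should be inspected before relying on exact matches.

```python
from collections import Counter

# Assumed pure-colour encoding (hypothetical; verify against the real masks).
MASK_CLASSES = {
    (0, 0, 0): "normal",
    (0, 255, 0): "benign",
    (255, 0, 0): "malignant",
}


def count_mask_classes(pixels):
    """Count pixels per class in an iterable of (R, G, B) triples.

    Colours outside the assumed encoding are reported as "unknown"
    rather than being silently assigned to a class.
    """
    counts = Counter()
    for rgb in pixels:
        counts[MASK_CLASSES.get(tuple(rgb), "unknown")] += 1
    return dict(counts)


# Toy mask flattened to a list of pixels, not taken from the dataset.
toy_mask = [(0, 0, 0), (0, 0, 0), (0, 255, 0),
            (255, 0, 0), (255, 0, 0), (12, 34, 56)]
print(count_mask_classes(toy_mask))

# The three image categories in the abstract sum to the stated total.
print(419 + 174 + 90)
```

Pixels would in practice be read from the mask files with an image library; plain tuples are used here so that the sketch has no external dependencies.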
Pub Date: 2025-02-11; DOI: 10.1038/s41597-025-04540-9
Allert I Bijleveld, Paula de la Barra, Hailley Danielson-Owczynsky, Livia Brunner, Anne Dekinga, Sander Holthuijsen, Job Ten Horn, Anne de Jong, Loran Kleine Schaars, Adrienne Kooij, Anita Koolhaas, Hidde Kressin, Felianne van Leersum, Simone Miguel, Luc G G de Monte, Dennis Mosk, Amin Niamir, Dorien Oude Luttikhuis, Myron A Peck, Theunis Piersma, Reyhaneh Roohi, Léon Serre-Fredj, Marten Tacoma, Evaline van Weerlee, Bas de Wit, Roeland A Bom
The Wadden Sea is the world's largest intertidal area and a UNESCO World Heritage Site. Macrozoobenthic invertebrates perform key ecological functions within intertidal areas by regulating nutrient cycles, decomposing organic matter, and providing food for fish, birds and humans. To understand ecological processes and human impacts on biodiversity, the Synoptic Intertidal BEnthic Survey (SIBES) has sampled intertidal macrozoobenthos since 2008. On average 4,109 stations across 1,200 km² of Dutch Wadden Sea mudflats are sampled from June to October to quantify the benthic invertebrate community and sediment composition, including species abundance and biomass, and grain size and mud content. The dataset published now contains 51,851 sampled stations with 3,034,760 individuals of 177 species. This paper details data collection, validation and processing methods. SIBES is ongoing and data will be updated yearly. In sharing these data, we hope to enhance collaborations and understanding of the impact of various pressures on macrozoobenthic invertebrates, sediment composition, food webs, the ecosystem, and biodiversity in the Wadden Sea and other intertidal habitats.
Title: SIBES: Long-term and large-scale monitoring of intertidal macrozoobenthos and sediment in the Dutch Wadden Sea. Scientific Data 12(1):239 (2025). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11814293/pdf/