
Data in Brief: Latest Articles

ONUBAD: A comprehensive dataset for automated conversion of Bangla regional dialects into standard Bengali dialect.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-06 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2025.111276
Nusrat Sultana, Rumana Yasmin, Bijon Mallik, Mohammad Shorif Uddin

Despite significant research on the Bangla language in Natural Language Processing (NLP), there remains a notable resource deficit for its diverse regional dialects, such as those spoken in Chittagong, Sylhet, and Barisal. These dialects, often considered unintelligible to speakers of Standard Bengali, pose challenges due to their unique grammatical structures and phonetic variations. Some linguists categorize them as distinct languages. To address this, we present ONUBAD, a large and freely available dataset for the automatic translation of Chittagong, Sylhet, and Barisal dialects into Standard Bangla using a Neural Machine Translation (NMT) system. ONUBAD provides a parallel corpus of 1540 words, 130 clauses, and 980 sentences per regional dialect, together with their standard counterparts and English translations. The dataset includes metadata on phonetic variations and grammatical features, aiming to bridge the gap between standard and non-standard forms of Bangla. It serves as a valuable resource for researchers in NLP, dialect studies, and linguistic preservation, helping to develop more accurate and contextually relevant translation models. The dataset was collected between July and September 2024 from diverse sources such as books, websites, and local speakers, with the help of regional dialect specialists. It is hosted by the Department of Computer Science and Engineering, Jahangirnagar University, and is freely accessible at https://data.mendeley.com/datasets/6ft99kf89b/2.

Citations: 0
Internal transcribed spacer metagenomics data unravelling the core fungal community structure residing the wheat and maize rhizosphere.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-04 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2025.111269
Sadia Latif, Rizwana Kousar, Anum Fatima, Naeem Khan, Hina Fatimah

Plants are colonized by a vast array of microorganisms that outstrip plant cell densities and genes, and are thus referred to as the plant's second genome or extended genome. These microbial communities exert a significant influence on the vigor, growth, development and productivity of plants by supporting nutrient acquisition, organic matter decomposition and tolerance against biotic and abiotic stresses such as heat, high salt, drought and disease, and by regulating plant defense responses. The rhizosphere is a complex micro-ecological zone in the direct vicinity of plant roots and is considered a hotspot of microbial diversity. The exploration and understanding of the rhizosphere microbes can be valuable in sustainable agriculture. The present dataset aimed to reveal the core fungal community residing in the rhizosphere of wheat (Triticum aestivum L.) and maize (Zea mays L.). The rhizosphere fungal communities were explored via amplicon sequencing of the Internal Transcribed Spacer (ITS) region using the IonS5™XL sequencing platform. The data obtained were filtered and the high-quality reads were clustered into Microbial Operational Taxonomic Units (OTUs) at 97 % similarity. Further, the data were subjected to alpha and beta diversity analysis. The OTUs obtained from the wheat rhizosphere soils of Kallar Syedian (TA.KS), Islamabad (TA.ISB) and Mirpur Azad Kashmir (TA.MAK) were 603, 513 and 424, respectively, whereas 616 OTUs were found in the maize rhizosphere soil of Kallar Syedian (ZM.KS). The dominant fungal phylum inhabiting the rhizosphere soils was Ascomycota, accounting for 94 %, 97 %, 95 % and 90 % of the fungal community in ZM.KS, TA.KS, TA.MAK and TA.ISB, respectively. Alpha and beta diversity analysis revealed considerable variation in the relative abundance of fungal groups residing in the rhizosphere soils. The dataset can be employed in meta-analysis studies that will pave the way toward understanding the core fungal community structure and will directly aid in enhancing crop productivity through rhizosphere engineering.
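
As a rough illustration of the alpha diversity analysis mentioned in this abstract, the sketch below computes observed richness and the Shannon index from per-OTU read counts. The abstract does not say which indices the authors used, so the choice of Shannon is an assumption, and the counts are made-up placeholders rather than values from the dataset.

```python
import math


def observed_richness(otu_counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in otu_counts if c > 0)


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(otu_counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    h = 0.0
    for c in otu_counts:
        if c > 0:
            p = c / total  # relative abundance of this OTU
            h -= p * math.log(p)
    return h


# Hypothetical read counts for four OTUs in one sample (not real data).
example_counts = [50, 30, 20, 0]
richness = observed_richness(example_counts)
diversity = shannon_index(example_counts)
```

A sample dominated by a single phylum, as reported here for Ascomycota, can still show high OTU-level diversity, because the index is computed over OTUs rather than phyla.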

Citations: 0
Vertical two-phase flow regimes in an annulus image dataset - Texas A&M university.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-04 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111245
Kaushik Manikonda, Chinemerem Obi, Aarya Abhay Brahmane, Mohammad Azizur Rahman, Abu Rashid Hasan

The Vertical Two-Phase Flow Regimes in an Annulus Image Dataset, generated at Texas A&M University, presents an extensive collection of high-resolution images capturing various gas-liquid two-phase flow dynamics within a vertical flow setup. This dataset results from meticulous experimental work in the 140 ft Tower Lab, utilizing a combination of water and air flows to simulate real-world conditions and employing high-quality video recordings to document flow regime transitions. Designed to support research in fluid dynamics, machine vision, and computational modeling, the dataset offers valuable resources for developing machine vision models for accurate regime detection and differentiation, enhancing the fidelity of computational fluid dynamics simulations, and facilitating the estimation of critical flow parameters. Despite its comprehensive nature, the dataset has limitations, such as the absence of annular flow regime images and its exclusive focus on vertical flow conditions.

Citations: 0
Spatial distribution of fine-grained floodplain deposits and anthropogenic materials based on official borehole data in the floodplain of Leipzig, Germany.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-04 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2025.111275
Nele Graubner, Johannes Schmidt

This data set includes the spatial model of the thickness and distribution of fine-grained floodplain deposits in the Leipzig floodplain area. The data set originates from borehole records provided by the Saxon State Office for Environment, Agriculture, and Geology [1]. The data processing involved categorizing the stratigraphic descriptions of the borehole logs into 6 broad classes (sand, gravel, clay, anthropogenic sediments, fine-grained/organic sediments and others) with 33 sub-categories. Subsequently, the stratigraphic layers were analysed to determine the depth and thickness of the fine-grained floodplain deposits, as well as the distribution of anthropogenic material. The data set was filtered so that each retained borehole log has at least one clayey layer and a gravel layer of at least 0.7 m thickness, and was then interpolated to present a complete spatial model for the research area. The final data set is based on 3,414 data points (data collection covers the period 1852 to 2018) within the Leipzig floodplain and offers a significant resource for future interdisciplinary research into the natural and anthropogenic history of Leipzig's floodplains, providing valuable information for more detailed analyses and more precise modelling of fine-grained floodplain deposit distribution in the area.
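
The filtering condition described in this abstract can be sketched as a simple predicate over the layers of one borehole log. This is only an illustration: the `Layer` structure, its field names, and the category labels "clay" and "gravel" are hypothetical and not taken from the published data files.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: a gravel layer of at least 0.7 m.
MIN_GRAVEL_THICKNESS_M = 0.7


@dataclass
class Layer:
    category: str    # hypothetical label, e.g. "clay", "gravel", "sand"
    top_m: float     # depth of the layer top below surface, in metres
    bottom_m: float  # depth of the layer base below surface, in metres

    @property
    def thickness_m(self) -> float:
        return self.bottom_m - self.top_m


def keep_borehole(layers):
    """True if the log has a clay layer and a gravel layer >= 0.7 m thick."""
    has_clay = any(layer.category == "clay" for layer in layers)
    has_thick_gravel = any(
        layer.category == "gravel"
        and layer.thickness_m >= MIN_GRAVEL_THICKNESS_M
        for layer in layers
    )
    return has_clay and has_thick_gravel


# Two made-up logs: the first passes the filter, the second has thin gravel.
log_a = [Layer("clay", 0.0, 2.0), Layer("gravel", 2.0, 3.0)]
log_b = [Layer("clay", 0.0, 2.0), Layer("gravel", 2.0, 2.5)]
```

Only logs that pass such a filter would feed the interpolation step, which is why the final model rests on 3,414 points rather than on every available record.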

Citations: 0
Annotated image dataset with different stages of European pear rust for UAV-based automated symptom detection in orchards.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-03 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2025.111271
Virginia Maß, Pendar Alirezazadeh, Johannes Seidl-Schulz, Matthias Leipnitz, Eric Fritzsche, Rasheed Ali Adam Ibraheem, Martin Geyer, Michael Pflanz, Stefanie Reim

The evaluation of fruit genetic resources for resistance to pathogens is an essential basis for subsequent selection in fruit breeding. Both genetic analysis and phenotyping of defined traits are important tools and provide decision data in the evaluation process. However, the phenotyping of plants is often carried out 'by hand' and remains the bottleneck in fruit breeding and fruit growing. The development of a digital and UAV (unmanned aerial vehicle)-based phenotyping method for the assessment of genotype-specific susceptibility or resistance against diseases in orchards would significantly increase the efficiency of plant breeding. In this framework, a workflow for drone-based monitoring of pathogens in orchards was developed using European pear rust (Gymnosporangium sabinae) as the model pathogen. Pear rust is widespread in orchards and causes conspicuous, clearly visible, yellow to orange-colored disease symptoms. In this paper, we provide a dataset of expert-annotated high-resolution RGB images showing pear rust symptoms. For data collection, ten UAV flight campaigns were carried out between 2021 and 2023 under various weather conditions and with different flight parameters in the experimental orchard of the Julius Kühn-Institute for Breeding Research on Fruit Crops in Dresden-Pillnitz (Germany). In total, 1394 images were captured of different pear genotypes, including varieties, wild species and progeny from breeding. The dataset contains manually labelled images with a size of 768 × 768 pixels of leaves infected with pear rust at different stages of development, labelled as class GYMNSA, as well as background images without symptoms. Each leaf with pear rust symptoms was annotated with a bounding box drawn by two points using the Computer Vision Annotation Tool (CVAT, v1.1.0) [1] and is provided in YOLO 1.1 file format (.txt files). A total of 584 annotated images and 162 background images, organized into a training and validation set, are included in the GYMNSA dataset. This GYMNSA dataset can be used as a resource for researchers and developers working on drone-based plant disease monitoring systems.
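
For readers unfamiliar with YOLO-style label files, the sketch below converts one label line into pixel coordinates for a 768 × 768 image. It assumes the common YOLO text layout of `class x_center y_center width height` with values normalized to the image size; the example line and the mapping of class index 0 to GYMNSA are assumptions, not values read from the dataset.

```python
IMAGE_SIZE = 768  # image width and height in pixels, as stated in the abstract


def yolo_to_pixel_box(line, img_w=IMAGE_SIZE, img_h=IMAGE_SIZE):
    """Convert 'class xc yc w h' (normalized) to (class_id, pixel corners)."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return int(class_id), (x_min, y_min, x_max, y_max)


# Hypothetical label line: one box centred in the image, a quarter of its size.
example_line = "0 0.5 0.5 0.25 0.25"
class_id, box = yolo_to_pixel_box(example_line)
```

Background images without symptoms would typically have an empty label file or none at all, so a loader should treat a missing file as "no boxes" rather than as an error.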

Citations: 0
CoWIN twitter dataset: A comprehensive collection of public discourse on India's COVID-19 vaccination platform.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-02 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111252
Shubham Mittal, Swarnalakshmi Umamaheswaran

The CoWIN Twitter Dataset offers a wide-ranging collection of public opinions on India's COVID-19 vaccination platform CoWIN. The raw dataset has 635,000 tweets that mention "cowin," collected over the period of January to December 2021. The dataset was extracted using the Twitter Academic API. In addition to the raw data, it also includes a cleaned and processed set of 419,409 English tweets, and a labeled subset with sentiment analysis. The raw data file has tweet details such as ID, text, timestamp, user ID, and language. The processed dataset is free of URLs, hashtags and other noise, and also adds month and category groupings. Finally, the labelled dataset classifies the relevant tweets as positive or negative in sentiment. This dataset enables researchers to analyse themes and sentiments related to India's vaccination administration. It can help policymakers gain insights into issues related to large-scale health initiatives and digital health systems. The mix of languages in the data also makes it useful for language processing research.
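
The abstract says only that URLs, hashtags and other noise were removed from the processed tweets. The sketch below shows one plausible way to do that with regular expressions; the patterns and the function name are assumptions, and the authors' actual cleaning pipeline may differ.

```python
import re

# Assumed patterns: http(s) links and '#'-prefixed tags.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip URLs and hashtags, then collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


# Made-up tweet text, not taken from the dataset.
raw = "Booked my slot on cowin https://t.co/abc #vaccine today"
cleaned = clean_tweet(raw)
```

Dropping the whole hashtag, rather than just the `#` sign, discards some topical signal; keeping the tag text is a reasonable alternative when hashtags carry sentiment.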

Citations: 0
Dataset of polarimetric images of mechanically generated water surface waves coupled with surface elevation records by wave gauges linear array.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2025-01-02 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111267
Noam Ginio, Michael Lindenbaum, Barak Fishbain, Dan Liberzon

Effective spatio-temporal measurements of water surface elevation (water waves) in laboratory experiments are essential for scientific and engineering research. Existing techniques are often cumbersome and computationally heavy, and generally suffer from limited wavenumber/frequency response. To address these challenges, a novel method was developed, using a camera equipped with a polarization filter as the main sensor and Machine Learning (ML) algorithms for data processing [1,2]. The training and evaluation of the developed method were based on an in-house supervised dataset. Here we present this supervised dataset of polarimetric images of the water surface coupled with the water surface elevation measurements made by a linear array of resistance-type wave gauges (WGs). The water waves were mechanically generated in a laboratory wave basin, and the polarimetric images were captured under an artificial light source. Meticulous calibration of the camera and WGs, together with instrument synchronization, supported high spatio-temporal resolution. The data set covers several wavefield conditions, from simple monochromatic wave trains of various steepness to irregular wavefields with a JONSWAP-prescribed spectral shape and several wave breaking scenarios. The dataset contains measurements repeated in several camera positions relative to the wave field propagation direction.

Citations: 0
An ecological connectivity dataset for Black Sea obtained from sea currents.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-12-31 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111268
Nikolaos Nagkoulis, Christos Adam, Ioannis Mamoutos, Stelios Katsanevakis, Antonios D Mazaris

Incorporating ecological connectivity into spatial conservation planning is increasingly recognized as a key strategy to facilitate species movements, especially under changing environmental conditions. However, obtaining connectivity data is challenging, especially in the marine realm. Sea currents are essential for exploring marine structural connectivity, but transforming sea current data into spatial connectivity matrices involves complex and resource-intensive processing steps to ensure accuracy and usability. Here, a graph-based methodology has been developed to transform current data into formats suitable for delineating ecological corridors, and has been applied to the Black Sea. The dataset produced can be integrated into spatial conservation prioritization tools to incorporate connectivity in the analysis. This approach involved converting current centroids into points and projecting current directions and magnitudes onto a nearest-neighbour graph connecting these points. Using open-source data from the Copernicus Black Sea Physics Reanalysis dataset from 1993 to 2023, a high-resolution dataset of graph objects (edge lists) and shapefiles (points and edges) for the Black Sea has been created. Analyses were conducted in R, and the algorithm developed to produce the data is accessible on Zenodo. The resulting datasets are compatible with multiple software platforms (e.g., R, Python, and QGIS). A total of 17 datasets are provided from 1993 to 2023: twelve for monthly, four for seasonal, and one for yearly aggregation, supporting diverse spatial and temporal analysis needs. Overall, the datasets can be used to analyse connectivity patterns across the entire Black Sea or to focus on specific regions, which is particularly useful for ecological modelling and environmental protection.
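
To make the idea of projecting currents onto a nearest-neighbour graph more concrete, the sketch below builds a weighted edge list from point coordinates, current vectors, and a neighbour table. It is written in Python for illustration only (the authors worked in R), and the weighting rule, which takes the component of the source point's current along each edge and keeps only positive values, is an assumption about how such a projection might be done, not the published algorithm. All names and values are hypothetical.

```python
import math


def edge_weight(p_from, p_to, velocity):
    """Component of the current vector along the direction p_from -> p_to."""
    dx = p_to[0] - p_from[0]
    dy = p_to[1] - p_from[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("edge endpoints coincide")
    # Dot product of the velocity with the unit vector along the edge.
    return (velocity[0] * dx + velocity[1] * dy) / dist


def build_edge_list(points, velocities, neighbours):
    """Return (from, to, weight) tuples for edges with downstream flow."""
    edges = []
    for i, nbrs in neighbours.items():
        for j in nbrs:
            w = edge_weight(points[i], points[j], velocities[i])
            if w > 0:  # keep only edges the current actually flows along
                edges.append((i, j, w))
    return edges


# Toy example: three points, with an eastward current at point 0 only.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0)}
velocities = {0: (0.5, 0.0), 1: (0.0, 0.0), 2: (0.0, 0.0)}
neighbours = {0: [1, 2], 1: [0]}
edge_list = build_edge_list(points, velocities, neighbours)
```

An edge list of this shape can be loaded directly into graph libraries or exported as line features, which matches the graph objects and shapefiles the abstract describes.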

Citations: 0
CTU Hornet 65 Niner: A network dataset of geographically distributed low-interaction honeypots.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-12-30 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111261
Veronica Valeros, Sebastian Garcia

This data article introduces a new network dataset created to help understand how geographical location impacts the quality, type, and amount of incoming network attacks received by honeypots. The dataset consists of 12.4 million network flows collected from nine low-interaction honeypots in nine cities across the world for 65 days, from April 29th to July 1st, 2024. Each low-interaction honeypot was identically configured to capture incoming attacks using a state-of-the-art network flow collector, Zeek. Honeypots were distributed in nine cities: Amsterdam, Bangalore, Frankfurt, London, New York, San Francisco, Singapore, Toronto, and Sydney. The dataset is in JSON format and contains all types of Zeek network flow files, including protocol-specific logs.
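
As a rough illustration of how flows in this format might be summarised, the sketch below counts records per protocol and destination port. The abstract does not describe the dataset's file layout, so the sketch assumes newline-delimited JSON records, which is what Zeek's JSON log writer produces, and uses the standard Zeek connection-log field names `proto` and `id.resp_p`. The function name `count_flows_by_port` and the sample records (with documentation-range IP addresses) are hypothetical and are not taken from the dataset.

```python
import json
from collections import Counter

def count_flows_by_port(lines):
    """Count flows per (protocol, destination port) from JSON-lines records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Zeek's JSON output keeps dotted names such as "id.resp_p" as flat keys.
        counts[(record.get("proto"), record.get("id.resp_p"))] += 1
    return counts

# Hypothetical sample records, for illustration only.
sample = [
    '{"ts": 1714348800.0, "id.orig_h": "198.51.100.7", "id.resp_p": 22, "proto": "tcp"}',
    '{"ts": 1714348801.5, "id.orig_h": "203.0.113.9", "id.resp_p": 22, "proto": "tcp"}',
    '{"ts": 1714348802.1, "id.orig_h": "203.0.113.9", "id.resp_p": 53, "proto": "udp"}',
]
counts = count_flows_by_port(sample)
```

In practice the same function could be fed an open file handle for each honeypot's log, which would allow the per-city comparison the dataset is intended to support.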

Citations: 0
A dataset on environmental DNA, bacterio-, phyto- and zooplankton from an emerging periglacial lagoon in Svalbard, Arctic.
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-12-28 eCollection Date: 2025-02-01 DOI: 10.1016/j.dib.2024.111260
Sergej Olenin, Dzmitry Lukashanets, Anastasija Zaiko, Aurelija Samuilovienė, Irina Olenina, Evelina Grinienė, Tobia Politi, Aleksej Šaškov, Greta Kilmonaitė, Andrius Šiaulys

Over the last few decades, climate change in Svalbard (European Arctic) has led to the emergence and growth of periglacial coastal lagoons in the place of retreating glaciers. In these emerging water bodies, new ecosystems are formed, consisting of elements presumably entering the lagoon from the melting glacier, the surrounding tundra water bodies and the coastal ocean. The data presented here were collected in 2022-2023 from an emerging lagoon in the western region of Spitsbergen, Svalbard, situated between the retreating Eidembreen Glacier and Eidembukta Bay. The current size of the lagoon area is approximately 6 square kilometers. The sampling was carried out at 26 sites across various sections of the lagoon, spanning from close proximity to the glacier to the furthest point away. The dataset contains the results of bacterioplankton (total cell concentration and carbon biomass), phytoplankton (taxonomic composition, cell size for selected taxa, abundance, biomass and carbon biomass), and zooplankton (taxonomic composition, abundance) analyses, as well as environmental DNA (eDNA) metabarcoding. The dataset will be utilized to provide a comprehensive description of the structure of the lagoon ecosystem. It will also facilitate a comparison of its various parts, which vary in terms of their age of origin, i.e., release from the glacier. Additionally, the dataset will aid in the understanding of the intricate interactions between the freshwater and marine elements of the ecosystem. It can be used to compare biodiversity assessments based on eDNA with those based on traditional microscopy methods in the identification of phyto- and zooplankton. Furthermore, these data can be utilized for environmental monitoring, tracing temporal shifts and conducting comparative analysis of periglacial lagoons that are emerging in various regions of Svalbard as a result of climate change.

Citations: 0