Pub Date: 2024-09-26 DOI: 10.1016/j.dib.2024.110975
Heba Alawneh, Hamza Alkofahi
This article proposes a Process Control Block (PCB) dataset [1] mined over the process execution time of tested Android applications. PCB data were collected from 2620 malware-infested applications and 1610 benign applications. The PCB data sequence was collected for 25 seconds, with an average of 18,500 PCB records stored for each application. The mining method was implemented at the kernel level and synced with the process (job) context switching. The data for each program comprise the PCB information for all threads running the application. The automated application testing and PCB gathering for benign and malicious applications were conducted in a closed dynamic malware analysis framework. The dataset can be used to compare and contrast the low-level (kernel) behavior of benign and malicious Android programs. For the vast majority of tested applications, the mining approach effectively captured 99% of the context switches.
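The figures quoted in this abstract imply a few derived quantities that are easy to check. The following is a minimal sketch using only the numbers stated above; the constant and function names are illustrative, and the total record count assumes the 18,500 average holds across all applications, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract; names are illustrative, not from the dataset.
MALWARE_APPS = 2620
BENIGN_APPS = 1610
CAPTURE_SECONDS = 25
MEAN_RECORDS_PER_APP = 18_500


def dataset_summary():
    """Derive simple summary quantities from the quoted figures."""
    total = MALWARE_APPS + BENIGN_APPS
    return {
        "total_apps": total,
        # Share of malware samples, useful when judging class imbalance.
        "malware_share": MALWARE_APPS / total,
        # Average PCB records captured per second of execution.
        "mean_records_per_second": MEAN_RECORDS_PER_APP / CAPTURE_SECONDS,
        # Rough size estimate, assuming the average applies to every app.
        "approx_total_records": total * MEAN_RECORDS_PER_APP,
    }


summary = dataset_summary()
print(summary)
```

The malware share of roughly 62% suggests that a classifier trained on this data may need class weighting or stratified splits.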
{"title":"Process control block information dataset: Towards android malware detection","authors":"Heba Alawneh, Hamza Alkofahi","doi":"10.1016/j.dib.2024.110975","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-26","publicationTypes":"Journal Article"}
Pub Date: 2024-09-25 DOI: 10.1016/j.dib.2024.110971
Helder Fraga, Teresa Freitas, Nathalie Guimarães, João A. Santos
Crop landcover datasets are crucial for modern agriculture, aiding farmers, researchers, policymakers, and stakeholders. These databases offer extensive insights into crop distribution, facilitating informed decision-making for sustainable practices, particularly under a changing climate. Moreover, these datasets drive research, fostering collaborations and innovation for resilient agriculture. In Portugal, the COS dataset is vital, offering insights into agrarian landscapes and supporting sustainable practices. However, in recent versions, since 2007, information on permanent crops has been aggregated, necessitating complementary datasets and tools. The current paper addresses this gap by providing an open-source dataset focusing on perennial crops in mainland Portugal. Based on the 2019 agricultural census from the Portuguese Statistical Institute (INE), this dataset contributes to the spatial understanding of permanent crop distribution, being freely available for researchers, farmers and policymakers. The dataset includes a selection of perennial crops commonly cultivated in Portugal, such as Prunus dulcis (Almond), Malus domestica (Apple), Castanea sativa (Chestnut), Ceratonia siliqua (Carob), Prunus avium (Sweet Cherry), Vitis vinifera (Grapevine), Olea europaea (Olive), Citrus limon (Lemon), Citrus sinensis (Sweet Orange), Juglans regia (Walnut), Citrus reticulata (Mandarin), Prunus persica (Peach), Pyrus communis (Pear), and Prunus domestica (Plum). Further information regarding the Administrative Units of each crop is also available. This comprehensive list provides a detailed overview of the types of permanent crops included in the dataset, offering valuable insights into the Portuguese agricultural landscape.
{"title":"Perma_Crops_PT: A geolocated dataset for permanent crops in Portugal","authors":"Helder Fraga, Teresa Freitas, Nathalie Guimarães, João A. Santos","doi":"10.1016/j.dib.2024.110971","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-25","publicationTypes":"Journal Article"}
Pub Date: 2024-09-24 DOI: 10.1016/j.dib.2024.110973
Anna Karpenko, Yulia Mikhaylova, Andrey Shelenkov, Aleksey Tutelyan, Vasiliy Akimkin
The environmental bacterial species Raoultella ornithinolytica is an emerging pathogen of increasing importance in human infections. Thus far, clinical isolates of this species have not often exhibited multidrug resistance, but some reports underline the necessity for continuous monitoring of this potentially dangerous pathogen. Currently, epidemiological surveillance and antimicrobial resistance investigations of any bacterial pathogen usually rely on whole genome sequencing, which has become more affordable in recent years while providing increasingly important data. However, R. ornithinolytica genomic information is sparsely represented in public databases. Here, we report, to the best of our knowledge, the first whole genome sequence and corresponding raw data for a clinical R. ornithinolytica isolate from the Russian Federation, which carried antimicrobial resistance (AMR) genes, virulence factors, one plasmid, and a type I-F CRISPR-Cas system. The data provided will facilitate epidemiological surveillance and antimicrobial resistance monitoring of this emerging pathogen.
{"title":"Data on the analysis of draft genome sequence of Raoultella ornithinolytica isolate carrying antimicrobial resistance genes, plasmid and CRISPR-Cas system","authors":"Anna Karpenko, Yulia Mikhaylova, Andrey Shelenkov, Aleksey Tutelyan, Vasiliy Akimkin","doi":"10.1016/j.dib.2024.110973","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-24","publicationTypes":"Journal Article"}
Pub Date: 2024-09-23 DOI: 10.1016/j.dib.2024.110951
Richard Estrada, Liliana Aragón, Wendy E. Pérez, Yolanda Romero, Gabriel Martínez, Karina Garcia, Juancarlos Cruz, Carlos I. Arbizu
Fusarium verticillioides represents a major phytopathogenic threat to maize crops worldwide. In this study, we present genomic sequence data of a phytopathogen isolated from a maize stem that showed obvious signs of vascular rot. Using rigorous microbiological identification techniques, we correlated the disease symptoms observed in an affected maize region with the presence of the pathogen. Subsequently, the pathogen was cultured in a suitable fungal growth medium and extensive morphological characterization was performed. In addition, a pathogenicity test was carried out in a DCA model with three treatments and seven repetitions. De novo assembly from Illumina NovaSeq 6000 sequencing yielded 456 contigs, which together constitute a 42.8 Mb genome assembly with a GC content of 48.26%. Subsequent comparative analyses were performed with other Fusarium genomes available in the NCBI database.
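The assembly figures reported above (contig count, total length, GC content) are the kind of summary that can be recomputed from a set of contig sequences. The sketch below shows one way to do so in plain Python; the function names and the toy sequences are illustrative assumptions and are not the REC01 assembly or the authors' pipeline.

```python
def gc_percent(contigs):
    """Percentage of G and C among unambiguous bases (A, C, G, T) in all contigs."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


def assembly_stats(contigs):
    """Contig count, total assembled length in bp, and overall GC percentage."""
    return {
        "contigs": len(contigs),
        "total_bp": sum(len(c) for c in contigs),
        "gc_percent": gc_percent(contigs),
    }


# Toy contigs for illustration only.
toy = ["ATGCGC", "GGCC", "ATAT", "GC"]
stats = assembly_stats(toy)
print(stats)
```

On a real assembly the contig strings would be read from a FASTA file; ambiguous bases such as N are ignored here so that they do not distort the GC percentage.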
{"title":"Draft genome sequence data of Fusarium verticillioides strain REC01, a phytopathogen isolated from a Peruvian maize","authors":"Richard Estrada, Liliana Aragón, Wendy E. Pérez, Yolanda Romero, Gabriel Martínez, Karina Garcia, Juancarlos Cruz, Carlos I. Arbizu","doi":"10.1016/j.dib.2024.110951","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-23","publicationTypes":"Journal Article"}
Pub Date: 2024-09-21 DOI: 10.1016/j.dib.2024.110956
Seunghyeon Wang, Ikchul Eum, Sangkyun Park, Jaejun Kim
Fault detection and diagnosis (FDD) in Air Handling Units (AHUs) ensure building functions such as energy efficiency and occupant comfort by quickly identifying and diagnosing faults. Combining deep learning with FDD has demonstrated high generalization ability in this field. To develop deep learning models, this research constructed a dataset sourced from real data collected from a large-scale office in South Korea. The raw AHU data were extracted from the Building Management System (BMS) at 1-hour intervals, spanning from November 2023 to May 2024. The dataset was partially labeled by annotation experts, categorizing the data into six types: normal condition, supply fan fault, total heating pump fault, return air temperature sensor fault, supply air temperature sensor fault, and valve position fault. Additionally, semi-supervised learning methods were applied as an application example using this constructed dataset. The main contributions of this dataset to the field are twofold. First, it represents a unique dataset sourced from the real operational data of a large-scale office, which is currently non-existent in this domain. Second, the dataset's expert labeling adds significant value by ensuring accurate fault classification. Therefore, we hope that this dataset will encourage the development of robust FDD techniques that are more suitable for real-world applications.
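Because the dataset is only partially labeled, a semi-supervised workflow needs a way to represent unlabeled rows. The sketch below encodes the six classes named in the abstract as integers and marks unlabeled rows with -1, which is the convention used by scikit-learn's semi-supervised estimators. How the published files actually encode missing labels is not stated in the abstract, so the use of `None` for unlabeled rows and all names here are assumptions.

```python
# The six classes named in the abstract, in the order listed there.
FAULT_CLASSES = [
    "normal condition",
    "supply fan fault",
    "total heating pump fault",
    "return air temperature sensor fault",
    "supply air temperature sensor fault",
    "valve position fault",
]
CLASS_TO_ID = {name: i for i, name in enumerate(FAULT_CLASSES)}
UNLABELED = -1  # marker for rows the experts did not annotate (assumed convention)


def encode_labels(raw_labels):
    """Map class names to integer ids; None (hypothetical) means unlabeled."""
    return [UNLABELED if lab is None else CLASS_TO_ID[lab] for lab in raw_labels]


def labeled_fraction(encoded):
    """Share of rows that carry an expert label."""
    return sum(1 for y in encoded if y != UNLABELED) / len(encoded)


# Toy rows for illustration only.
raw = ["normal condition", None, "valve position fault", None]
encoded = encode_labels(raw)
print(encoded, labeled_fraction(encoded))
```

Checking the labeled fraction first is a useful sanity step, since semi-supervised methods behave very differently when only a small share of rows is annotated.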
{"title":"A semi-labelled dataset for fault detection in air handling units from a large-scale office","authors":"Seunghyeon Wang, Ikchul Eum, Sangkyun Park, Jaejun Kim","doi":"10.1016/j.dib.2024.110956","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}
Pub Date: 2024-09-21 DOI: 10.1016/j.dib.2024.110966
Olubukola O. Babalola, Rebaona R. Molefe, Adenike E. Amoo
This data article reports shotgun metagenomic data obtained from drought-stressed maize rhizosphere through the Illumina NovaSeq platform, utilizing the KBase online platform. A total of 428,339,852 high-quality post-sequences were obtained, with an average GC content of 65.45%. The investigation, conducted at Molelwane farm in Mafikeng, South Africa, identified 13 metagenome-assembled genomes (MAGs). Functional annotation of these MAGs revealed their involvement in essential plant growth and development functions, such as sulfur and nitrogen metabolism. The dataset was deposited into the NCBI database, and MAG accessions are available at DDBJ/ENA/GenBank under the accession number PRJNA101755.
{"title":"Metagenome assembly and annotation of data from the rhizosphere soil of drought-stressed CRN-3505 maize cultivar","authors":"Olubukola O. Babalola, Rebaona R. Molefe, Adenike E. Amoo","doi":"10.1016/j.dib.2024.110966","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}
Pub Date: 2024-09-21 DOI: 10.1016/j.dib.2024.110938
Christopher Gihaut, Chrystelle Brin, Martial Briand, Jérôme Verdier, Matthieu Barret, Thomas Roitsch, Tristan Boureau
Xanthomonas citri pv. fuscans (Xcf) and Xanthomonas phaseoli pv. phaseoli (Xpp) are responsible for the Common Bacterial Blight (CBB), a major common bean (Phaseolus vulgaris) disease. The pathogenicity of Xcf and Xpp is known to be dependent upon a functional Type III Secretion System (T3SS) allowing the injection of numerous bacterial Type III Effectors (T3Es) into plant cells. T3Es have been described as able to disrupt plant defence and manipulate plant metabolism.
In this work, we describe the transcriptomic response of one susceptible (Flavert) and one resistant (Vezer) cultivar of P. vulgaris to inoculation with the virulent strain Xcf CFBP4885 or its avirulent T3SS-defective hrcV mutant (CFBP13802).
Leaves of both bean cultivars were infiltrated with water or bacterial suspensions. Inoculated leaves were sampled at 24 or 48 h post inoculation (hpi). The experiment was independently repeated three times for total RNA extraction and sequencing analysis. Library construction and total RNA sequencing were performed on the BGISEQ-500 platform at the Beijing Genomics Institute (BGI, Hong Kong), generating an average of 24 million 100 bp paired-end reads per sample. FastQC was used to check read quality. Reads were mapped using the quasi-mapping alignment of Salmon (version 1.2.1) against the Phaseolus vulgaris reference genome (version 2.1), revealing the expression profiles of 36,978 transcripts in leaf tissues.
Fastq raw data and count files from 36 samples are available in the Gene Expression Omnibus (GEO) repository of the National Center for Biotechnology Information (NCBI) under the accession number GSE271236.
This dataset is a valuable resource to investigate the role of T3Es in subverting the cellular functions of bean.
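The 36 samples mentioned above are consistent with a full factorial design. The sketch below enumerates that design, assuming three treatments (water, the wild-type strain, and the hrcV mutant), two cultivars, two sampling times, and three independent repetitions. The abstract does not list the sample layout explicitly, so this breakdown is an inference and the field names are illustrative.

```python
from itertools import product

# Factor levels taken from the abstract; the three-treatment reading is an assumption.
CULTIVARS = ["Flavert", "Vezer"]
TREATMENTS = ["water", "Xcf CFBP4885", "hrcV mutant CFBP13802"]
TIMEPOINTS_HPI = [24, 48]
REPLICATES = [1, 2, 3]

# One record per combination of cultivar, treatment, time point, and replicate.
samples = [
    {"cultivar": c, "treatment": t, "hpi": h, "replicate": r}
    for c, t, h, r in product(CULTIVARS, TREATMENTS, TIMEPOINTS_HPI, REPLICATES)
]

print(len(samples))  # 2 * 3 * 2 * 3 = 36
```

A table built this way can serve as the sample sheet when matching count files to conditions for differential expression analysis.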
{"title":"Transcriptomic dataset of Phaseolus vulgaris leaves in response to the inoculation of pathogenic Xanthomonas citri pv. fuscans and its type III secretion system-defective mutant hrcV","authors":"Christopher Gihaut, Chrystelle Brin, Martial Briand, Jérôme Verdier, Matthieu Barret, Thomas Roitsch, Tristan Boureau","doi":"10.1016/j.dib.2024.110938","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}
Pub Date: 2024-09-21 DOI: 10.1016/j.dib.2024.110970
Rubaba Binte Rahman, Sharia Arfin Tanim, Nazia Alfaz, Tahmid Enam Shrestha, Md Saef Ullah Miah, M.F. Mridha
This article presents a dental dataset for the improvement of research on deep learning-based detection and classification of dental diseases. The dataset consists of 232 panoramic dental radiographs, categorized into six major classes: healthy teeth, caries, impacted teeth, infections, fractured teeth, and broken-down crowns/roots (BDC/BDR). The images were collected from three renowned private clinics in Dhaka, Bangladesh, with the help of an experienced dental practitioner who ensured the confidentiality of patients and high-quality data acquisition using a 64-megapixel Android phone camera. To enhance the value of the dataset for machine and deep learning applications, we applied Contrast-Limited Adaptive Histogram Equalization (CLAHE) for image enhancement and augmented the data. The images were annotated using the CVAT tool and reviewed by dental experts. This benchmark dataset is publicly available and provides a valuable resource for researchers in artificial intelligence, computer science, and dental informatics to promote interdisciplinary collaboration and the development of advanced algorithms for dental disease detection.
{"title":"A comprehensive dental dataset of six classes for deep learning based object detection study","authors":"Rubaba Binte Rahman, Sharia Arfin Tanim, Nazia Alfaz, Tahmid Enam Shrestha, Md Saef Ullah Miah, M.F. Mridha","doi":"10.1016/j.dib.2024.110970","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}
This data paper presents a comprehensive visual dataset of 19 distinct types of Indian spices, consisting of high-quality images meticulously curated to facilitate various research and educational applications. The dataset includes extensive imagery of the following spices: Asafoetida, Bay Leaf, Black Cardamom, Black Pepper, Caraway Seeds, Cinnamon Stick, Cloves, Coriander Seeds, Cubeb Pepper, Cumin Seeds, Dry Ginger, Dry Red Chilly, Fennel Seeds, Green Cardamom, Mace, Nutmeg, Poppy Seeds, Star Anise, and Stone Flowers. Each image in the dataset has been captured under controlled conditions to ensure consistency and clarity, making it an invaluable resource for studies in food science, agriculture, and culinary arts. The dataset can also support machine learning and computer vision applications, such as spice recognition and classification. By providing detailed visual documentation, this dataset aims to promote a deeper understanding and appreciation of the rich diversity of Indian spices.
{"title":"Facilitating spice recognition and classification: An image dataset of Indian spices","authors":"Sandip Thite, Deepali Godse, Kailas Patil, Prawit Chumchu, Alfa Nyandoro","doi":"10.1016/j.dib.2024.110936","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}
Pub Date: 2024-09-21 DOI: 10.1016/j.dib.2024.110968
Grant M. Hanada, Marija Kalabic, Daniel P. Ferris
To fully understand brain processes in the real world, it is necessary to record and quantitatively analyse brain processes during real world human experiences. Mobile electroencephalography (EEG) and physiological data sensors provide new opportunities for studying humans outside of the laboratory. The purpose of this study was to document data from high-density EEG and mobile physiological sensors while humans performed a visual search task both on a treadmill in a laboratory setting and overground in a natural outdoor setting. The data set includes 49 young, healthy participants on an outdoor arboretum path and on a treadmill in a laboratory with a large virtual reality screen. The data provide a valuable research tool for scientists interested in signal processing, electrocortical brain processes, mobile brain imaging, and brain-computer interfaces based on mobile EEG. Given the comparison data between laboratory and real world conditions, researchers can test the viability of new processing algorithms across conditions or investigate changes in electrocortical activity related to behavioural dynamics coded into the data.
{"title":"Mobile brain–body imaging data set of indoor treadmill walking and outdoor walking with a visual search task","authors":"Grant M. Hanada, Marija Kalabic, Daniel P. Ferris","doi":"10.1016/j.dib.2024.110968","journal":{"name":"Data in Brief"},"publicationDate":"2024-09-21","publicationTypes":"Journal Article"}