Data in Brief: latest articles

Kurdish social media sentiment corpus: Misyar marriage perspectives
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-03 DOI: 10.1016/j.dib.2024.110989
Sarkhel H. Taher Karim
This article presents a compilation of 5108 Central Kurdish comments taken from YouTube and Facebook. The dataset was compiled to investigate public perceptions of Misyar marriage, a non-traditional form of marriage, in the Kurdistan region. Comments were gathered from specific public pages on these social media platforms over a 135-day collection period. The dataset has two columns: sentiments and comments. The sentiments column classifies each comment into one of eight sentiment labels: Positive, Negative, Neutral, Sarcastic or Humorous, Suggestive, Dismissive, Skeptical, and Curious. The comments column contains the text of the comments in Central Kurdish. To improve the quality and uniformity of the data, extensive preprocessing was applied, including noise removal, character replacement, and space adjustments.
Researchers interested in sentiment analysis, social media studies, Islamic studies, and Kurdish cultural practices will find the dataset to be a useful resource. It can be used for sentiment analysis, trend analysis, linguistic studies, and other analyses. It provides insights into the public discourse surrounding Misyar marriage. The labeled data can aid in the creation of machine learning models and further our knowledge of societal perceptions of emerging religious trends.
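As a first sanity check on a corpus with this two-column layout, one might count how many comments fall under each of the eight labels. The sketch below is only illustrative: it assumes the data is available as a CSV file with header names `comments` and `sentiments` (the abstract names the columns but not the file format), and the sample rows are placeholders, not real comments.

```python
import csv
import io
from collections import Counter

# The eight sentiment labels listed in the abstract.
LABELS = {
    "Positive", "Negative", "Neutral", "Sarcastic or Humorous",
    "Suggestive", "Dismissive", "Skeptical", "Curious",
}

def label_distribution(csv_file):
    """Count comments per sentiment label, rejecting unknown labels."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        label = row["sentiments"].strip()  # assumed header name
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts

# Made-up placeholder rows standing in for the real file.
SAMPLE = io.StringIO(
    "comments,sentiments\n"
    "comment one,Negative\n"
    "comment two,Curious\n"
    "comment three,Negative\n"
)
distribution = label_distribution(SAMPLE)
print(distribution.most_common())
```

A skewed distribution found this way would suggest stratified splitting or class weighting before training a classifier.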
Data in Brief, volume 57, Article 110989
Citations: 0
Exploring reddit forum for software evolution as an alternative requirements source: An end-user discussion dataset on Google maps
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-03 DOI: 10.1016/j.dib.2024.110993
Javed Ali Khan , Nek Dil Khan , Muhammad Yaqoob , Affan Yasin , Ayed Alwadain
End-user feedback from app stores and the Twitter (X) platform has recently been used intensively for software development and evolution. However, Reddit forums, which provide a platform for arguing and reasoning about software features and issues, have rarely been explored in the literature for software evolution and improvement. Therefore, this study explores Reddit forums as an alternative source for software evolution compared to app stores, Twitter (X), and Amazon reviews. For this purpose, a Python script was developed to extract end-user discussions related to the Google Maps (GM) app from Reddit forums using the Python PRAW API, keeping the original argumentative structure of the user discussions. In total, 3119 end-user discussions from seven related topics about the GM app were extracted for software evolution. The dataset includes detailed end-user feedback and associated metadata, including comment ID, parent ID, author names, timestamps, and upvotes. It is a valuable resource for software vendors, developers, researchers, and educators seeking to identify new features to include in upcoming app versions. It also helps in understanding recently occurring issues because, unlike in app stores, Reddit users debate these issues and justify their positions. Moreover, the replication package and extraction process of the dataset can enable software researchers, vendors, and developers to extract data from Reddit forums and use it for software evolution and improvement.
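Because each record carries a comment ID and a parent ID, the argumentative structure can be rebuilt as a tree. The following is a minimal sketch, not the authors' script: the field names `comment_id` and `parent_id` are hypothetical, the sample records are invented, and it assumes both fields use the same identifier form (identifiers returned by the Reddit API may carry type prefixes that would need normalizing first).

```python
from collections import defaultdict

def build_threads(records):
    """Return (roots, children): top-level comment IDs and a parent -> replies map."""
    known_ids = {r["comment_id"] for r in records}
    children = defaultdict(list)
    roots = []
    for r in records:
        if r["parent_id"] in known_ids:
            children[r["parent_id"]].append(r["comment_id"])
        else:
            # Parent is not a comment in the data, so treat it as a reply to the post.
            roots.append(r["comment_id"])
    return roots, dict(children)

def max_depth(comment_id, children):
    """Length of the longest reply chain starting at comment_id (itself counts as 1)."""
    replies = children.get(comment_id, [])
    return 1 + max((max_depth(c, children) for c in replies), default=0)

# Invented records for illustration only.
RECORDS = [
    {"comment_id": "c1", "parent_id": "post1"},
    {"comment_id": "c2", "parent_id": "c1"},
    {"comment_id": "c3", "parent_id": "c2"},
    {"comment_id": "c4", "parent_id": "c1"},
    {"comment_id": "c5", "parent_id": "post1"},
]
roots, children = build_threads(RECORDS)
depths = {root: max_depth(root, children) for root in roots}
print(roots, depths)
```

Deep chains found this way are candidates for argument mining, since they tend to contain claims, rebuttals, and justifications rather than isolated remarks.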
Data in Brief, volume 57, Article 110993
Citations: 0
Dataset of DDoS attacks on Fibaro home center 3 for smart home security
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-03 DOI: 10.1016/j.dib.2024.110991
Ladislav Huraj, Marek Šimon, Jakub Lietava
DDoS attacks pose a significant security risk to smart homes and can disrupt the functionality and availability of connected devices in the home. This dataset documents Distributed Denial of Service (DDoS) attacks against the Fibaro Home Center 3 central control unit, which is used to automate smart homes within the Internet of Things. The focus is on three types of DDoS attacks: TCP SYN flood, ICMP flood, and HTTP flood. Data collection was performed on the local network, where the SYN flood and ICMP flood attacks were performed using the hping3 tool, and the HTTP flood attack was performed using the LOIC tool. The data was captured using Wireshark software and is available in PCAP and CSV formats, allowing detailed analysis of the network traffic. The logs include information such as timestamps, source and destination IP addresses, protocols, packet lengths, and port numbers. The dataset includes raw and anonymized data for each type of attack.
The dataset is a resource for researchers focused on cybersecurity and IoT device protection. It allows simulation and analysis of DDoS attacks on a specific IoT device, providing insight into attack patterns and the effectiveness of defenses. The simplicity and specialization of the dataset make it a practical resource for developing and testing intrusion detection systems and predictive models to mitigate and prevent DDoS attacks. The use of the PCAP format facilitates the import of the data into various research software platforms.
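A simple baseline for flood detection on CSV logs of this kind is a per-source packet rate. The sketch below assumes hypothetical column names (`time` in seconds, `source`) and an arbitrary threshold; the real CSV headers and sensible thresholds should be taken from the dataset itself. The sample rows are invented.

```python
import csv
import io
from collections import Counter

def packets_per_second(csv_file):
    """Count packets per (source IP, whole-second bucket)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        second = int(float(row["time"]))  # assumed: seconds since capture start
        counts[(row["source"], second)] += 1
    return counts

def flag_floods(counts, threshold):
    """Return the sorted (source, second) buckets whose packet count exceeds threshold."""
    return sorted(key for key, n in counts.items() if n > threshold)

# Invented capture excerpt for illustration only.
SAMPLE = io.StringIO(
    "time,source,destination,protocol,length\n"
    "0.10,10.0.0.5,10.0.0.2,TCP,60\n"
    "0.20,10.0.0.5,10.0.0.2,TCP,60\n"
    "0.35,10.0.0.5,10.0.0.2,TCP,60\n"
    "0.90,10.0.0.5,10.0.0.2,TCP,60\n"
    "1.05,10.0.0.5,10.0.0.2,TCP,60\n"
    "1.50,10.0.0.7,10.0.0.2,ICMP,98\n"
)
rates = packets_per_second(SAMPLE)
suspicious = flag_floods(rates, threshold=3)
print(suspicious)
```

A fixed threshold is only a starting point; the same per-second counts could serve as features for a learned intrusion detection model.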
Data in Brief, volume 57, Article 110991
Citations: 0
Exploring ddRAD sequencing data of tomato genotypes evaluated for the heat stress tolerance
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-01 DOI: 10.1016/j.dib.2024.110982
Salvatore Graci, Amalia Barone
Climate change is a major concern for agricultural crops, and the selection of genotypes tolerant to abiotic stresses represents an important breeding strategy to reduce yield losses. In addition, the continuous development of new and more accurate high-throughput technologies for the analysis of DNA sequences is key to improving biological understanding and the application of biological knowledge. In the present work, 27 tomato genotypes already evaluated for their response under high temperature conditions were sequenced using the ddRAD sequencing technology. The main goal was to provide genomic data useful for identifying candidate genes and variants to cope with current climate changes. Total genomic DNA was extracted from leaves and sequenced on the Illumina HiSeq2500 instrument. Raw reads of the dataset were processed using different bioinformatics tools to generate a Variant Call Format (VCF) file. The availability of resources reporting polymorphisms among genomes of different genotypes provides a useful basis for studying tomato tolerance to current climate changes and can be used by researchers and breeders to investigate the molecular response mechanisms and develop new breeding programs, also aided by Marker-Assisted Selection (MAS). The raw reads were deposited in the SRA database (https://www.ncbi.nlm.nih.gov/sra/PRJNA1137563).
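A first look at a VCF file often starts with counting variants per chromosome and separating single-nucleotide variants from everything else. The sketch below reads the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...) with the standard library only; the chromosome names and records are invented, and the SNP test is a deliberate simplification.

```python
from collections import Counter

def summarize_vcf(lines):
    """Count records per (chromosome, class), where class is 'SNP' or 'other'."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alts = fields[0], fields[3], fields[4].split(",")
        # Simplification: a SNP has a one-base REF and only one-base ALT alleles.
        is_snp = len(ref) == 1 and all(len(a) == 1 and a in "ACGTN" for a in alts)
        counts[(chrom, "SNP" if is_snp else "other")] += 1
    return counts

# Invented records for illustration only.
SAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ch01\t1200\t.\tA\tG\t50\tPASS\t.",
    "ch01\t3400\t.\tAT\tA\t40\tPASS\t.",
    "ch02\t560\t.\tC\tT,G\t60\tPASS\t.",
]
summary = summarize_vcf(SAMPLE_VCF)
print(sorted(summary.items()))
```

For real analyses a dedicated VCF library would handle genotype fields and filtering more robustly; this sketch is only meant to show the file layout.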
Data in Brief, volume 57, Article 110982
Citations: 0
Mine 4.0-mineCareerDB: A high-resolution image dataset for mining career segmentation and object detection
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-01 DOI: 10.1016/j.dib.2024.110976
Nasreddine Haqiq , Mounia Zaim , Mohamed Sbihi , Khalid El Amraoui , Mustapha El Alaoui , Lhoussaine Masmoudi , Hamza Echarrafi
The article presents Mine 4.0-MineCareerDB, a publicly available dataset of high-resolution images captured by a DJI Phantom 4 RTK drone and specifically designed for analyzing mining careers. The dataset comprises 373 images depicting various mining operations and activities. Each image is georeferenced and offers a detailed view of mining activities, including the use of various equipment, infrastructure, and the overall mining environment. This dataset has the potential to be a valuable resource for computer vision applications in the mining industry, such as developing algorithms for identifying mining equipment, training deep learning models for safety analysis and optimization, and research on automation in mining operations. By making Mine 4.0-MineCareerDB publicly available, we aim to stimulate further advancements in computer vision research and its applications in the mining sector. The dataset is available at: https://data.mendeley.com/datasets/c5s76mj4bm/5
Data in Brief, volume 57, Article 110976
Citations: 0
What are data spaces? Systematic survey and future outlook
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-10-01 DOI: 10.1016/j.dib.2024.110969
Manlio Bacco , Alexander Kocian , Stefano Chessa , Antonino Crivello , Paolo Barsocchi
Data spaces, a novel concept promoting data sharing and exchange, are gaining momentum because of recent developments motivated by the increasing need for interoperability and data sovereignty. After an initial phase, dating back approximately twenty years, in which the concept was tentatively explored in different scenarios, it is presently going through a consolidation phase in which both specifications and implementations converge towards a common reference for standardisation. In this context, we offer our view on data spaces by presenting a systematic literature survey, a description of the components needed to build them and of how they work, and an overview of existing mature software implementations. We thoroughly present the architectural vision behind the concept and we analyse the Reference Architectural Model by IDS. We provide practical pointers to readers interested in experimenting with software components used in data spaces, and we conclude by highlighting open challenges for their success.
Data in Brief, volume 57, Article 110969
Citations: 0
A dataset of mammography images with area-based breast density values, breast area, and dense tissue segmentation masks
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-09-30 DOI: 10.1016/j.dib.2024.110980
Hamid Behravan , Naga Raju Gudhe , Hidemi Okuma , Mazen Sudah , Arto Mannermaa
A new dataset is presented to propel research in automated breast density estimation, a crucial factor in mammogram interpretation. Mammography, a low-dose X-ray technique for breast cancer screening, can be affected by breast density. Dense tissue appears white on mammograms, potentially obscuring tumors. This dataset, built upon the public VinDr-Mammo dataset, offers 745 mammogram images (including training and test sets) along with expert-radiologist annotations for both the entire breast and dense tissue regions. Researchers can leverage this dataset for multiple purposes: training deep learning models for automated breast density analysis, refining segmentation methods for accurate delineation of breast tissue, and benchmarking existing and novel breast density estimation algorithms. This resource holds promise for improving breast cancer screening through advancements in automated breast density analysis.
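An area-based density value can be derived from the two kinds of masks as the share of breast pixels that are also marked as dense tissue. The sketch below illustrates that ratio on tiny invented binary masks stored as nested lists; the actual mask file format and any dataset-specific definition of density are not stated in the abstract, so this is an assumption-laden illustration rather than the authors' procedure.

```python
def percent_density(breast_mask, dense_mask):
    """Percentage of breast-area pixels that are also dense-tissue pixels."""
    breast_area = 0
    dense_area = 0
    for breast_row, dense_row in zip(breast_mask, dense_mask):
        for b, d in zip(breast_row, dense_row):
            if b:
                breast_area += 1
                if d:  # dense pixels outside the breast mask are ignored
                    dense_area += 1
    if breast_area == 0:
        raise ValueError("breast mask is empty")
    return 100.0 * dense_area / breast_area

# Invented 4x4 masks: 8 breast pixels, 2 dense pixels inside the breast,
# plus 1 stray dense pixel outside the breast that must not be counted.
BREAST = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
DENSE = [
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
density = percent_density(BREAST, DENSE)
print(density)
```

With real images the same computation is usually done with array operations, but the definition of the ratio stays the same.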
Data in Brief, volume 57, Article 110980
Citations: 0
Dataset on the performance of Atlantic salmon (Salmo salar) reared at different dissolved oxygen levels under experimental conditions
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-09-28 DOI: 10.1016/j.dib.2024.110983
Nina Liland , Ivar Rønnestad , Marina Azevedo , Floriana Lai , Frida Oulie , Luís Conceição , Filipe Soares
Atlantic salmon (Salmo salar) cultivated in cages and net-pens are regularly exposed to natural variations in dissolved oxygen levels, occasionally experiencing events of low oxygen availability. Quantifying the impact of low dissolved oxygen levels on fish performance can help fish farmers better manage the risks associated with such events.
This article describes the zootechnical performance of Atlantic salmon reared under experimental conditions at three different dissolved oxygen levels (i.e., low: 50 % saturation; medium: 60 % saturation; high: 95 % saturation). The data was collected in the context of two in vivo trials: (i) Trial A, where fish with an initial average body weight of 312.44 ± 11.53 g were reared in indoor tanks at the different DO levels for 30 days; (ii) Trial B, where fish with an initial average body weight of 735.33 ± 40.42 g were reared in indoor tanks at the different DO levels for 26 days.
The dataset [1] is composed of spreadsheets (.xlsx format) and charts (.png format), and includes daily and hourly resolution data (e.g., dissolved oxygen, water temperature, salinity, number of fish, and feed intake), sampling and laboratory data (e.g., fish weight, fork length, sex, organ weights, whole-body composition, and tail and opercular beat frequency), and zootechnical indicators calculated at the tank level and averaged per treatment (e.g., survival rate, weight gain, cumulative feed intake, feed conversion ratio, and somatic indexes). The differences between treatment means were analyzed using ANOVA, followed by post-hoc testing.
The data presented here has the potential to be used in subsequent analyses, for example when analyzed together with other experimental data or when used to parameterize mathematical models, with the aim of better understanding and describing the effects of dissolved oxygen on the performance of Atlantic salmon.
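The zootechnical indicators named above can be computed from a handful of tank-level measurements. The sketch below uses definitions that are common in aquaculture studies (feed conversion ratio as feed intake divided by weight gain, specific growth rate from log weights); the abstract does not state the exact formulas used, so these are assumptions. Only the initial weight (312.44 g) and the 30-day duration come from Trial A; the final weight, feed intake, and fish counts are invented.

```python
import math

def weight_gain(initial_g, final_g):
    """Absolute weight gain per fish, in grams."""
    return final_g - initial_g

def feed_conversion_ratio(feed_intake_g, initial_g, final_g):
    """Feed consumed per unit of weight gained (lower is better)."""
    gain = weight_gain(initial_g, final_g)
    if gain <= 0:
        raise ValueError("FCR is undefined without positive weight gain")
    return feed_intake_g / gain

def specific_growth_rate(initial_g, final_g, days):
    """Specific growth rate in percent body weight per day."""
    return 100.0 * (math.log(final_g) - math.log(initial_g)) / days

def survival_rate(initial_count, final_count):
    """Percentage of fish alive at the end of the trial."""
    return 100.0 * final_count / initial_count

# Initial weight and duration from Trial A; all other values are invented.
initial_g, days = 312.44, 30
final_g, feed_g = 400.0, 105.0

gain = weight_gain(initial_g, final_g)
fcr = feed_conversion_ratio(feed_g, initial_g, final_g)
sgr = specific_growth_rate(initial_g, final_g, days)
survival = survival_rate(40, 38)
print(round(gain, 2), round(fcr, 2), round(sgr, 2), survival)
```

Computing the indicators per tank and then averaging per treatment mirrors the tank-level structure described for the dataset.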
Data in Brief, volume 57, Article 110983
Citations: 0
A comprehensive dataset of rice field weed detection from Bangladesh 孟加拉国稻田杂草检测综合数据集
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-09-28 DOI: 10.1016/j.dib.2024.110981
Md Sawkat Ali , Mohammad Rifat Ahmmad Rashid , Tasnim Hossain , Md Ahsan Kabir , Md. Kamrul , Sayam Hossain Bhuiyan Aumy , Mehedi Hasan Mridha , Imam Hossain Sajeeb , Mohammad Manzurul Islam , Taskeed Jabid
In agricultural research, particularly concerning rice cultivation, the presence of weeds within rice fields is acknowledged as a significant contributor to both diminished crop quality and increased production costs. Rice fields, due to their inherently moist environment, offer ideal conditions for weed proliferation. Traditionally, the control of these weeds has been managed through labor-intensive manual methods. However, as the agricultural sector evolves, there is a notable pivot towards leveraging advanced technological solutions, including deep learning and machine learning. The efficacy of these technologies hinges on the availability of high-quality, relevant data. To address this, a comprehensive dataset comprising 3632 high-resolution RGB images has been developed. This dataset is designed to capture a diverse range of weed species, specifically 11 types that are frequently found in rice fields. The diversity of the dataset ensures that machine learning models trained using this data can effectively identify and differentiate between desired and undesired plant species. While the dataset predominantly includes images from Bangladesh, the weed species it documents are commonly found across various global rice-growing regions, enhancing the dataset's applicability in different agricultural settings.
在农业研究中,尤其是有关水稻种植的研究中,稻田中杂草的存在被认为是导致作物质量下降和生产成本增加的重要原因。稻田由于其固有的潮湿环境,为杂草的繁殖提供了理想的条件。传统上,控制这些杂草的方法是劳动密集型的人工方法。然而,随着农业领域的发展,人们明显倾向于利用先进的技术解决方案,包括深度学习和机器学习。这些技术的功效取决于能否获得高质量的相关数据。为此,我们开发了一个由 3632 张高分辨率 RGB 图像组成的综合数据集。该数据集旨在捕捉多种杂草物种,特别是水稻田中常见的 11 种杂草。数据集的多样性确保了使用这些数据训练的机器学习模型能够有效地识别和区分理想和不理想的植物物种。虽然该数据集主要包括孟加拉国的图像,但其记录的杂草物种在全球各个水稻种植区都很常见,从而增强了该数据集在不同农业环境中的适用性。
引用次数: 0
A dataset on the segregation of students with disabilities in Brazil 巴西残疾学生隔离情况数据集
IF 1 Q3 MULTIDISCIPLINARY SCIENCES Pub Date : 2024-09-28 DOI: 10.1016/j.dib.2024.110972
Rafael Verão Françozo , Afonso Henriques Silva Leite , Leonardo Lopes Honda , Felipe Fernandes de Oliveira , Marcio Teixeira Oliveira , Calvin Rodrigues da Costa
The dataset provided shows the segregation rate between students with and without disabilities in Brazilian cities between 2008 and 2023. Student enrolment data was extracted from the microdata of the Brazilian school census. The segregation rate was calculated using the dissimilarity index, which quantifies how dissimilar or segregated two populations are. The dataset consists of a .csv file with the calculated data and thematic maps of the Brazilian states, highlighting the cities. This data can be useful for researchers in the field of inclusion and for decision-makers, supporting the development of public policies that help eliminate disparities in education and ensure equal access to all levels of education for persons with disabilities.
所提供的数据集显示了 2008 年至 2023 年巴西各城市残疾学生和非残疾学生之间的隔离率。学生入学数据来自巴西学校普查的微观数据。隔离率使用差异指数计算,该指数可量化两个人群的差异或隔离程度。数据集由一个包含计算数据的 .csv 文件和巴西各州的专题地图组成,其中突出显示了各城市。这些数据对全纳教育领域的研究人员和决策者非常有用,有助于制定公共政策,消除教育差距,确保残疾人平等接受各级教育。
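The dissimilarity index named in the abstract above can be illustrated with a short sketch. It uses the standard two-group form, D = ½ · Σ |aᵢ/A − bᵢ/B|, where aᵢ and bᵢ are the counts of each group in unit i and A and B are the group totals. Treating each school within a city as the unit, and the function and variable names used here, are assumptions made for illustration; the abstract does not say exactly how the authors computed the rate.

```python
from typing import Sequence


def dissimilarity_index(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-group dissimilarity index, from 0 (even mix) to 1 (full separation).

    group_a[i] and group_b[i] are the counts of each group in unit i
    (assumed here to be one school within a city; hypothetical layout).
    """
    if len(group_a) != len(group_b):
        raise ValueError("both groups must cover the same units")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("each group needs at least one member")
    # Half the sum of absolute differences between each unit's share of the two groups.
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical enrolment counts for three schools in one city.
with_disabilities = [12, 3, 0]
without_disabilities = [200, 250, 150]
print(round(dissimilarity_index(with_disabilities, without_disabilities), 3))
```

A value near 0 means students with disabilities are spread across schools in the same proportions as other students; a value near 1 means the two groups attend almost entirely different schools.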
引用次数: 0