Data: Latest Articles - Book学术

SparrKULee: A Speech-Evoked Auditory Response Repository from KU Leuven, Containing the EEG of 85 Participants

Data

Pub Date : 2024-07-26 DOI: 10.3390/data9080094

Bernd Accou, Lies Bollens, Marlies Gillis, Wendy Verheijen, Hugo Van hamme, T. Francart

Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of neural responses to fast and dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG activity from speech features. Machine learning techniques are generally employed to construct encoding and decoding models, which necessitate a substantial quantity of data. We present SparrKULee, a Speech-evoked Auditory Repository of EEG data, measured at KU Leuven, comprising 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90–150 min of natural speech. This dataset is more extensive than any currently available dataset in terms of both the number of participants and the quantity of data per participant. It is suitable for training larger machine learning models. We evaluate the dataset using linear and state-of-the-art non-linear models in a speech encoding/decoding and match/mismatch paradigm, providing benchmark scores for future research.
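
The match/mismatch paradigm named in the abstract can be illustrated with a toy sketch. This is not the authors' benchmark model: it assumes a speech envelope has already been reconstructed from EEG by some decoder, and it simply picks the candidate speech segment whose envelope correlates best with that reconstruction. The function names (`pearson`, `match_mismatch`) and the synthetic data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / (sd_x * sd_y)


def match_mismatch(reconstructed, candidates):
    """Return the index of the candidate envelope that best matches
    the envelope reconstructed from EEG (highest correlation)."""
    scores = [pearson(reconstructed, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Synthetic stand-ins (assumption): random "envelopes" and a noisy copy
# of the matched one playing the role of the decoder output.
rng = random.Random(0)
matched = [rng.random() for _ in range(200)]
mismatched = [rng.random() for _ in range(200)]
reconstructed = [m + rng.gauss(0, 0.5) for m in matched]

choice = match_mismatch(reconstructed, [matched, mismatched])
print("selected candidate:", choice)
```

Accuracy in this paradigm would be the fraction of EEG segments for which the matched candidate is selected; the non-linear models mentioned in the abstract learn the comparison directly instead of relying on a fixed correlation score.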



Citations: 0

Bootstrap Method as a Tool for Analyzing Data with Atypical Distributions Deviating from Parametric Assumptions: Critique and Effectiveness Evaluation

Data

Pub Date : 2024-07-26 DOI: 10.3390/data9080095

Joanna Kostanek, K. Karolczak, W. Kuliczkowski, Cezary Watała

In today’s research environment characterized by exponential data growth and increasing complexity, the selection of appropriate statistical tests, tailored to research objectives and data distributions, is paramount for rigorous analysis and accurate interpretation. This article explores the growing prominence of bootstrapping, an advanced statistical technique for multiple comparisons analysis, offering flexibility and customization by estimating sample distributions without assuming population distributions, thus serving as a valuable alternative to traditional methods in various data scenarios. Computer simulations were conducted using data from cardiovascular disease patients. Two approaches, spontaneous partly controlled simulation and fully constrained simulation using self-written R scripts, were utilized to generate datasets with specified distributions and analyze the data using tests for comparing more than two groups. The utilization of the bootstrap method greatly improves statistical analysis, especially in overcoming the constraints of conventional parametric tests. Our research showcased its effectiveness in comparing multiple scenarios, yielding strong findings across diverse distributions, even with minor inflation in p values. Serving as a valuable substitute for parametric approaches, bootstrap promotes careful consideration when rejecting hypotheses, thus fostering a deeper understanding of statistical nuances and bolstering analytical rigor.
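
As a rough illustration of a bootstrap comparison of more than two groups, the sketch below resamples under the null hypothesis of equal means. It is not the authors' R code: the choice of the one-way F statistic, the centering of each group at its own mean, and the names `f_statistic` and `bootstrap_f_test` are all assumptions made for this example.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_f_test(groups, n_boot=2000, seed=0):
    """Bootstrap p-value for equal group means.

    Each group is centered at its own mean so that the null hypothesis
    holds in the resampling world, then resampled with replacement.
    No distributional form is assumed for the population.
    """
    rng = random.Random(seed)
    observed = f_statistic(groups)
    centered = [[x - fmean(g) for x in g] for g in groups]
    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choices(g, k=len(g)) for g in centered]
        if f_statistic(resampled) >= observed:
            exceed += 1
    # The +1 terms keep the estimated p-value strictly above zero.
    return observed, (exceed + 1) / (n_boot + 1)


# Made-up example data: three skewed groups, the third one shifted.
data = [
    [1.2, 0.8, 1.1, 5.9, 1.0, 0.9],
    [1.4, 1.0, 0.7, 1.3, 6.2, 1.1],
    [3.1, 2.8, 9.5, 3.3, 2.9, 3.0],
]
f_obs, p_value = bootstrap_f_test(data)
print(f"F = {f_obs:.2f}, bootstrap p = {p_value:.4f}")
```

Because the reference distribution is built from the data themselves, the test does not lean on normality, which is the property the abstract highlights for atypical distributions.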



Citations: 0

Optimizing Database Performance in Complex Event Processing through Indexing Strategies

Data

Pub Date : 2024-07-24 DOI: 10.3390/data9080093

Maryam Abbasi, Marco V. Bernardo, Paulo Váz, J. Silva, Pedro Martins

Complex event processing (CEP) systems have gained significant importance in various domains, such as finance, logistics, and security, where the real-time analysis of event streams is crucial. However, as the volume and complexity of event data continue to grow, optimizing the performance of CEP systems becomes a critical challenge. This paper investigates the impact of indexing strategies on the performance of databases handling complex event processing. We propose a novel indexing technique, called Hierarchical Temporal Indexing (HTI), specifically designed for the efficient processing of complex event queries. HTI leverages the temporal nature of event data and employs a multi-level indexing approach to optimize query execution. By combining temporal indexing with spatial- and attribute-based indexing, HTI aims to accelerate the retrieval and processing of relevant events, thereby improving overall query performance. In this study, we evaluate the effectiveness of HTI by implementing complex event queries on various CEP systems with different indexing strategies. We conduct a comprehensive performance analysis, measuring the query execution times and resource utilization (CPU, memory, etc.), and analyzing the execution plans and query optimization techniques employed by each system. Our experimental results demonstrate that the proposed HTI indexing strategy outperforms traditional indexing approaches, particularly for complex event queries involving temporal constraints and multi-dimensional event attributes. We provide insights into the strengths and weaknesses of each indexing strategy, identifying the factors that influence performance, such as data volume, query complexity, and event characteristics. Furthermore, we discuss the implications of our findings for the design and optimization of CEP systems, offering recommendations for indexing strategy selection based on the specific requirements and workload characteristics. 
Finally, we outline the potential limitations of our study and suggest future research directions in this domain.
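
The abstract does not give HTI's internal layout, so the following is only a hypothetical sketch of the general idea of multi-level temporal indexing combined with an attribute index. The class name `TwoLevelTemporalIndex`, the hour/minute bucket sizes, and the use of an event type as the indexed attribute are all assumptions, not the authors' design.

```python
from collections import defaultdict


class TwoLevelTemporalIndex:
    """Toy index: coarse time bucket -> fine time bucket -> event type.

    Timestamps are integer seconds; `coarse` must be a multiple of `fine`
    so that every fine bucket lies inside exactly one coarse bucket.
    """

    def __init__(self, coarse=3600, fine=60):
        if coarse % fine != 0:
            raise ValueError("coarse must be a multiple of fine")
        self.coarse = coarse
        self.fine = fine
        self.buckets = defaultdict(
            lambda: defaultdict(lambda: defaultdict(list))
        )

    def insert(self, timestamp, event_type, payload):
        c = timestamp // self.coarse
        f = timestamp // self.fine
        self.buckets[c][f][event_type].append((timestamp, payload))

    def query(self, start, end, event_type):
        """Events of `event_type` with start <= timestamp < end, by time."""
        if end <= start:
            return []
        results = []
        first_c = start // self.coarse
        last_c = (end - 1) // self.coarse
        for c in range(first_c, last_c + 1):
            fine_level = self.buckets.get(c)
            if not fine_level:
                continue  # skip a whole empty coarse bucket at once
            lo = max(start // self.fine, (c * self.coarse) // self.fine)
            hi = min((end - 1) // self.fine,
                     ((c + 1) * self.coarse - 1) // self.fine)
            for f in range(lo, hi + 1):
                by_type = fine_level.get(f)
                if not by_type:
                    continue
                for ts, payload in by_type.get(event_type, ()):
                    if start <= ts < end:
                        results.append((ts, payload))
        results.sort(key=lambda item: item[0])
        return results


index = TwoLevelTemporalIndex()
index.insert(10, "login", "a")
index.insert(70, "trade", "b")
index.insert(3700, "login", "c")
index.insert(7300, "login", "d")
print(index.query(0, 4000, "login"))
```

The coarse level lets a query discard long empty stretches of the stream cheaply, while the per-bucket attribute map avoids scanning events of other types; a production CEP index would presumably also need eviction of old buckets and support for out-of-order arrivals.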



Citations: 0

SaBi3d—A LiDAR Point Cloud Data Set of Car-to-Bicycle Overtaking Maneuvers

Data

Pub Date : 2024-07-24 DOI: 10.3390/data9080090

Christian Odenwald, Moritz Beeking

While cycling presents environmental benefits and promotes a healthy lifestyle, the risks associated with overtaking maneuvers by motorized vehicles represent a significant barrier for many potential cyclists. A large-scale analysis of overtaking maneuvers could inform traffic researchers and city planners how to reduce these risks by better understanding these maneuvers. Drawing from the fields of sensor-based cycling research and from LiDAR-based traffic data sets, this paper provides a step towards addressing these safety concerns by introducing the Salzburg Bicycle 3d (SaBi3d) data set, which consists of LiDAR point clouds capturing car-to-bicycle overtaking maneuvers. The data set, collected using a LiDAR-equipped bicycle, facilitates the detailed analysis of a large quantity of overtaking maneuvers without the need for manual annotation through enabling automatic labeling by a neural network. Additionally, a benchmark result for 3D object detection using a competitive neural network is provided as a baseline for future research. The SaBi3d data set is structured identically to the nuScenes data set, and therefore offers compatibility with numerous existing object detection systems. This work provides valuable resources for future researchers to better understand cycling infrastructure and mitigate risks, thus promoting cycling as a viable mode of transportation.



Citations: 0

BELMASK—An Audiovisual Dataset of Adversely Produced Speech for Auditory Cognition Research

Data

Pub Date : 2024-07-24 DOI: 10.3390/data9080092

C. Moshona, F. Rudawski, André Fiebig, E. Sarradj

In this article, we introduce the Berlin Dataset of Lombard and Masked Speech (BELMASK), a phonetically controlled audiovisual dataset of speech produced in adverse speaking conditions, and describe the development of the related speech task. The dataset contains a total of 128 min of audio and video recordings of 10 native German speakers (4 female, 6 male) with a mean age of 30.2 years (SD: 6.3 years), uttering matrix sentences in cued, uninstructed speech in four conditions: (i) with a Filtering Facepiece P2 (FFP2) mask in silence, (ii) without an FFP2 mask in silence, (iii) with an FFP2 mask while exposed to noise, and (iv) without an FFP2 mask while exposed to noise. Noise consisted of mixed-gender six-talker babble played over headphones to the speakers, triggering the Lombard effect. All conditions are readily available in face-and-voice and voice-only formats. The speech material is annotated, employing a multi-layer architecture, and was originally conceptualized to be used for the administration of a working memory task. The dataset is stored in a restricted-access Zenodo repository and is available for academic research in the area of speech communication, acoustics, psychology and related disciplines upon request, after signing an End User License Agreement (EULA).



Citations: 0

Data Descriptor of Snakebites in Brazil from 2007 to 2020

Data

Pub Date : 2024-07-24 DOI: 10.3390/data9080091

Alexandre Vilhena da Silva-Neto, G. S. Mouta, Antônio Alcirley Silva Balieiro, Jady Shayenne Mota Cordeiro, Patricia Carvalho Silva Balieiro, T. A. Ramos, D. Baía-da-Silva, Élisson Silva Rocha, P. Endo, Theo Lynn, W. Monteiro, V. Sampaio

Snakebite envenomations (SBE) are a significant global public health threat due to their morbidity and mortality, yet they remain a neglected public health issue in many tropical and subtropical countries. Brazil is among the ten countries most affected by SBE, with 32,160 cases reported in 2020 alone, placing a high burden on its population. In this paper, we describe the data structure of snakebite records from 2007 to 2020 in the Notifiable Disease Information System (SINAN), made available by the Brazilian Ministry of Health (MoH). We also provide R scripts that allow quick and automatic updating of data from SINAN as new data become available. The data presented in this work cover clinical and demographic information on SBE cases, as well as outcomes, laboratory results, and treatment. The dataset is freely accessible; however, preprocessing, adjustments, and standardization are necessary due to incompleteness and inconsistencies. Despite these limitations, it provides a solid basis for assessing different aspects and the national burden of envenoming.



Citations: 0

Literature-Based Inventory of Chemical Substance Concentrations Measured in Organic Food Consumed in Europe

Data

Pub Date : 2024-07-03 DOI: 10.3390/data9070089

Joanna Choueiri, Pascal Petit, Franck Balducci, Dominique J. Bicout, Christine Demeilliers

Populations are exposed daily to numerous environmental pollutants, particularly through food. To address environmental issues, many agricultural production methods have been developed, including organic farming. To date, there is no exhaustive inventory of the contamination of organic foods as there is for conventional foods. The main objective of this work was to construct a growing and updatable database on chemical substances and their levels in organic foods consumed in Europe. To this end, a literature search was conducted, resulting in a total of 1207 concentration values from 823 food–substance pairs involving 166 food matrices and 209 chemical substances, among which 95% were not authorized in organic farming and 80% were pesticides. The most frequently encountered substance groups are “inorganic contaminants” and “organophosphate”, and the most studied food groups are “fruit used as fruit” and “Cereals and cereal primary derivatives”. Further studies are needed to continue updating the database with robust and comprehensive data on organic food contamination. This database could be used to study the health risks associated with these contaminants.



Citations: 0

Subjective Well-Being and Mental Health among College Students: Two Datasets for Diagnosis and Program Evaluation

Data

Pub Date : 2024-03-06 DOI: 10.3390/data9030044

Lina Martínez, Esteban Robles, Valeria Trofimoff, Nicolás Vidal, Andres David Espada, Nayith Mosquera, Bryan Franco, Víctor Sarmiento, María Isabel Zafra

This paper presents two datasets about college students’ subjective well-being and mental health in a developing country. The first dataset offers a diagnosis of the prevalence of self-reported symptoms associated with stress, anxiety, and depression, along with an overall evaluation of subjective well-being. The study uses validated scales to measure self-reported symptoms related to mental health conditions: the Perceived Stress Scale (PSS-10) to measure stress, the 7-item Generalized Anxiety Disorder Scale (GAD-7) to measure symptoms associated with anxiety, and the 9-item Patient Health Questionnaire (PHQ-9) to measure symptoms associated with depression. This diagnosis was collected from a sample of 3052 undergraduate students in 2022 at a medium-sized university in Colombia. The second dataset reports the evaluation of a positive education intervention implemented in the same university. The Colombian Minister of Science and Technology financed the intervention to promote strategies to mitigate the consequences on college students’ well-being and mental health after the pandemic. The program evaluation data cover two years (2020–2022) with 193 college students in the treatment group (students enrolled in a class teaching evidence-based interventions to promote well-being and mental health awareness) and 135 students in the control group. Data for evaluation include a broad array of variables on life satisfaction, happiness, negative emotions, COVID-19 effects, relationship valuations, and habits, as well as the measurement of three scales: the Satisfaction with Life Scale (SWLS), a brief measurement of depressive symptomatology (CESD-7), and the Brief Strengths Scale (BSS).

Citations: 0

A Set of Ground Penetrating Radar Measures from Quarries

Data

Pub Date : 2024-03-03 DOI: 10.3390/data9030042

Stefano Bonduà, André Monteiro Klen, Massimiliano Pilone, L. Asimopolos, N. Asimopolos

This paper presents a set of Ground Penetrating Radar (GPR) data obtained from in situ measurements conducted in four ornamental stone quarries located in Italy (Botticino quarry) and Romania (Ruschita, Carpinis, and Pietroasa quarries). GPR is a Non-Destructive Testing (NDT) technique that, among other capabilities, enables the detection and localization of fractures without damaging the surface. In this study, two ground-coupled GPR instruments were used to detect and locate fractures, discontinuities, and weakened zones. The GPR data contain radargrams for discontinuity and fracture detection, along with the geographic location of the measurements. For each measurement site, a set of radargrams was acquired in two orthogonal directions, allowing for a 3D reconstruction of the investigated site.

Citations: 0

Multimodal Hinglish Tweet Dataset for Deep Pragmatic Analysis

Data

Pub Date : 2024-02-15 DOI: 10.3390/data9020038

Pratibha, Amandeep Kaur, Meenu Khurana, R. Damaševičius

Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/‘X’, with its vast user base and real-time nature, provides a valuable source for assessing people’s raw emotions and opinions regarding war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets specifically related to wars, conflicts, and the associated taxonomy. The dataset addresses a gap in the contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments expressed by individuals regarding wars, conflicts, and peace efforts. The dataset holds significant value for deep pragmatic analysis, as it enables future researchers to identify the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace effects, and delve into the associated psychology in this context. To ensure the dataset’s quality and relevance, a meticulous selection process was employed, resulting in the inclusion of 500 carefully chosen, explainable search filters. The dataset currently contains 10,040 tweets that have been validated with the help of human experts to ensure they are correct and accurate.

Citations: 0