Bernd Accou, Lies Bollens, Marlies Gillis, Wendy Verheijen, Hugo Van hamme, T. Francart
Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of neural responses to fast, dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG activity from speech features. Machine learning techniques are generally employed to construct these encoding and decoding models, which require large amounts of data. We present SparrKULee, a Speech-evoked Auditory Response Repository of EEG data, measured at KU Leuven, comprising 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90–150 min of natural speech. This dataset is more extensive than any currently available dataset in terms of both the number of participants and the amount of data per participant, which makes it suitable for training larger machine learning models. We evaluate the dataset using linear and state-of-the-art non-linear models in speech encoding/decoding and match/mismatch paradigms, providing benchmark scores for future research.
Title: SparrKULee: A Speech-Evoked Auditory Response Repository from KU Leuven, Containing the EEG of 85 Participants | Journal: Data | Published: 2024-07-26 | DOI: 10.3390/data9080094
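The abstract refers to linear encoding/decoding models and a match/mismatch paradigm without giving details. As a minimal sketch under stated assumptions (not the authors' benchmark code), a linear backward model can be fitted with ridge regression on time-lagged EEG to reconstruct a speech feature such as the envelope, scored by Pearson correlation, and reused for match/mismatch classification by checking which of two candidate speech segments the reconstruction correlates with more strongly. The lag range, the ridge strength `alpha`, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def lag_matrix(eeg: np.ndarray, max_lag: int) -> np.ndarray:
    """Stack time-lagged EEG copies: (T, C) -> (T, C * (max_lag + 1)).

    Column block k holds EEG sample t + k in row t, because the neural
    response follows the stimulus. Rows past the end are zero-padded.
    """
    T, C = eeg.shape
    X = np.zeros((T, C * (max_lag + 1)))
    for k in range(max_lag + 1):
        X[: T - k, k * C:(k + 1) * C] = eeg[k:]
    return X


def fit_ridge_decoder(eeg, feature, max_lag, alpha=1.0):
    """Closed-form ridge regression from lagged EEG to a speech feature."""
    X = lag_matrix(eeg, max_lag)
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ feature)


def reconstruct(eeg, weights, max_lag):
    """Reconstructed speech feature for (new) EEG."""
    return lag_matrix(eeg, max_lag) @ weights


def pearson(a, b) -> float:
    return float(np.corrcoef(a, b)[0, 1])


def is_match(eeg, weights, max_lag, candidate_a, candidate_b) -> bool:
    """Match/mismatch decision: True if candidate_a correlates better with
    the reconstruction from this EEG segment than candidate_b does."""
    rec = reconstruct(eeg, weights, max_lag)
    return pearson(rec, candidate_a) > pearson(rec, candidate_b)


if __name__ == "__main__":
    # Synthetic data: a smooth feature drives 8 channels with a 5-sample delay.
    rng = np.random.default_rng(0)
    T, C, delay = 5000, 8, 5
    feature = np.convolve(rng.standard_normal(T + 49), np.ones(50) / 50, mode="valid")
    feature = (feature - feature.mean()) / feature.std()
    eeg = rng.standard_normal((T, C))
    eeg[delay:] += np.outer(feature[:-delay], rng.standard_normal(C))
    w = fit_ridge_decoder(eeg[:4000], feature[:4000], max_lag=10)
    print("test r =", pearson(reconstruct(eeg[4000:], w, 10), feature[4000:]))
    print("match:", is_match(eeg[4000:4500], w, 10, feature[4000:4500], feature[4500:]))
```

On real recordings, the lag window and `alpha` would normally be chosen by cross-validation, and the EEG and speech features would first be filtered and downsampled; those steps are omitted here.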
Joanna Kostanek, K. Karolczak, W. Kuliczkowski, Cezary Watała
In today’s research environment, characterized by exponential data growth and increasing complexity, selecting statistical tests suited to the research objectives and data distributions is paramount for rigorous analysis and accurate interpretation. This article explores the growing prominence of bootstrapping, an advanced statistical technique for multiple comparisons analysis. By estimating the sampling distribution of a statistic from the observed data instead of assuming a population distribution, bootstrapping offers flexibility and customization and serves as a valuable alternative to traditional methods in various data scenarios. Computer simulations were conducted using data from cardiovascular disease patients. Two approaches, spontaneous partly controlled simulation and fully constrained simulation using self-written R scripts, were used to generate datasets with specified distributions and to analyze them with tests for comparing more than two groups. The bootstrap method substantially improves statistical analysis, especially in overcoming the constraints of conventional parametric tests. Our research demonstrated its effectiveness across multiple comparison scenarios, yielding robust findings across diverse distributions, even with minor inflation of p values. As a valuable substitute for parametric approaches, the bootstrap encourages careful consideration before rejecting hypotheses, thus fostering a deeper understanding of statistical nuances and bolstering analytical rigor.
Title: Bootstrap Method as a Tool for Analyzing Data with Atypical Distributions Deviating from Parametric Assumptions: Critique and Effectiveness Evaluation | Journal: Data | Published: 2024-07-26 | DOI: 10.3390/data9080095
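The abstract does not specify which bootstrap procedure the authors' R scripts implement. As one common variant, shown here in Python as an assumption and not as the authors' method, a bootstrap one-way ANOVA imposes the null hypothesis of equal means by centering each group on the grand mean, resamples within groups with replacement, and compares the resampled F statistics with the observed one. `n_boot` and the +1 correction in the p value are illustrative choices.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def f_statistic(groups: Sequence[np.ndarray]) -> float:
    """Classical one-way ANOVA F statistic."""
    pooled = np.concatenate(groups)
    grand = pooled.mean()
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))


def bootstrap_anova(groups, n_boot: int = 2000, seed: Optional[int] = None) -> Tuple[float, float]:
    """Bootstrap p value for H0: all group means are equal."""
    rng = np.random.default_rng(seed)
    groups = [np.asarray(g, dtype=float) for g in groups]
    f_obs = f_statistic(groups)
    grand = np.concatenate(groups).mean()
    # Impose H0 by shifting each group to the grand mean; each group keeps
    # its own shape and spread, so normality is not assumed.
    null_groups = [g - g.mean() + grand for g in groups]
    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choice(g, size=g.size, replace=True) for g in null_groups]
        if f_statistic(resampled) >= f_obs:
            exceed += 1
    # The +1 terms count the observed sample itself and keep p above zero.
    return f_obs, (exceed + 1) / (n_boot + 1)
```

The resampling still assumes independent observations within and between groups. With markedly unequal group variances, a Welch-type statistic could be substituted for F inside the same loop.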
Maryam Abbasi, Marco V. Bernardo, Paulo Váz, J. Silva, Pedro Martins
Complex event processing (CEP) systems have gained significant importance in various domains, such as finance, logistics, and security, where the real-time analysis of event streams is crucial. However, as the volume and complexity of event data continue to grow, optimizing the performance of CEP systems becomes a critical challenge. This paper investigates the impact of indexing strategies on the performance of databases handling complex event processing. We propose a novel indexing technique, called Hierarchical Temporal Indexing (HTI), specifically designed for the efficient processing of complex event queries. HTI leverages the temporal nature of event data and employs a multi-level indexing approach to optimize query execution. By combining temporal indexing with spatial- and attribute-based indexing, HTI aims to accelerate the retrieval and processing of relevant events, thereby improving overall query performance. In this study, we evaluate the effectiveness of HTI by implementing complex event queries on various CEP systems with different indexing strategies. We conduct a comprehensive performance analysis, measuring the query execution times and resource utilization (CPU, memory, etc.), and analyzing the execution plans and query optimization techniques employed by each system. Our experimental results demonstrate that the proposed HTI indexing strategy outperforms traditional indexing approaches, particularly for complex event queries involving temporal constraints and multi-dimensional event attributes. We provide insights into the strengths and weaknesses of each indexing strategy, identifying the factors that influence performance, such as data volume, query complexity, and event characteristics. Furthermore, we discuss the implications of our findings for the design and optimization of CEP systems, offering recommendations for indexing strategy selection based on the specific requirements and workload characteristics. Finally, we outline the potential limitations of our study and suggest future research directions in this domain.
Title: Optimizing Database Performance in Complex Event Processing through Indexing Strategies | Journal: Data | Published: 2024-07-24 | DOI: 10.3390/data9080093
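The abstract describes HTI only at a high level: a multi-level temporal index combined with attribute-based indexing. The sketch below is a hypothetical two-level illustration of that general idea, not the authors' HTI: events are bucketed by hour, then by minute, then by event type, so a time-window query visits only the buckets that overlap the window. The bucket sizes, the `Event` fields, and the class name are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int   # seconds; integer resolution is an assumption
    event_type: str  # attribute used for the innermost index level
    value: float


class HierarchicalTemporalIndex:
    """Hypothetical two-level time index (coarse -> fine bucket) with an
    event-type index inside each fine bucket."""

    def __init__(self, coarse_s: int = 3600, fine_s: int = 60):
        if coarse_s % fine_s:
            raise ValueError("coarse bucket must be a multiple of the fine bucket")
        self.coarse_s, self.fine_s = coarse_s, fine_s
        # coarse id -> fine id -> event type -> list of events
        self._tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def insert(self, ev: Event) -> None:
        coarse = ev.timestamp // self.coarse_s
        fine = ev.timestamp // self.fine_s
        self._tree[coarse][fine][ev.event_type].append(ev)

    def query(self, t_start: int, t_end: int, event_type: str) -> list:
        """Events of `event_type` with t_start <= timestamp < t_end, in time order."""
        hits = []
        first, last = t_start // self.coarse_s, (t_end - 1) // self.coarse_s
        for coarse in range(first, last + 1):
            fine_level = self._tree.get(coarse)  # .get() does not create buckets
            if fine_level is None:
                continue
            for fine, by_type in fine_level.items():
                # Skip fine buckets lying entirely outside the window.
                if (fine + 1) * self.fine_s <= t_start or fine * self.fine_s >= t_end:
                    continue
                hits.extend(e for e in by_type.get(event_type, ())
                            if t_start <= e.timestamp < t_end)
        return sorted(hits, key=lambda e: e.timestamp)
```

A production index would keep the fine buckets ordered (for example in a B-tree) so that a query does not scan every fine bucket inside a coarse bucket; the dictionary scan keeps the sketch short.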
While cycling offers environmental benefits and promotes a healthy lifestyle, the risks associated with overtaking maneuvers by motorized vehicles are a significant barrier for many potential cyclists. A large-scale analysis of overtaking maneuvers could help traffic researchers and city planners understand these maneuvers better and thus reduce the associated risks. Drawing on sensor-based cycling research and LiDAR-based traffic data sets, this paper takes a step towards addressing these safety concerns by introducing the Salzburg Bicycle 3d (SaBi3d) data set, which consists of LiDAR point clouds capturing car-to-bicycle overtaking maneuvers. Collected with a LiDAR-equipped bicycle, the data set enables automatic labeling by a neural network, so that a large number of overtaking maneuvers can be analyzed in detail without manual annotation. Additionally, a benchmark result for 3D object detection using a competitive neural network is provided as a baseline for future research. The SaBi3d data set is structured identically to the nuScenes data set and is therefore compatible with numerous existing object detection systems. This work provides valuable resources for future researchers to better understand cycling infrastructure and mitigate risks, thus promoting cycling as a viable mode of transportation.
Title: SaBi3d—A LiDAR Point Cloud Data Set of Car-to-Bicycle Overtaking Maneuvers | Authors: Christian Odenwald, Moritz Beeking | Journal: Data | Published: 2024-07-24 | DOI: 10.3390/data9080090
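The abstract mentions automatic labeling of overtaking maneuvers by a 3D object detector but does not say how a passing distance is derived from the detections. A minimal sketch, assuming detected car boxes expressed in a bicycle-centered frame with x pointing forward and y pointing left (an assumed convention, not necessarily the SaBi3d or nuScenes sensor frame) and boxes given as center, size, and yaw: the lateral gap is the distance from the bicycle's centerline to the nearest box corner, taken over the frames in which the car is alongside the bicycle. `Box3D` and the helper names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Box3D:
    """Detected car box in an assumed bicycle frame: x forward, y left (m)."""
    cx: float
    cy: float
    cz: float
    length: float
    width: float
    height: float
    yaw: float  # rotation about z, radians


def footprint(box: Box3D) -> List[Tuple[float, float]]:
    """The four (x, y) corners of the box's ground footprint."""
    c, s = math.cos(box.yaw), math.sin(box.yaw)
    corners = []
    for dx in (box.length / 2, -box.length / 2):
        for dy in (box.width / 2, -box.width / 2):
            corners.append((box.cx + dx * c - dy * s, box.cy + dx * s + dy * c))
    return corners


def lateral_gap(box: Box3D) -> float:
    """Distance from the bicycle's centerline (y = 0) to the nearest corner;
    0.0 if the footprint straddles the centerline."""
    ys = [y for _, y in footprint(box)]
    if min(ys) <= 0.0 <= max(ys):
        return 0.0
    return min(abs(y) for y in ys)


def passing_distance(track: Iterable[Box3D]) -> Optional[float]:
    """Smallest lateral gap over frames in which the car is alongside the
    bicycle (its footprint spans x = 0); None if it never is."""
    gaps = []
    for box in track:
        xs = [x for x, _ in footprint(box)]
        if min(xs) <= 0.0 <= max(xs):
            gaps.append(lateral_gap(box))
    return min(gaps) if gaps else None
```

This treats the bicycle as a line at y = 0; a clearance to the cyclist's body or handlebar would subtract that half-width, which the abstract does not specify.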
In this article, we introduce the Berlin Dataset of Lombard and Masked Speech (BELMASK), a phonetically controlled audiovisual dataset of speech produced in adverse speaking conditions, and describe the development of the related speech task. The dataset contains a total of 128 min of audio and video recordings of 10 native German speakers (4 female, 6 male; mean age 30.2 years, SD 6.3 years) uttering matrix sentences in cued, uninstructed speech in four conditions: (i) with a Filtering Facepiece P2 (FFP2) mask in silence, (ii) without an FFP2 mask in silence, (iii) with an FFP2 mask while exposed to noise, and (iv) without an FFP2 mask while exposed to noise. The noise, mixed-gender six-talker babble played to the speakers over headphones, triggered the Lombard effect. All conditions are available in both face-and-voice and voice-only formats. The speech material is annotated using a multi-layer architecture and was originally conceived for administering a working memory task. The dataset is stored in a restricted-access Zenodo repository and is available on request for academic research in speech communication, acoustics, psychology, and related disciplines, after signing an End User License Agreement (EULA).
Title: BELMASK—An Audiovisual Dataset of Adversely Produced Speech for Auditory Cognition Research | Authors: C. Moshona, F. Rudawski, André Fiebig, E. Sarradj | Journal: Data | Published: 2024-07-24 | DOI: 10.3390/data9080092
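The Lombard effect is commonly reflected in an increase in vocal level. As a minimal, hypothetical sketch (the dataset's file layout and the authors' analyses are not described in the abstract), the four recording conditions can be enumerated as a 2 × 2 design, and the level shift between a take recorded in babble noise and one recorded in silence can be estimated as a difference in RMS level in dB relative to full scale. The function and constant names are assumptions.

```python
import itertools

import numpy as np

# The four conditions in the abstract's order (i)-(iv): mask status varies
# fastest, acoustic condition slowest.
CONDITIONS = list(itertools.product(("silence", "babble noise"), ("FFP2 mask", "no mask")))


def rms_dbfs(signal) -> float:
    """RMS level in dB relative to a full-scale amplitude of 1.0."""
    x = np.asarray(signal, dtype=float)
    return float(20.0 * np.log10(np.sqrt(np.mean(x ** 2))))


def level_shift_db(noise_take, silence_take) -> float:
    """Level of a take recorded in noise minus that of a take recorded in
    silence; positive values are consistent with Lombard-style raising."""
    return rms_dbfs(noise_take) - rms_dbfs(silence_take)
```

This compares whole takes; a more careful comparison would restrict the RMS to voiced speech segments and hold microphone gain constant, neither of which is specified here.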
Alexandre Vilhena da Silva-Neto, G. S. Mouta, Antônio Alcirley Silva Balieiro, Jady Shayenne Mota Cordeiro, Patricia Carvalho Silva Balieiro, T. A. Ramos, D. Baía-da-Silva, Élisson Silva Rocha, P. Endo, Theo Lynn, W. Monteiro, V. Sampaio
Snakebite envenomations (SBE) are a significant global public health threat because of their morbidity and mortality, yet they remain a neglected public health issue in many tropical and subtropical countries. Brazil is among the ten countries most affected by SBE, with 32,160 cases reported in 2020 alone, placing a high burden on its population. In this paper, we describe the data structure of snakebite records from 2007 to 2020 in the Notifiable Disease Information System (SINAN), made available by the Brazilian Ministry of Health (MoH). We also provide R scripts that allow quick, automatic updating of the data as new SINAN data become available. The data comprise clinical and demographic information on SBE cases, as well as outcomes, laboratory results, and treatment. The dataset is freely accessible; however, preprocessing, adjustment, and standardization are necessary because of incompleteness and inconsistencies. Despite these limitations, it provides a solid basis for assessing various aspects of envenoming and its national burden.
Title: Data Descriptor of Snakebites in Brazil from 2007 to 2020 | Journal: Data | Published: 2024-07-24 | DOI: 10.3390/data9080091
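The abstract notes that the SINAN records need preprocessing because of incompleteness and inconsistencies, and the authors provide R scripts for this. As an illustration written in Python (a sketch, not a port of those scripts), field completeness can be measured as the share of records with a non-missing value, and annual case counts can skip records without a year. The field names (`year`, `sex`, `outcome`) and the set of values treated as missing are hypothetical; SINAN's actual variable names and missing-value codes should be taken from its documentation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping

MISSING = (None, "")  # hypothetical; extend with any "unknown" codes used in the source


def completeness(records: List[Mapping], fields: Iterable[str], missing=MISSING) -> Dict[str, float]:
    """Share of records with a non-missing value, per field."""
    n = len(records)
    return {
        f: (sum(1 for r in records if r.get(f) not in missing) / n) if n else 0.0
        for f in fields
    }


def cases_per_year(records: Iterable[Mapping], year_field: str = "year", missing=MISSING) -> Dict[int, int]:
    """Number of notified cases per year, skipping records without a year."""
    counts = Counter(r[year_field] for r in records if r.get(year_field) not in missing)
    return dict(sorted(counts.items()))
```

Running completeness first makes it explicit which fields are reliable enough for burden estimates before any standardization is applied.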
Populations are exposed daily to numerous environmental pollutants, particularly through food. To address environmental issues, many agricultural production methods have been developed, including organic farming. To date, unlike for conventional foods, there is no exhaustive inventory of the contamination of organic foods. The main objective of this work was to build a growing, updatable database of chemical substances and their levels in organic foods consumed in Europe. To this end, a literature search was conducted, yielding a total of 1207 concentration values from 823 food–substance pairs involving 166 food matrices and 209 chemical substances, of which 95% are not authorized in organic farming and 80% are pesticides. The most frequently encountered substance groups are “inorganic contaminants” and “organophosphate”, and the most studied food groups are “fruit used as fruit” and “Cereals and cereal primary derivatives”. Further studies are needed to keep updating the database with robust and comprehensive data on organic food contamination. The database could be used to study the health risks associated with these contaminants.
Title: Literature-Based Inventory of Chemical Substance Concentrations Measured in Organic Food Consumed in Europe | Authors: Joanna Choueiri, Pascal Petit, Franck Balducci, Dominique J. Bicout, Christine Demeilliers | Journal: Data | Published: 2024-07-03 | DOI: 10.3390/data9070089
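The counts in the abstract distinguish concentration values (1207) from food–substance pairs (823), food matrices (166), and substances (209). A hypothetical record layout and summary function illustrate how these counts relate: several concentration values can share one food–substance pair, and authorization in organic farming is a property of the substance. The field names and the toy records in the test are assumptions, not the database's actual schema or data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Measurement:
    food: str                    # food matrix
    substance: str
    substance_group: str
    concentration: float         # unit assumed harmonized upstream, e.g. mg/kg
    authorized_in_organic: bool  # property of the substance


def summarize(measurements: Iterable[Measurement]) -> Dict[str, float]:
    """Counts at each level of the inventory, plus the share of distinct
    substances that are not authorized in organic farming."""
    ms = list(measurements)
    substances = {m.substance for m in ms}
    not_authorized = {m.substance for m in ms if not m.authorized_in_organic}
    return {
        "concentration_values": len(ms),
        "food_substance_pairs": len({(m.food, m.substance) for m in ms}),
        "food_matrices": len({m.food for m in ms}),
        "substances": len(substances),
        "share_substances_not_authorized": len(not_authorized) / len(substances) if substances else 0.0,
    }
```

Keeping every concentration value as its own record, rather than one row per pair, lets the database grow as new studies report additional measurements for an existing pair.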
Lina Martínez, Esteban Robles, Valeria Trofimoff, Nicolás Vidal, Andres David Espada, Nayith Mosquera, Bryan Franco, Víctor Sarmiento, María Isabel Zafra
This paper presents two datasets on college students’ subjective well-being and mental health in a developing country. The first dataset provides a diagnosis of the prevalence of self-reported symptoms associated with stress, anxiety, and depression, together with an overall evaluation of subjective well-being. The study uses validated scales to measure these self-reported symptoms: the Perceived Stress Scale (PSS-10) for stress, the 7-item Generalized Anxiety Disorder Scale (GAD-7) for anxiety, and the 9-item Patient Health Questionnaire (PHQ-9) for depression. These diagnostic data were collected in 2022 from a sample of 3052 undergraduate students at a medium-sized university in Colombia. The second dataset reports the evaluation of a positive education intervention implemented at the same university. The Colombian Ministry of Science and Technology financed the intervention to promote strategies mitigating the consequences of the pandemic for college students’ well-being and mental health. The program evaluation data cover two years (2020–2022), with 193 college students in the treatment group (students enrolled in a class teaching evidence-based interventions to promote well-being and mental health awareness) and 135 students in the control group. The evaluation data include a broad array of variables on life satisfaction, happiness, negative emotions, COVID-19 effects, relationship valuations, and habits, as well as three scales: the Satisfaction with Life Scale (SWLS), a brief measure of depressive symptomatology (CESD-7), and the Brief Strengths Scale (BSS).
Title: Subjective Well-Being and Mental Health among College Students: Two Datasets for Diagnosis and Program Evaluation | Journal: Data | Published: 2024-03-06 | DOI: 10.3390/data9030044
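The abstract names the scales but not how they were scored. The sketch below applies the standard published scoring conventions for PHQ-9 (nine items scored 0–3, total 0–27), GAD-7 (seven items scored 0–3, total 0–21), and PSS-10 (ten items scored 0–4, with items 4, 5, 7, and 8 reverse-scored, total 0–40), plus the commonly used PHQ-9 and GAD-7 severity bands. Whether the authors used these exact bands is an assumption.

```python
from typing import List, Sequence, Tuple


def _check(items: Sequence[int], n_items: int, max_score: int) -> None:
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= x <= max_score for x in items):
        raise ValueError(f"item scores must lie in 0..{max_score}")


def _band(total: int, bands: List[Tuple[int, str]]) -> str:
    # bands: (inclusive upper bound, label) pairs in ascending order
    for upper, label in bands:
        if total <= upper:
            return label
    raise ValueError("total out of range")


def score_phq9(items: Sequence[int]) -> Tuple[int, str]:
    _check(items, 9, 3)
    total = sum(items)
    return total, _band(total, [(4, "minimal"), (9, "mild"), (14, "moderate"),
                                (19, "moderately severe"), (27, "severe")])


def score_gad7(items: Sequence[int]) -> Tuple[int, str]:
    _check(items, 7, 3)
    total = sum(items)
    return total, _band(total, [(4, "minimal"), (9, "mild"), (14, "moderate"), (21, "severe")])


PSS10_REVERSED = {4, 5, 7, 8}  # 1-based numbers of the positively worded items


def score_pss10(items: Sequence[int]) -> int:
    _check(items, 10, 4)
    return sum(4 - x if i in PSS10_REVERSED else x for i, x in enumerate(items, start=1))
```

PSS-10 has no universally agreed clinical cutoffs, so the sketch returns only its total score.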
Stefano Bonduà, André Monteiro Klen, Massimiliano Pilone, L. Asimopolos, N. Asimopolos
This paper presents a set of Ground Penetrating Radar (GPR) data obtained from in situ measurements in four ornamental stone quarries in Italy (Botticino quarry) and Romania (Ruschita, Carpinis, and Pietroasa quarries). GPR is a non-destructive testing (NDT) technique that, among other capabilities, can detect and locate fractures without damaging the surface. In this study, two ground-coupled GPR instruments were used to detect and locate fractures, discontinuities, and weakened zones. The dataset contains radargrams for detecting discontinuities and fractures, together with the geographic locations of the measurements. For each measurement site, a set of radargrams was acquired in two orthogonal directions, allowing a 3D reconstruction of the investigated site.
{"title":"A Set of Ground Penetrating Radar Measures from Quarries","authors":"Stefano Bonduà, André Monteiro Klen, Massimiliano Pilone, L. Asimopolos, N. Asimopolos","doi":"10.3390/data9030042","DOIUrl":"https://doi.org/10.3390/data9030042","url":null,"abstract":"This paper presents a set of Ground Penetrating Radar (GPR) data obtained from in situ measurements conducted in four ornamental stone quarries located in Italy (Botticino quarry) and Romania (Ruschita, Carpinis, and Pietroasa quarries). The GPR is a Non-Destructive Testing (NDT) technique that enables the detection and localization of fractures without damage to the surface, among other capabilities. In this study, two instruments of ground-coupled GPR were used to detect and locate the fractures, discontinuities, or weakened zones. The GPR data contains radargrams for discontinuities and fracture detection, besides the geographic location of the measures. For each measurement site, a set of radargrams has been acquired in two orthogonal directions, allowing for a 3D reconstruction of the investigated site.","PeriodicalId":502371,"journal":{"name":"Data","volume":"15 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140081172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
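For the GPR dataset above, the statement that acquiring radargrams in two orthogonal directions allows a 3D reconstruction can be illustrated with a simplified Python sketch. It assumes, hypothetically, that every profile has already been resampled onto a common regular grid with a shared time axis: a profile along x at row `iy` fills one row of the volume, a profile along y at column `ix` fills one column, cells crossed by both directions are averaged, and cells with no profile stay NaN. A real processing chain would also apply filtering, gain, interpolation between profiles and time-to-depth conversion; none of that is shown, and the names `assemble_volume` and `time_slice` are illustrative only.

```python
import math
from typing import Dict, List

Radargram = List[List[float]]      # one list of amplitude samples per trace
Volume = List[List[List[float]]]   # indexed as volume[ix][iy][k]


def assemble_volume(x_lines: Dict[int, Radargram], y_lines: Dict[int, Radargram],
                    nx: int, ny: int, n_samples: int) -> Volume:
    """Place orthogonal 2D radargrams on a regular (nx, ny, n_samples) grid.

    x_lines maps a row index iy to a profile along x (nx traces);
    y_lines maps a column index ix to a profile along y (ny traces).
    Cells covered by both directions are averaged; uncovered cells are NaN.
    """
    total = [[[0.0] * n_samples for _ in range(ny)] for _ in range(nx)]
    count = [[0] * ny for _ in range(nx)]

    # Profiles acquired along x at a fixed grid row iy.
    for iy, gram in x_lines.items():
        for ix in range(nx):
            count[ix][iy] += 1
            for k in range(n_samples):
                total[ix][iy][k] += gram[ix][k]

    # Profiles acquired along y at a fixed grid column ix.
    for ix, gram in y_lines.items():
        for iy in range(ny):
            count[ix][iy] += 1
            for k in range(n_samples):
                total[ix][iy][k] += gram[iy][k]

    return [[[total[ix][iy][k] / count[ix][iy] if count[ix][iy] else math.nan
              for k in range(n_samples)]
             for iy in range(ny)]
            for ix in range(nx)]


def time_slice(volume: Volume, k: int) -> List[List[float]]:
    """Horizontal (plan-view) slice of the volume at time sample k."""
    return [[trace[k] for trace in row] for row in volume]


# Toy example: two profiles along x (rows 0 and 2) and one along y (column 1).
nx, ny, ns = 4, 3, 5
x_lines = {0: [[1.0] * ns for _ in range(nx)], 2: [[1.0] * ns for _ in range(nx)]}
y_lines = {1: [[3.0] * ns for _ in range(ny)]}
volume = assemble_volume(x_lines, y_lines, nx, ny, ns)
print(time_slice(volume, 0))
```

A time slice such as `time_slice(volume, k)` gives a plan-view map in which a reflector crossing several profiles can be followed laterally; with a single acquisition direction, only the cells along those parallel lines would be filled.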
Pratibha, Amandeep Kaur, Meenu Khurana, R. Damaševičius
Wars, conflicts, and peace efforts have become inherent characteristics of many regions, and understanding the prevailing sentiments around these issues is crucial for finding long-lasting solutions. Twitter/X, with its vast user base and real-time nature, is a valuable source for assessing people’s raw emotions and opinions about war, conflict, and peace. This paper focuses on collecting and curating Hinglish tweets related to wars, conflicts, and their associated taxonomy. The dataset addresses a gap in the contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments individuals express about wars, conflicts, and peace efforts. It is valuable for deep pragmatic analysis because it enables future researchers to trace the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace efforts, and examine the associated psychology. To ensure the dataset’s quality and relevance, a meticulous selection process was used, resulting in 500 carefully chosen, explainable search filters. The dataset currently contains 10,040 tweets that a human expert has validated for correctness and accuracy.
{"title":"Multimodal Hinglish Tweet Dataset for Deep Pragmatic Analysis","authors":"Pratibha, Amandeep Kaur, Meenu Khurana, R. Damaševičius","doi":"10.3390/data9020038","DOIUrl":"https://doi.org/10.3390/data9020038","url":null,"abstract":"Wars, conflicts, and peace efforts have become inherent characteristics of regions, and understanding the prevailing sentiments related to these issues is crucial for finding long-lasting solutions. Twitter/`X’, with its vast user base and real-time nature, provides a valuable source to assess the raw emotions and opinions of people regarding war, conflict, and peace. This paper focuses on collecting and curating hinglish tweets specifically related to wars, conflicts, and associated taxonomy. The creation of said dataset addresses the existing gap in contemporary literature, which lacks comprehensive datasets capturing the emotions and sentiments expressed by individuals regarding wars, conflicts, and peace efforts. This dataset holds significant value and application in deep pragmatic analysis as it enables future researchers to identify the flow of sentiments, analyze the information architecture surrounding war, conflict, and peace effects, and delve into the associated psychology in this context. To ensure the dataset’s quality and relevance, a meticulous selection process was employed, resulting in the inclusion of explanable 500 carefully chosen search filters. The dataset currently has 10,040 tweets that have been validated with the help of human expert to make sure they are correct and accurate.","PeriodicalId":502371,"journal":{"name":"Data","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139776468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
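The Hinglish tweet abstract above does not describe how its 500 search filters were applied. A minimal Python sketch, assuming each filter is a plain keyword or phrase matched case-insensitively on whole words, shows the kind of selection step involved. The example terms (such as the romanized Hindi "yudh", war, and "shanti", peace) and tweets are hypothetical and are not taken from the dataset.

```python
import re
from typing import Iterable, List


def compile_filters(terms: Iterable[str]) -> "re.Pattern[str]":
    """Combine keyword filters into one case-insensitive, whole-word regex."""
    escaped = [re.escape(t.strip()) for t in terms if t.strip()]
    if not escaped:
        # An empty alternation would match every tweet, so refuse it.
        raise ValueError("at least one non-empty search term is required")
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def select_tweets(tweets: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Keep tweets that match at least one search filter, preserving order."""
    pattern = compile_filters(terms)
    return [tweet for tweet in tweets if pattern.search(tweet)]


# Hypothetical filters and tweets, for illustration only.
filters = ["war", "yudh", "shanti", "ceasefire"]
tweets = [
    "Yudh kabhi solution nahi hai",
    "Aaj match dekha",
    "Ceasefire ki news sunke relief mila",
    "Weather warning today",
]
print(select_tweets(tweets, filters))
```

Whole-word matching keeps the filter "war" from selecting a tweet that only contains "warning"; in the described workflow, tweets selected this way would still pass through the human validation step before inclusion.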