Data

Pub Date : 2024-01-10 DOI: 10.3390/data9010012

Olivier Parisot

Recent smart telescopes allow the automatic collection of a large quantity of data for specific portions of the night sky, with the goal of capturing images of deep sky objects (nebulae, galaxies, globular clusters). Nevertheless, human verification is still required afterwards to check whether celestial targets are actually visible in the images produced by these instruments. Depending on the magnitude of the deep sky objects, the observation conditions, and the cumulative data acquisition time, it is possible that only stars are present in the images. In addition, unfavorable external conditions (light pollution, bright moon, etc.) can make capture difficult. In this paper, we describe DeepSpaceYoloDataset, a set of 4696 RGB astronomical images captured by two smart telescopes and annotated with the positions of the deep sky objects that are actually present in the images. This dataset can be used to train detection models on this type of image, enabling better control of the duration of capture sessions, and also to detect unexpected celestial events such as supernovae.

Citations: 0

ADAS Simulation Result Dataset Processing Based on Improved BP Neural Network

Data

Pub Date : 2024-01-05 DOI: 10.3390/data9010011

Songyan Zhao, Lingshan Chen, Yongchao Huang

The autonomous driving simulation field lacks systems for evaluating and forecasting simulation results. The data obtained from simulating target algorithms and vehicle models cannot be reasonably estimated, which affects subsequent vehicle improvement and parameter calibration. We relied on the simulation results of the AEB algorithm. We selected the BP neural network as the basis and improved it with a genetic algorithm (GA) optimized via a roulette algorithm. The regression evaluation indicators of the prediction results show that the GA-BP neural network has better prediction accuracy and generalization ability than the original BP neural network and other optimized BP neural networks. This GA-BP neural network also fills the gap in evaluation and prediction systems.
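
The abstract does not spell out the roulette step. A common reading is roulette-wheel (fitness-proportionate) selection inside the genetic algorithm, and the sketch below illustrates only that assumed step. The candidate names and fitness values are hypothetical and are not taken from the paper.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must sum to a positive number")
    threshold = rng.random() * total  # a point on the "wheel"
    cumulative = 0.0
    for individual, score in zip(population, fitness):
        cumulative += score
        if threshold < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


rng = random.Random(0)  # seeded so the run is repeatable
candidates = ["w_a", "w_b", "w_c"]  # hypothetical candidate weight sets
scores = [1.0, 3.0, 6.0]  # hypothetical fitness, e.g. inverse prediction error
picks = [roulette_select(candidates, scores, rng) for _ in range(10_000)]
share_c = picks.count("w_c") / len(picks)  # should be close to 6 / 10
```

Fitter candidates are drawn more often but weaker ones still have a chance, which is the usual reason for preferring this scheme over always keeping the best individual.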

Citations: 0

Experimental Dataset of Tunable Mode Converter Based on Long-Period Fiber Gratings Written in Few-Mode Fiber: Impacts of Thermal, Wavelength, and Polarization Variations

Data

Pub Date : 2023-12-31 DOI: 10.3390/data9010010

Juan Soto-Perdomo, E. Reyes-Vera, J. Montoya-Cardona, Pedro Torres

Mode division multiplexing (MDM) is currently one of the most attractive multiplexing techniques in optical communications, as it allows for an increase in the number of channels available for data transmission. Optical modal converters are one of the main devices used in this technique. Therefore, the characterization and improvement of these devices are of great current interest. In this work, we present a dataset of 49,736 near-field intensity images of a modal converter based on a long-period fiber grating (LPFG) written on a few-mode fiber (FMF). This characterization was performed experimentally at various wavelengths, polarizations, and temperature conditions when the device converted from LP01 mode to LP11 mode. The results show that the modal converter can be tuned by adjusting these parameters, and that its operation is optimal under specific circumstances which have a great impact on its performance. Additionally, the potential application of the database is validated in this work. A modal decomposition technique based on the particle swarm algorithm (PSO) was employed as a tool for determining the most effective combinations of modal weights and relative phases from the spatial distributions collected in the dataset. The proposed dataset can open up new opportunities for researchers working on image segmentation, detection, and classification problems related to MDM technology. In addition, we implement novel artificial intelligence techniques that can help in finding the optimal operating conditions for this type of device.

Citations: 0

Wi-Gitation: Replica Wi-Fi CSI Dataset for Physical Agitation Activity Recognition

Data

Pub Date : 2023-12-30 DOI: 10.3390/data9010009

Nikita Sharma, J. K. Brinke, L. M. A. B. Jansen, Paul J. M. Havinga, Duc V. Le

Agitation is a commonly found behavioral condition in persons with advanced dementia. It requires continuous monitoring to gain insights into agitation levels to assist caregivers in delivering adequate care. The available monitoring techniques use cameras and wearables which are distressful and intrusive and are thus often rejected by older adults. To enable continuous monitoring in older adult care, unobtrusive Wi-Fi channel state information (CSI) can be leveraged to monitor physical activities related to agitation. However, to the best of our knowledge, there are no realistic CSI datasets available for facilitating the classification of physical activities demonstrated during agitation scenarios such as disturbed walking, repetitive sitting–getting up, tapping on a surface, hand wringing, rubbing on a surface, flipping objects, and kicking. Therefore, in this paper, we present a public dataset named Wi-Gitation. For Wi-Gitation, the Wi-Fi CSI data were collected with twenty-three healthy participants depicting the aforementioned agitation-related physical activities at two different locations in a one-bedroom apartment with multiple receivers placed at different distances (0.5–8 m) from the participants. The validation results on the Wi-Gitation dataset indicate higher accuracies (F1-Scores ≥0.95) when employing mixed-data analysis, where the training and testing data share the same distribution. Conversely, in scenarios where the training and testing data differ in distribution (i.e., leave-one-out), the accuracies experienced a notable decline (F1-Scores ≤0.21). This dataset can be used for fundamental research on CSI signals and in the evaluation of advanced algorithms developed for tackling domain invariance in CSI-based human activity recognition.
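
The gap between the mixed-data and leave-one-out scores comes down to how samples are split. A minimal sketch of a leave-one-group-out split follows, assuming the held-out unit is a participant; the abstract does not say whether participants, locations, or receivers are held out, and the participant IDs here are hypothetical.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group per fold."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical participant ID for each recorded CSI sample.
participant_of_sample = ["p1", "p1", "p2", "p2", "p3"]
folds = list(leave_one_group_out(participant_of_sample))
```

In a mixed split, samples from the same participant land in both training and test sets, so the two share a distribution. Here the test participant is never seen during training, which is the harder setting the abstract reports.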

Citations: 0

DNA Methylome and Transcriptome Maps of Primary Colorectal Cancer and Matched Liver Metastasis

Data

Pub Date : 2023-12-29 DOI: 10.3390/data9010008

P. Ajithkumar, Gregory Gimenez, P. Stockwell, Suzan N. Almomani, Sarah A Bowden, A. Leichter, Antonio Ahn, Sharon Pattison, Sebastian Schmeier, Frank A. Frizelle, Michael R. Eccles, R. Purcell, Euan J. Rodger, Aniruddha Chatterjee

Sequencing-based genome-wide DNA methylation, gene expression studies and associated data on paired colorectal cancer (CRC) primary and liver metastasis are very limited. We have profiled the DNA methylome and transcriptome of matched primary CRC and liver metastasis samples from the same patients. Genome-scale methylation and expression levels were examined using Reduced Representation Bisulfite Sequencing (RRBS) and RNA-Seq, respectively. To investigate DNA methylation and expression patterns, we generated a total of 1.01 × 10^9 RRBS reads and 4.38 × 10^8 RNA-Seq reads from the matched cancer tissues. Here, we describe in detail the sample features, experimental design, methods and bioinformatic pipeline for these epigenetic data. We demonstrate the quality of both the samples and sequence data obtained from the paired samples. The sequencing data obtained from this study will serve as a valuable resource for studying underlying mechanisms of distant metastasis and the utility of epigenetic profiles in cancer metastasis.

Citations: 0

A Profit Maximization Model for Data Consumers with Data Providers’ Incentives in Personal Data Trading Market

Data

Pub Date : 2023-12-25 DOI: 10.3390/data9010006

Hyo-Jin Park, Hyeontaek Oh, Jun Kyun Choi

This paper proposes a profit maximization model for a data consumer when it buys personal data from data providers (by obtaining consent) through data brokers and provides their new services to data providers (i.e., service consumers). To observe the behavioral models of data providers, the data consumer, and service consumers, this paper proposes the willingness-to-sell model of personal data of data providers (which is affected by data providers’ behavior related to explicit consent), the service quality model obtained by the collected personal data from the data consumer’s perspective, and the willingness-to-pay model of service consumers regarding provided new services from the data consumer. Particularly, this paper jointly considers the behavior of data providers and service users under a limited budget. With parameters inspired by real-world surveys on data providers, this paper shows various numerical results to check the feasibility of the proposed models.

Citations: 0

Single-Nucleotide Variants in PADI2 and PADI4 and Ancestry Informative Markers in Interstitial Lung Disease and Rheumatoid Arthritis among a Mexican Mestizo Population

Data

Pub Date : 2023-12-25 DOI: 10.3390/data9010005

Karol J. Nava-Quiroz, J. Rojas-Serrano, G. Pérez-Rubio, I. Buendía-Roldán, M. Mejía, J. Fernández-López, E. Ramos-Martínez, L. A. López-Flores, Alma D. Del Ángel-Pablo, R. Falfán-Valencia

Rheumatoid arthritis (RA) is an autoimmune disease mainly characterized by joint inflammation. It presents extra-articular manifestations, with the lungs being one of the affected areas. Among these, damage to the pulmonary interstitium (Interstitial Lung Disease—ILD) has been linked to proteins involved in the inflammatory process and related to extracellular matrix deposition and lung fibrosis establishment. Peptidyl arginine deiminase enzymes (PAD), which carry out protein citrullination, play a role in this context. A genetic association analysis was conducted on genes encoding two PAD isoforms: PAD2 and PAD4. This analysis also included ancestry informative markers and protein level determination in samples from patients with RA, RA-associated ILD, and clinically healthy controls. Significant single nucleotide variants (SNV) and one haplotype were identified as susceptibility factors for RA-ILD development. Elevated levels of PAD4 were found in RA-ILD cases, while PADI2 showed an association with RA susceptibility. This work presents data obtained from previously published research. Population variability has been noticed in genetic association studies. We present data for 14 SNVs that show geographical and genetic variation across the Mexican population, which provides highly informative content and greater intrapopulation genetic diversity. Further investigations in the field should be considered in addition to AIMs. The data presented in this study were analyzed in association with SNV genotypes in PADI2 and PADI4 to assess susceptibility to ILD in RA, as well as with changes in PAD2 and PAD4 protein levels according to carrier genotype, in addition to the use of covariates such as ancestry markers.

Citations: 0

An Urban Traffic Dataset Composed of Visible Images and Their Semantic Segmentation Generated by the CARLA Simulator

Data

Pub Date : 2023-12-24 DOI: 10.3390/data9010004

Sergio Bemposta Rosende, David San José Gavilán, Javier Fernández-Andrés, Javier Sánchez-Soriano

A dataset of aerial urban traffic images and their semantic segmentation is presented to be used to train computer vision algorithms, among which those based on convolutional neural networks stand out. This article explains the process of creating the complete dataset, which includes the acquisition of the images, the labeling of vehicles, pedestrians, and pedestrian crossings as well as a description of the structure and content of the dataset (which amounts to 8694 images including visible images and those corresponding to the semantic segmentation). The images were generated using the CARLA simulator (but were like those that could be obtained with fixed aerial cameras or by using multi-copter drones) in the field of intelligent transportation management. The presented dataset is available and accessible to improve the performance of vision and road traffic management systems, especially for the detection of incorrect or dangerous maneuvers.

Citations: 0

Internationalization in the Baltic Regional Accounts: A NUTS 3 Region Dataset

Data

Pub Date : 2023-11-30 DOI: 10.3390/data8120181

Rasmus Bøgh Holmen, Nicolas Gavoille, Jaan Masso, Arūnas Burinskas

Features of internationalization, such as trade, foreign direct investments, and international migration, are crucial for understanding the economic developments of small and open economies. However, studying internationalization at the country level may obscure significant heterogeneity in its relationship with economic growth and other economic and social outcomes. Regional accounts provide insights into the geography of internationalization, but collections of such disaggregated statistics are rarely provided by statistical bureaus. The purpose of this paper is twofold. First, we demonstrate how regional account data, including internationalization indicators, can be constructed to obtain consistent and homogeneous regional-level series using a combination of micro and macro data sources. Second, our aim is to foster spatial research on internationalization and the spatial economy in the Baltics by providing comprehensive data collection of socio-economic variables at the NUTS 3 regional level over time. This collection encompasses trade, FDI, and migration, enabling the study of internationalization and other features of the Baltic economy. We present a series of key features, revealing noticeable correlation patterns between regional development and internationalization.

Citations: 0

A Tourist-Based Framework for Developing Digital Marketing for Small and Medium-Sized Enterprises in the Tourism Sector in Saudi Arabia

Data

Pub Date : 2023-11-28 DOI: 10.3390/data8120179

Rishaa Alnajim, Bahjat Fakieh

Social media has become an essential tool for travel planning, with tourists increasingly using it to research destinations, book accommodation, and make travel arrangements. However, little is known about how tourists use social media for travel planning and what factors influence their intentions to use social media for this purpose. This thesis aims to understand tourists’ intentions to use social media for travel planning. Specifically, it investigates the factors influencing tourists’ intentions to use social media for planning travel to Saudi Arabia. It develops a machine learning (ML) classification model to assist Saudi tourism SMEs in creating effective digital marketing strategies for social media platforms. A survey was conducted with 573 tourists interested in visiting Saudi Arabia, using the Design Science Research (DSR) approach. The findings support the tourist-based theoretical framework, showing that perceived usefulness (PU), perceived ease of use (PEOU), satisfaction (SAT), marketing-generated content (MGC), and user-generated content (UGC) significantly impact tourists’ intentions to use social media for travel planning. Tourists’ characteristics and visit characteristics influenced their intentions to use MGC but not UGC. The tourist-based ML classification model, developed using the LinearSVC algorithm, achieved an accuracy of 99% when evaluated using the K-Fold Cross-Validation (KF-CV) technique. The findings of this study have several implications for Saudi tourism SMEs. First, the results suggest that SMEs should focus on developing social media content that is perceived as useful, easy to use, and satisfying. Second, the findings suggest that SMEs should focus on using MGC in their social media marketing campaigns. Third, the results suggest that SMEs should tailor their social media marketing campaigns to the characteristics of their target tourists. 
This study contributes to the literature on tourism marketing and social media by providing a better understanding of how tourists use social media for travel planning. Saudi tourism SMEs can use the findings of this study to develop more effective digital marketing strategies for social media platforms.
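
The reported accuracy depends on the K-fold cross-validation procedure, which can be sketched as index bookkeeping. This illustration assumes each of the 573 survey responses is one sample and uses k = 5; the abstract does not state the number of folds, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; yield (train, test) lists."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(573, 5))  # 573 responses; k = 5 is an assumption
fold_sizes = [len(test) for _, test in folds]
```

Each sample is used for testing exactly once, and the final score is typically the mean accuracy over the k folds. In practice the samples would usually be shuffled before splitting.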

Citations: 0

Latest articles in Data