
Latest publications in Data

Fast Radius Outlier Filter Variant for Large Point Clouds
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-02 DOI: 10.3390/data8100149
Péter Szutor, Marianna Zichar
Currently, several devices (such as laser scanners, Kinect, time-of-flight cameras, and medical imaging equipment (CT, MRI, intraoral scanners)) and technologies (e.g., photogrammetry) are capable of generating 3D point clouds. Each point cloud type has its own structure and characteristics, but they share one trait: they may be contaminated with errors. Before further data processing, these unwanted portions of the data must be removed with filtering and outlier detection. Several algorithms exist for detecting outliers, but their performance degrades as the size of the point cloud increases, and industry has a high demand for efficient algorithms that can deal with large point clouds. The most commonly used algorithm is the radius outlier filter (ROL or ROR), which has several refinements (e.g., statistical outlier removal, SOR). Unfortunately, this algorithm is also limited, since it is slow on large numbers of points. This paper introduces a novel algorithm, based on the idea of the ROL filter, that finds outliers in huge point clouds without exponential time complexity. As a result of its linear complexity, the algorithm can handle extra-large point clouds, and its effectiveness is demonstrated in several tests.
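The classical radius outlier filter that the paper builds on can be sketched in a few lines. The brute-force version below (names and thresholds are illustrative, not the authors' implementation) also makes the scaling problem visible: it touches all O(n²) point pairs, which is exactly the cost that practical variants avoid with spatial indexing such as k-d trees or voxel grids.

```python
import numpy as np

def radius_outlier_filter(points, radius, min_neighbors):
    """Classical ROR: keep a point only if at least `min_neighbors`
    other points lie within `radius` of it.

    Brute-force O(n^2) pairwise distances; real implementations use a
    k-d tree or voxel grid so that large clouds stay tractable.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    neighbor_counts = (dists <= radius).sum(axis=1) - 1  # minus self
    return points[neighbor_counts >= min_neighbors]

# A tight cluster plus one stray point: the stray gets filtered out.
cloud = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0],
                  [10.0, 10.0, 10.0]])
filtered = radius_outlier_filter(cloud, radius=0.5, min_neighbors=1)
```

The quadratic pairwise-distance step is the bottleneck the paper targets; swapping it for a spatial index is what turns the filter from O(n²) toward linear behavior on large clouds.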
Citations: 0
Towards Data Storage, Scalability, and Availability in Blockchain Systems: A Bibliometric Analysis
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-10-02 DOI: 10.3390/data8100148
Meenakshi Kandpal, Veena Goswami, Rojalina Priyadarshini, Rabindra Kumar Barik
In recent years, blockchain research has drawn attention from all across the world. Blockchain is a decentralized capability, distributed in nature and designed to operate in uncertain environments. Several nations and scholars have already applied blockchain successfully in numerous arenas. Blockchain is essential in sensitive settings because it secures data and keeps it from being altered or forged. In addition, the market's growing demand for data is driving demand for data scaling across all industries. Researchers from many nations have applied blockchain in various sectors over time, bringing intense focus to this rapidly growing domain. Every research project begins with in-depth knowledge of the working domain, yet current information about blockchain is quite scattered. This study analyzes the academic literature on blockchain technology, emphasizing three key aspects: blockchain storage, scalability, and availability, which are critical areas within the broader field. It employs CiteSpace and VOSviewer, bibliometric analysis tools commonly used in academic research to examine patterns and relationships within scientific literature, to comprehensively assess the current state of research in these areas. Thus, to envision a way to store data with scalability and availability while keeping blockchain security in sync, the required research has been performed on the storage, scalability, and availability of data in the blockchain environment. The ultimate goal is to contribute to the development of secure and efficient data storage solutions within blockchain technology.
Citations: 0
A Retinal Oct-Angiography and Cardiovascular STAtus (RASTA) Dataset of Swept-Source Microvascular Imaging for Cardiovascular Risk Assessment
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-28 DOI: 10.3390/data8100147
Clément Germanèse, Fabrice Meriaudeau, Pétra Eid, Ramin Tadayoni, Dominique Ginhac, Atif Anwer, Steinberg Laure-Anne, Charles Guenancia, Catherine Creuzot-Garcher, Pierre-Henry Gabrielle, Louis Arnould
In the context of exponential demographic growth, the imbalance between human resources and public health problems impels us to envision other solutions to the difficulties faced in the diagnosis, prevention, and large-scale management of the most common diseases. Cardiovascular diseases represent the leading cause of morbidity and mortality worldwide. A large-scale screening program would make it possible to promptly identify patients with high cardiovascular risk in order to manage them adequately. Optical coherence tomography angiography (OCT-A), as a window into the state of the cardiovascular system, is a rapid, reliable, and reproducible imaging examination that enables the prompt identification of at-risk patients through the use of automated classification models. One challenge that limits the development of computer-aided diagnostic programs is the small number of open-source OCT-A acquisitions available. To facilitate the development of such models, we have assembled a set of images of the retinal microvascular system from 499 patients. It consists of 814 angiocubes as well as 2005 en face images. Angiocubes were captured with a swept-source OCT-A device of patients with varying overall cardiovascular risk. To the best of our knowledge, our dataset, Retinal oct-Angiography and cardiovascular STAtus (RASTA), is the only publicly available dataset comprising such a variety of images from healthy and at-risk patients. This dataset will enable the development of generalizable models for screening cardiovascular diseases from OCT-A retinal images.
Citations: 0
Synthetic Data Generation for Data Envelopment Analysis
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-27 DOI: 10.3390/data8100146
Andrey V. Lychev
The paper addresses the problem of generating artificial datasets for data envelopment analysis (DEA), which can be used for testing DEA models and methods. In particular, papers applying DEA to big data have often used synthetic data generation to obtain large-scale datasets, because real datasets of large size available in the public domain are extremely rare. This paper proposes an algorithm that takes a real dataset as input and augments it with artificial efficient and inefficient units. The generation process extends the efficient part of the frontier by inserting artificial efficient units while keeping the original efficient frontier unchanged. For this purpose, the algorithm uses the assurance region method and progressively relaxes weight restrictions over the iterations. This approach produces synthetic datasets that are closer to real ones than those of other algorithms that generate data from scratch. The proposed algorithm is applied to a pair of small real-life datasets, which are expanded to 50K units as a result. Computational experiments show that the artificially generated DMUs preserve isotonicity and do not increase the collinearity of the original data as a whole.
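Full DEA efficiency scores require solving a linear program per decision-making unit (DMU), but the distinction between efficient and inefficient units that such generators must respect can be illustrated with a simpler necessary condition: Pareto dominance. The sketch below is a simplification for intuition, not the paper's assurance-region procedure; it flags a DMU as certainly inefficient if some other DMU consumes no more of every input while producing no less of every output, with at least one strict gap.

```python
import numpy as np

def pareto_dominated(inputs, outputs):
    """Return one boolean per DMU: True if some other DMU uses <= of
    every input and yields >= of every output, with at least one strict
    inequality. Dominated DMUs are DEA-inefficient (the converse does
    not hold: an undominated unit can still be inefficient)."""
    n = inputs.shape[0]
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            no_worse = (np.all(inputs[j] <= inputs[i])
                        and np.all(outputs[j] >= outputs[i]))
            strictly = (np.any(inputs[j] < inputs[i])
                        or np.any(outputs[j] > outputs[i]))
            if no_worse and strictly:
                dominated[i] = True
                break
    return dominated

# DMU 1 uses more of the first input for the same output as DMU 0,
# so it is dominated; DMUs 0 and 2 are mutually incomparable.
X = np.array([[2.0, 1.0], [3.0, 1.0], [1.0, 2.0]])  # inputs
Y = np.array([[5.0], [5.0], [4.0]])                 # outputs
flags = pareto_dominated(X, Y)
```

A generator that inserts artificial inefficient units can use a check like this as a cheap sanity test before running the full DEA models.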
Citations: 0
Attention-Based Human Age Estimation from Face Images to Enhance Public Security
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-25 DOI: 10.3390/data8100145
Md. Ashiqur Rahman, Shuhena Salam Aonty, Kaushik Deb, Iqbal H. Sarker
Age estimation from facial images has gained significant attention due to practical applications such as public security. However, one of the major challenges in this field is the limited availability of comprehensive training data. Moreover, due to the gradual nature of aging, similar-aged faces tend to look alike regardless of race, gender, or location. Recent studies on age estimation utilize convolutional neural networks (CNNs), treating every facial region equally and disregarding potentially informative patches that contain age-specific details. An attention module can therefore be used to focus extra attention on important patches in the image. In this study, tests are conducted on different attention modules, namely CBAM, SENet, and self-attention, implemented with a convolutional neural network. The focus is on developing a lightweight model that requires a small number of parameters. A merged dataset and other state-of-the-art datasets are used to test the proposed model's performance. In addition, transfer learning is used alongside the from-scratch CNN model to achieve optimal performance more efficiently. Experimental results on different aging-face databases show the advantages of the proposed attention-based CNN model over the conventional CNN model: it attains the lowest mean absolute error and the fewest parameters with a better cumulative score.
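Of the attention modules named in the abstract, SENet's squeeze-and-excitation block is the simplest to sketch: global-average-pool each channel, pass the pooled vector through a small bottleneck MLP, and rescale the channels by the resulting sigmoid weights. The NumPy version below is a hedged illustration of the mechanism only; the weight shapes and reduction ratio are assumptions, not the paper's model.

```python
import numpy as np

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel attention on one (H, W, C) map.

    w1: (C, C//r) reduction weights, w2: (C//r, C) expansion weights,
    where r is the bottleneck reduction ratio (here r = 2, an assumption).
    """
    squeezed = feature_map.mean(axis=(0, 1))         # squeeze: (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # excitation MLP, ReLU
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))     # sigmoid gate: (C,)
    return feature_map * scale                       # reweight channels

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 4))                # toy feature map, C=4
out = se_block(fmap, rng.standard_normal((4, 2)), rng.standard_normal((2, 4)))
```

Because the gate values lie in (0, 1), the block can only attenuate channels, which is how it steers the network's capacity toward the informative ones; CBAM extends the same idea with a spatial gate.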
Citations: 0
Potential Range Map Dataset of Indian Birds
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-21 DOI: 10.3390/data8090144
Arpit Deomurari, Ajay Sharma, Dipankar Ghose, Randeep Singh
Conservation management relies heavily on accurate species distribution data. However, distributional information for most species is limited to range maps, which may lack the resolution needed to guide conservation action or establish current distribution status. In many cases, distribution maps are difficult to access in data formats suitable for analysis and conservation planning. In this study, we addressed this issue by developing Species Distribution Models (SDMs) that integrate species presence data from various citizen science initiatives, which allowed us to systematically construct current distribution maps for 1091 bird species across India. To create these SDMs, we used MaxEnt 3.4.4 (Maximum Entropy) as the basis for species distribution modelling and combined it with multiple citizen science datasets containing information on species occurrences and 29 environmental variables. Using this method, we were able to estimate species distribution maps at a national scale with a high spatial resolution of 1 km². The results of our study thus provide current distribution maps for 968 of the bird species found in India. These maps significantly improve our knowledge of the geographic distribution of about 75% of India's bird species and are essential for addressing spatial knowledge gaps in conservation. Additionally, by superimposing the distribution maps of different species, we can locate hotspots of bird diversity and align conservation action.
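Superimposing per-species distribution maps to locate diversity hotspots, as the paper describes, reduces to stacking the binary presence/absence rasters and summing: the cells with the highest species counts are the hotspots. A minimal sketch (array names are illustrative, not from the dataset):

```python
import numpy as np

def richness_map(presence_maps):
    """Sum binary presence/absence rasters (one per species) into a
    per-cell species-richness raster; high cells mark diversity hotspots."""
    return np.stack(presence_maps).sum(axis=0)

# Two species over a 2x2 grid: the top-left cell hosts both.
sp_a = np.array([[1, 0], [0, 0]])
sp_b = np.array([[1, 1], [0, 0]])
richness = richness_map([sp_a, sp_b])
```

With real SDM output the same operation runs over ~968 rasters of ~1 km² cells; only the array sizes change.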
Citations: 0
A New Odd Beta Prime-Burr X Distribution with Applications to Petroleum Rock Sample Data and COVID-19 Mortality Rate
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-19 DOI: 10.3390/data8090143
Ahmad Abubakar Suleiman, Hanita Daud, Narinderjit Singh Sawaran Singh, Aliyu Ismail Ishaq, Mahmod Othman
In this article, we pioneer a new Burr X distribution using the odd beta prime generalized (OBP-G) family of distributions called the OBP-Burr X (OBPBX) distribution. The density function of this model is symmetric, left-skewed, right-skewed, and reversed-J, while the hazard function is monotonically increasing, decreasing, bathtub, and N-shaped, making it suitable for modeling skewed data and failure rates. Various statistical properties of the new model are obtained, such as moments, moment-generating function, entropies, quantile function, and limit behavior. The maximum-likelihood-estimation procedure is utilized to determine the parameters of the model. A Monte Carlo simulation study is implemented to ascertain the efficiency of maximum-likelihood estimators. The findings demonstrate the empirical application and flexibility of the OBPBX distribution, as showcased through its analysis of petroleum rock samples and COVID-19 mortality data, along with its superior performance compared to well-known extended versions of the Burr X distribution. We anticipate that the new distribution will attract a wider readership and provide a vital tool for modeling various phenomena in different domains.
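For reference, the baseline Burr X (Burr Type X) distribution that the OBPBX model extends is the standard two-parameter form (this is the textbook definition, not notation taken from the paper):

```latex
F(x;\theta,\lambda) = \left(1 - e^{-(\lambda x)^{2}}\right)^{\theta},
\qquad
f(x;\theta,\lambda) = 2\theta\lambda^{2} x\, e^{-(\lambda x)^{2}}
  \left(1 - e^{-(\lambda x)^{2}}\right)^{\theta-1},
\qquad x > 0,\ \theta > 0,\ \lambda > 0.
```

The OBP-G construction then passes this CDF through the odd beta prime generator, which is what produces the additional left-skewed, reversed-J, and bathtub-shaped regimes described above.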
Citations: 0
Update of Dietary Supplement Label Database Addressing on Coding in Italy
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2023-09-13 DOI: 10.3390/data8090142
Giorgia Perelli, Roberta Bernini, Massimo Lucarini, Alessandra Durazzo
Harmonized composition data for foods and dietary supplements are needed for research and for policy decision making. Correct assessment of dietary intake requires the categorization and classification of food products and dietary supplements. In recent decades, the marketing of dietary supplements has increased. A principal feature of a food-supplement database is its intrinsic dynamism: formulations change continuously, which calls for constant monitoring of the market and regular updates of the database. This study presents an update to the Dietary Supplement Label Database in Italy focused on the coding of dietary supplements. The updated dataset, presented here for the first time, consists of the codes of 216 dietary supplements currently on the market in Italy that have functional foods as their characterizing ingredients, coded under the two most commonly used description and classification systems: LanguaL™ and FoodEx2. This update serves as a unique tool and guideline for other compilers and users applying classification coding systems to dietary supplements. Moreover, the updated dataset is a valuable resource for applications such as epidemiological investigations, exposure studies, and dietary assessment.
研究和决策需要统一的食品和膳食补充剂成分数据。为了正确评估膳食摄入量,对食品和膳食补充剂进行分类和分类是必要的。近几十年来,膳食补充剂的市场营销有所增加。以食品补充剂为基础的数据库的主要特点是具有与配方不断变化有关的内在动力,因此需要不断监测市场和定期更新数据库。本研究更新了意大利膳食补充剂标签数据库,重点关注膳食补充剂编码。本文首次发布的更新数据集包括目前意大利市场上以功能食品为特征成分的216种膳食补充剂的代码,涵盖了两种最常用的描述和分类系统:LanguaLTM和FoodEx2-。该更新为其他编译者和用户提供了将分类编码系统应用于膳食补充剂的独特工具和指南。此外,这个更新的数据集为流行病学调查、暴露研究和饮食评估等几个应用提供了宝贵的资源。
Citations: 0
Dataset of Multi-Aspect Integrated Migration Indicators
Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-08-31 DOI: 10.3390/data8090139
Diletta Goglia, Laura Pollacci, Alina Sîrbu
Nowadays, new branches of research are proposing the use of non-traditional data sources for the study of migration trends, in order to find an original methodology to answer open questions about cross-border human mobility. New knowledge extracted from these data must be validated against traditional data, which are, however, distributed across different sources and difficult to integrate. In this context, we present the Multi-aspect Integrated Migration Indicators (MIMI) dataset, a new dataset of migration indicators (flows and stocks) and possible migration drivers (cultural, economic, demographic and geographic indicators). It was obtained through the acquisition, transformation and integration of disparate traditional datasets together with social network data from Facebook (the Social Connectedness Index). This article describes the process of gathering, embedding and merging traditional and novel variables, resulting in a new multidisciplinary dataset that we believe could significantly contribute to nowcasting/forecasting bilateral migration trends and migration drivers.
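A minimal sketch of the kind of integration step described above: joining bilateral flows, an SCI-style connectedness value, and a per-country driver on country pairs. All numbers, country pairs, and field names are invented for illustration and are not taken from the MIMI dataset.

```python
# Hypothetical inputs, keyed by (origin, destination) country pair.
flows = {("IT", "DE"): 12000, ("IT", "FR"): 9500}   # invented bilateral flows
sci = {("IT", "DE"): 0.82, ("IT", "FR"): 0.74}      # invented SCI-style values
gdp_pc = {"IT": 35000, "DE": 48000, "FR": 41000}    # invented per-country driver

def build_indicator_rows(flows, sci, drivers):
    """Merge flow, connectedness, and driver variables into one row per pair."""
    rows = []
    for (orig, dest), flow in flows.items():
        rows.append({
            "origin": orig,
            "destination": dest,
            "flow": flow,
            "sci": sci.get((orig, dest)),          # None if the pair is missing
            "gdp_pc_origin": drivers.get(orig),
            "gdp_pc_dest": drivers.get(dest),
        })
    return rows

rows = build_indicator_rows(flows, sci, gdp_pc)
print(len(rows))  # 2
```

Using `.get()` rather than indexing keeps pairs with missing connectedness or driver values in the merged table as explicit `None` gaps, mirroring the incomplete overlap one typically finds when integrating disparate sources.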
Citations: 0
GUIDO: A Hybrid Approach to Guideline Discovery & Ordering from Natural Language Texts
IF 2.6 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2023-07-19 DOI: 10.5220/0012084400003541
Nils Freyer, Dustin Thewes, Matthias Meinecke
Extracting workflow nets from textual descriptions can be used to simplify guidelines or to formalize textual descriptions of formal processes like business processes and algorithms. The task of manually extracting processes, however, requires domain expertise and effort. While automatic process model extraction is desirable, annotating texts with formalized process models is expensive. Therefore, there are only a few machine-learning-based extraction approaches. Rule-based approaches, in turn, require domain specificity to work well and can rarely distinguish relevant from irrelevant information in textual descriptions. In this paper, we present GUIDO, a hybrid approach to the process model extraction task that first classifies sentences regarding their relevance to the process model, using a BERT-based sentence classifier, and then extracts a process model from the sentences classified as relevant, using dependency parsing. The presented approach achieves significantly better results than a pure rule-based approach: GUIDO achieves an average behavioral similarity score of 0.93. Still, in comparison to purely machine-learning-based approaches, the annotation costs stay low.
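The two-stage pipeline can be caricatured as follows. Note that this toy sketch substitutes a keyword check for the BERT-based relevance classifier and a naive verb–object split for real dependency parsing; the keywords and sentences are invented, and this is not GUIDO's actual implementation.

```python
# Invented action keywords standing in for a learned relevance classifier.
RELEVANCE_KEYWORDS = {"submit", "approve", "check", "send", "review"}

def is_relevant(sentence):
    """Stage 1: keep only sentences that look like process steps."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    return bool(words & RELEVANCE_KEYWORDS)

def extract_activity(sentence):
    """Stage 2: crude (verb, object) pair from an imperative sentence,
    standing in for a dependency-parse-based extraction."""
    tokens = [w.strip(".,") for w in sentence.split()]
    verb = tokens[0].lower()
    obj = " ".join(tokens[1:3]).lower()
    return (verb, obj)

text = [
    "The company was founded in 1998.",   # narrative -> filtered out
    "Submit the expense report.",
    "Review the submitted report.",
]

steps = [extract_activity(s) for s in text if is_relevant(s)]
print(steps)  # [('submit', 'the expense'), ('review', 'the submitted')]
```

In the paper's actual pipeline, stage 1 is a fine-tuned BERT sentence classifier and stage 2 walks a dependency parse to recover activities and their ordering; the sketch only mirrors the filter-then-extract structure.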
Citations: 0