
Annals of Data Science: Latest Publications

Bayesian Analysis of Generalized Inverted Exponential Distribution Based on Generalized Progressive Hybrid Censoring Competing Risks Data
Q1 Decision Sciences Pub Date : 2023-08-21 DOI: 10.1007/s40745-023-00488-y
Amal S. Hassan, Rana M. Mousa, Mahmoud H. Abu-Moussa

In this study, a competing risk model was developed under a generalized progressive hybrid censoring scheme using a generalized inverted exponential distribution. The latent causes of failure were assumed to be independent. The unknown parameters are estimated using maximum likelihood (ML) and Bayesian methods. Bayesian estimators were obtained under gamma priors with various loss functions via the Markov chain Monte Carlo technique. The ML estimates were used to construct confidence intervals (CIs), and two bootstrap CIs are also presented for the unknown parameters. Further, credible intervals and highest posterior density intervals were constructed from the conditional posterior distribution. A Monte Carlo simulation examines the performance of the different estimates. Applications to real data are used to check the estimates and to compare the proposed model with alternative distributions.
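
A minimal sketch of one building block from this abstract: maximum likelihood fitting of the generalized inverted exponential distribution (GIED), with CDF $F(x) = 1 - (1 - e^{-\lambda/x})^{\alpha}$, on complete data. The censoring scheme, competing-risks structure, and Bayesian MCMC step are not reproduced here; sample size and parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def gied_sample(alpha, lam, size, rng):
    # Inverse-CDF sampling from F(x) = 1 - (1 - exp(-lam/x))**alpha
    u = rng.uniform(size=size)
    return -lam / np.log1p(-(1 - u) ** (1 / alpha))

def neg_loglik(params, x):
    alpha, lam = params
    if alpha <= 0 or lam <= 0:
        return np.inf
    # log f(x) = log(alpha*lam) - 2 log x - lam/x + (alpha-1) log(1 - exp(-lam/x))
    return -np.sum(np.log(alpha * lam) - 2 * np.log(x) - lam / x
                   + (alpha - 1) * np.log1p(-np.exp(-lam / x)))

x = gied_sample(alpha=2.0, lam=1.5, size=200, rng=rng)
fit = minimize(neg_loglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("ML estimates (alpha, lambda):", fit.x)
```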

{"title":"Bayesian Analysis of Generalized Inverted Exponential Distribution Based on Generalized Progressive Hybrid Censoring Competing Risks Data","authors":"Amal S. Hassan,&nbsp;Rana M. Mousa,&nbsp;Mahmoud H. Abu-Moussa","doi":"10.1007/s40745-023-00488-y","DOIUrl":"10.1007/s40745-023-00488-y","url":null,"abstract":"<div><p>In this study, a competing risk model was developed under a generalized progressive hybrid censoring scheme using a generalized inverted exponential distribution. The latent causes of failure were presumed to be independent. Estimating the unknown parameters is performed using maximum likelihood (ML) and Bayesian methods. Using the Markov chain Monte Carlo technique, Bayesian estimators were obtained under gamma priors with various loss functions. ML estimate was used to create confidence intervals (CIs). In addition, we present two bootstrap CIs for the unknown parameters. Further, credible CIs and the highest posterior density intervals were constructed based on the conditional posterior distribution. Monte Carlo simulation is used to examine the performance of different estimates. Applications to real data were used to check the estimates and compare the proposed model with alternative distributions.\u0000</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1225 - 1264"},"PeriodicalIF":0.0,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48748310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Quantitative Analysis of Group for Epidemiology Architectural Approach
Q1 Decision Sciences Pub Date : 2023-08-18 DOI: 10.1007/s40745-023-00493-1
Dephney Mathebula

Epidemiology, the branch of research focused on disease modelling, is data intensive. Research epidemiologists in different groups played a key role in developing data-driven models for COVID-19 and monkeypox. Access to highly accurate data useful for disease modelling is essential but not without challenges. Currently, the task of data acquisition is executed by selected individuals in different research groups. This approach suffers from the difficulty of obtaining permission to access the desired data and from inflexibility in changing acquisition goals as epidemiological research objectives evolve. The presented research addresses these challenges and proposes the design and use of dynamic intelligent crawlers for acquiring epidemiological data related to a given goal. In addition, the research quantifies how the use of computing entities enhances data acquisition in epidemiological studies. This is done by formulating and investigating metrics of data acquisition efficiency (DAqE) and data analytics efficiency (DAnE). The use of human-assisted crawlers in global information networks is found to enhance both DAqE and DAnE: in a hybrid configuration they outperform purely manual research-group effort, improving DAqE and DAnE by up to 35% and 99% on average, respectively.
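
The abstract does not specify the crawler architecture, so the following self-contained sketch shows only one way a goal-driven crawler could look, using the Python standard library. The seed URL and keyword goal are hypothetical placeholders, not the authors' design.

```python
import re
import urllib.request
from collections import deque

GOAL_KEYWORDS = {"covid", "monkeypox", "incidence"}  # hypothetical data-acquisition goal

def fetch(url):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="ignore")
    except Exception:
        return ""  # unreachable pages are simply skipped

def crawl(seed, max_pages=20):
    queue, seen, hits, fetched = deque([seed]), {seed}, [], 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        page = fetch(url)
        fetched += 1
        if any(k in page.lower() for k in GOAL_KEYWORDS):
            hits.append(url)                         # goal-relevant page found
        for link in re.findall(r'href="(https?://[^"]+)"', page):
            if link not in seen:                     # breadth-first expansion
                seen.add(link)
                queue.append(link)
    return hits

print(crawl("https://example.org/"))                 # placeholder seed URL
```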

{"title":"Quantitative Analysis of Group for Epidemiology Architectural Approach","authors":"Dephney Mathebula","doi":"10.1007/s40745-023-00493-1","DOIUrl":"10.1007/s40745-023-00493-1","url":null,"abstract":"<div><p>Epidemiology, the aspect of research focusing on disease modelling is date intensive. Research epidemiologists in different research groups played a key role in developing different data driven model for COVID-19 and monkeypox. The requirement of accessing highly accurate data useful for disease modelling is beneficial but not without having challenges. Currently, the task of data acquisition is executed by select individuals in different research groups. This approach experiences the drawbacks associated with getting permission to access the desired data and inflexibility to change data acquisition goals due to dynamic epidemiological research objectives. The presented research addresses these challenges and proposes the design and use of dynamic intelligent crawlers for acquiring epidemiological data related to a given goal. In addition, the research aims to quantify how the use of computing entities enhances the process of data acquisition in epidemiological related studies. This is done by formulating and investigating the metrics of the data acquisition efficiency and the data analytics efficiency. The use of human assisted crawlers in the global information networks is found to enhance data acquisition efficiency (DAqE) and data analytics efficiency (DAnE). The use of human assisted crawlers in a hybrid configuration outperforms the case where manual research group member efforts are expended enhancing the DAqE and DAnE by up to 35% and 99% on average, respectively.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"979 - 1001"},"PeriodicalIF":0.0,"publicationDate":"2023-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00493-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44922195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Estimation of $P[Y<X]$ for Dependence of Stress–Strength Models with Weibull Marginals
Q1 Decision Sciences Pub Date : 2023-08-17 DOI: 10.1007/s40745-023-00487-z
Dipak D. Patil, U. V. Naik-Nimbalkar, M. M. Kale

The stress–strength model is a basic tool used in evaluating the reliability $R = P(Y < X)$. We consider an expression for R where the random variables X and Y denote strength and stress, respectively. The system fails only if the stress exceeds the strength. We aim to study the effect of the dependency between X and Y on R. We assume that X and Y follow Weibull distributions and that their dependency is modeled by a copula with dependency parameter $\theta$. We compute R for the Farlie–Gumbel–Morgenstern (FGM), Ali–Mikhail–Haq (AMH), and Gumbel's bivariate exponential copulas, and for the Gumbel–Hougaard (GH) copula, using a Monte Carlo integration technique. We plot R versus $\theta$ to study the effect of dependency on R. We estimate R by plugging the estimates of the marginal parameters and of $\theta$ into its expression. The estimates of the marginal parameters are based on the marginal likelihood. The estimates of $\theta$ are obtained by two different methods: one based on the conditional likelihood and the other on the method of moments using Blomqvist's beta. The asymptotic distributions of both estimators of R are obtained. Finally, a real data set is analysed for illustrative purposes.
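
The R computation described above is straightforward to reproduce in miniature. The sketch below draws pairs from the FGM copula by conditional inversion, transforms them to Weibull marginals, and estimates $R = P(Y < X)$ by Monte Carlo; the shape and scale values are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def fgm_sample(theta, n, rng):
    # Conditional-inversion sampler for the FGM copula
    # C(u, v) = uv[1 + theta(1-u)(1-v)], valid for |theta| <= 1.
    u, w = rng.uniform(size=n), rng.uniform(size=n)
    a = 1 + theta * (1 - 2 * u)
    v = 2 * w / (a + np.sqrt(a**2 - 4 * (a - 1) * w))
    return u, v

def weibull_ppf(p, shape, scale):
    # Inverse of F(x) = 1 - exp(-(x/scale)**shape)
    return scale * (-np.log1p(-p)) ** (1 / shape)

theta, n = 0.5, 200_000
u, v = fgm_sample(theta, n, rng)
x = weibull_ppf(u, shape=2.0, scale=1.5)   # strength X
y = weibull_ppf(v, shape=2.0, scale=1.0)   # stress Y
print("Monte Carlo estimate of R:", np.mean(y < x))
```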

{"title":"Estimation of ( P[Y<X] ) for Dependence of Stress–Strength Models with Weibull Marginals","authors":"Dipak D. Patil,&nbsp;U. V. Naik-Nimbalkar,&nbsp;M. M. Kale","doi":"10.1007/s40745-023-00487-z","DOIUrl":"10.1007/s40745-023-00487-z","url":null,"abstract":"<div><p>The stress–strength model is a basic tool used in evaluating the reliability <span>( R = P(Y &lt; X))</span>. We consider an expression for <i>R</i> where the random variables X and Y denote strength and stress, respectively. The system fails only if the stress exceeds the strength. We aim to study the effect of the dependency between X and Y on <i>R</i>. We assume that X and Y follow Weibull distributions and their dependency is modeled by a copula with the dependency parameter <span>( theta )</span>. We compute <i>R</i> for Farlie–Gumbel–Morgenstern (FGM), Ali–Mikhail–Haq (AMH), Gumbel’s bivariate exponential copulas, and for Gumbel–Hougaard (GH) copula using a Monte-Carlo integration technique. We plot the graph of <i>R</i> versus <span>(theta )</span> to study the effect of dependency on <i>R</i>. We estimate <i>R</i> by plugging in the estimates of the marginal parameters and of <span>( theta )</span> in its expression. The estimates of the marginal parameters are based on the marginal likelihood. The estimates of <span>(theta )</span> are obtained from two different methods; one is based on the conditional likelihood and the other is based on the method of moments using Blomqvist’s beta. Asymptotic distribution of both the estimators of <i>R</i> is obtained. Finally, analysis of real data set is also performed for illustrative purposes.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1303 - 1340"},"PeriodicalIF":0.0,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48578734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Modi-Weibull Distribution: Inferential and Simulation Study
Q1 Decision Sciences Pub Date : 2023-08-15 DOI: 10.1007/s40745-023-00491-3
Harshita Kumawat, Kanak Modi, Pankaj Nagar

This paper presents a study of a new family of distributions based on the Weibull distribution, termed the Modi-Weibull distribution. The Modi-Weibull distribution has four parameters. To understand its behaviour, some statistical characteristics are derived, such as the shapes of the density and distribution functions, the hazard function, survival function, median, moments, and order statistics. The parameters are estimated using the classical maximum likelihood method, and asymptotic confidence intervals for the parameters of the Modi-Weibull distribution are obtained. A simulation study investigates the bias and MSE of the proposed maximum likelihood estimators, along with the coverage probability and average width of the confidence intervals. Two applications to real data sets illustrate the fit of the proposed distribution, compared with some well-known distributions.
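
The inferential recipe (ML fit plus asymptotic intervals) can be sketched generically. Since the four-parameter Modi-Weibull density is not reproduced in this listing, a standard two-parameter Weibull stands in below; the Wald intervals use the BFGS approximation to the inverse Hessian as the asymptotic covariance.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, weibull_min

rng = np.random.default_rng(2)
x = weibull_min.rvs(c=1.8, scale=2.0, size=300, random_state=rng)  # simulated data

def nll(params):
    c, scale = params
    if c <= 0 or scale <= 0:
        return np.inf
    return -np.sum(weibull_min.logpdf(x, c=c, scale=scale))

fit = minimize(nll, x0=[1.0, 1.0], method="BFGS")
se = np.sqrt(np.diag(fit.hess_inv))        # approximate asymptotic standard errors
z = norm.ppf(0.975)
for name, est, s in zip(["shape", "scale"], fit.x, se):
    print(f"{name}: {est:.3f}  95% Wald CI ({est - z*s:.3f}, {est + z*s:.3f})")
```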

{"title":"Modi-Weibull Distribution: Inferential and Simulation Study","authors":"Harshita Kumawat,&nbsp;Kanak Modi,&nbsp;Pankaj Nagar","doi":"10.1007/s40745-023-00491-3","DOIUrl":"10.1007/s40745-023-00491-3","url":null,"abstract":"<div><p>This paper presents a study on a new family of distributions using the Weibull distribution and termed as Modi-Weibull distribution. This Modi-Weibull distribution is based on four parameters. To understand the behaviour of the distribution, some statistical characteristics have been derived, such as shapes of density and distribution function, hazard function, survival function, median, moments, order statistics etc. These parameters are estimated using classical maximum likelihood estimation method. Asymptotic confidence intervals for parameters of Modi-Weibull distribution are also obtained. A simulation study is carried out to investigate the bias, MSE of proposed maximum likelihood estimators along with coverage probability and average width of confidence intervals of parameters. Two applications to real data sets are discussed to illustrate the fitting of the proposed distribution and compared with some well-known distributions.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"1975 - 1999"},"PeriodicalIF":0.0,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00491-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48772839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Shrinkage Estimation for Location and Scale Parameters of Logistic Distribution Under Record Values
Q1 Decision Sciences Pub Date : 2023-08-14 DOI: 10.1007/s40745-023-00492-2
Shubham Gupta, Gajendra K. Vishwakarma, A. M. Elsawah

The logistic distribution (LogDis) is frequently used in many different applications, such as logistic regression, logit models, classification, neural networks, the physical sciences, sports modeling, finance, and health and disease studies. For instance, the distribution function of the LogDis has the same functional form as the derivative of the Fermi function, which can be used to set the relative weight of various electron energies in their contributions to electron transport. The LogDis has wider tails than the normal distribution (NorDis), so it is more consistent with the underlying data and provides better insight into the likelihood of extreme events; for this reason the United States Chess Federation switched its formula for calculating chess ratings from the NorDis to the LogDis. The outcomes of many real-life experiments are sequences of record-breaking data sets, where only observations that exceed (or only those that fall below) the current extreme value are recorded. Practice has demonstrated that the widely used estimators of the scale and location parameters of logistic record values, such as the best linear unbiased estimators (BLUEs), have some defects. This paper investigates shrinkage estimators of the location and scale parameters for logistic record values using prior information about their BLUEs. Theoretical and computational justifications for the accuracy and precision of the proposed shrinkage estimators are investigated via their bias and mean square error (MSE), which provide sufficient conditions for improving the proposed estimators to obtain unbiased estimators with minimum MSE. The performance of the proposed shrinkage estimators is compared with that of the BLUEs. The results demonstrate that the resulting shrinkage estimators are remarkably efficient.
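
As a rough illustration of the shrinkage idea, the sketch below extracts upper record values from logistic samples and shrinks a crude location estimate toward a prior guess using the generic linear form k*est + (1-k)*guess. The paper's estimators are built from the BLUEs with derived optimal weights; everything here (the estimator, the weight k, and the guess) is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(3)

def upper_records(sample):
    # Keep each observation that exceeds the current running maximum.
    records, cur = [], -np.inf
    for v in sample:
        if v > cur:
            records.append(v)
            cur = v
    return np.array(records)

mu, s = 5.0, 2.0            # true location and scale
mu_guess, k = 4.5, 0.6      # prior guess and shrinkage weight (both placeholders)
mse_plain = mse_shrink = 0.0
reps = 2000
for _ in range(reps):
    recs = upper_records(rng.logistic(mu, s, size=50))
    est = np.median(recs)                   # crude stand-in location estimate
    shrunk = k * est + (1 - k) * mu_guess   # linear shrinkage toward the guess
    mse_plain += (est - mu) ** 2
    mse_shrink += (shrunk - mu) ** 2
print("MSE plain:", mse_plain / reps, " MSE shrinkage:", mse_shrink / reps)
```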

{"title":"Shrinkage Estimation for Location and Scale Parameters of Logistic Distribution Under Record Values","authors":"Shubham Gupta,&nbsp;Gajendra K. Vishwakarma,&nbsp;A. M. Elsawah","doi":"10.1007/s40745-023-00492-2","DOIUrl":"10.1007/s40745-023-00492-2","url":null,"abstract":"<div><p>Logistic distribution (LogDis) is frequently used in many different applications, such as logistic regression, logit models, classification, neural networks, physical sciences, sports modeling, finance and health and disease studies. For instance, the distribution function of the LogDis has the same functional form as the derivative of the Fermi function that can be used to set the relative weight of various electron energies in their contributions to electron transport. The LogDis has wider tails than a normal distribution (NorDis), so it is more consistent with the underlying data and provides better insight into the likelihood of extreme events. For this reason the United States Chess Federation has switched its formula for calculating chess ratings from the NorDis to the LogDis. The outcomes of many real-life experiments are sequences of record-breaking data sets, where only observations that exceed (or only those that fall below) the current extreme value are recorded. The practice demonstrated that the widely used estimators of the scale and location parameters of logistic record values, such as the best linear unbiased estimators (BLUEs), have some defects. This paper investigates the shrinkage estimators of the location and scale parameters for logistic record values using prior information about their BLUEs. Theoretical and computational justifications for the accuracy and precision of the proposed shrinkage estimators are investigated via their bias and mean square error (MSE), which provide sufficient conditions for improving the proposed shrinkage estimators to get unbiased estimators with minimum MSE. The performance of the proposed shrinkage estimators is compared with the performances of the BLUEs. The results demonstrate that the resulting shrinkage estimators are shown to be remarkably efficient.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1209 - 1224"},"PeriodicalIF":0.0,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46550442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Improving Bayesian Classifier Using Vine Copula and Fuzzy Clustering Technique
Q1 Decision Sciences Pub Date : 2023-08-10 DOI: 10.1007/s40745-023-00490-4
Ha Che-Ngoc, Thao Nguyen-Trang, Hieu Huynh-Van, Tai Vo-Van

Classification is a fundamental problem in statistics and data science, and it has garnered significant interest from researchers. This research proposes a new classification algorithm that builds upon two key improvements of the Bayesian method. First, we introduce a method to determine the prior probabilities using fuzzy clustering techniques. The prior probability is determined based on the fuzzy level of the classified element within the groups. Second, we develop the probability density function using Vine Copula. By combining these improvements, we obtain an automatic classification algorithm with several advantages. The proposed algorithm is presented with specific steps and illustrated using numerical examples. Furthermore, it is applied to classify image data, demonstrating its significant potential in various real-world applications. The numerical examples and applications highlight that the proposed algorithm outperforms existing methods, including traditional statistics and machine learning approaches.
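
A compact sketch of the two ingredients the paper combines, under simplifying stand-ins: fuzzy-membership-style weights (here, inverse squared distance to class centroids) supply element-specific prior probabilities, and a Gaussian KDE replaces the vine copula as the class-conditional density. The posterior is prior times likelihood, as in any Bayesian classifier.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
# Two synthetic 2-D classes; means and sizes are illustrative.
classes = {0: rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
           1: rng.normal([3.0, 3.0], 1.0, size=(200, 2))}
kdes = {c: gaussian_kde(X.T) for c, X in classes.items()}       # density stand-in
centroids = {c: X.mean(axis=0) for c, X in classes.items()}

def classify(x):
    # Fuzzy-membership-style prior: inverse squared distance, normalized.
    d2 = {c: np.sum((x - m) ** 2) + 1e-12 for c, m in centroids.items()}
    inv = {c: 1.0 / v for c, v in d2.items()}
    total = sum(inv.values())
    # Posterior proportional to prior * class-conditional density.
    post = {c: (inv[c] / total) * kdes[c](x)[0] for c in classes}
    return max(post, key=post.get)

print(classify(np.array([0.5, 0.2])), classify(np.array([2.8, 3.1])))
```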

{"title":"Improving Bayesian Classifier Using Vine Copula and Fuzzy Clustering Technique","authors":"Ha Che-Ngoc,&nbsp;Thao Nguyen-Trang,&nbsp;Hieu Huynh-Van,&nbsp;Tai Vo-Van","doi":"10.1007/s40745-023-00490-4","DOIUrl":"10.1007/s40745-023-00490-4","url":null,"abstract":"<div><p>Classification is a fundamental problem in statistics and data science, and it has garnered significant interest from researchers. This research proposes a new classification algorithm that builds upon two key improvements of the Bayesian method. First, we introduce a method to determine the prior probabilities using fuzzy clustering techniques. The prior probability is determined based on the fuzzy level of the classified element within the groups. Second, we develop the probability density function using Vine Copula. By combining these improvements, we obtain an automatic classification algorithm with several advantages. The proposed algorithm is presented with specific steps and illustrated using numerical examples. Furthermore, it is applied to classify image data, demonstrating its significant potential in various real-world applications. The numerical examples and applications highlight that the proposed algorithm outperforms existing methods, including traditional statistics and machine learning approaches.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 2","pages":"709 - 732"},"PeriodicalIF":0.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49453964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A New Lindley Extension: Estimation, Risk Assessment and Analysis Under Bimodal Right Skewed Precipitation Data
Q1 Decision Sciences Pub Date : 2023-08-08 DOI: 10.1007/s40745-023-00485-1
Majid Hashempour, Morad Alizadeh, Haitham M. Yousof

This study proposes a new two-parameter lifespan distribution and establishes some of its most essential properties. The estimators are assessed through a simulation study in which a variety of approaches are examined and compared. Two separate data sets are used to analyse the adaptability of the suggested distribution to different contexts. Risk exposure in the context of asymmetric, bimodal, right-skewed precipitation data is then characterised using five essential risk indicators: value-at-risk, tail value-at-risk, tail variance, tail mean-variance, and the mean excess loss function. These indicators account for the right-skewed distribution of the data and provide an in-depth description of the risk exposure faced.
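
The five risk indicators have simple empirical versions, shown below on simulated right-skewed data. A gamma sample stands in for the new Lindley extension, whose density is not reproduced in this listing, and the tail mean-variance weight delta is a conventional choice rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.0, scale=1.5, size=100_000)  # placeholder right-skewed losses
q, delta = 0.95, 0.5                               # risk level and TMV weight

var_q = np.quantile(x, q)          # value-at-risk VaR_q: the q-quantile
tail = x[x > var_q]                # losses beyond VaR_q
tvar_q = tail.mean()               # tail value-at-risk TVaR_q = E[X | X > VaR_q]
tv_q = tail.var()                  # tail variance TV_q = Var(X | X > VaR_q)
tmv_q = tvar_q + delta * tv_q      # tail mean-variance TMV_q = TVaR_q + delta*TV_q
mel = (tail - var_q).mean()        # mean excess loss e(u) = E[X - u | X > u] at u = VaR_q

print(f"VaR={var_q:.3f} TVaR={tvar_q:.3f} TV={tv_q:.3f} "
      f"TMV={tmv_q:.3f} MEL={mel:.3f}")
```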

{"title":"A New Lindley Extension: Estimation, Risk Assessment and Analysis Under Bimodal Right Skewed Precipitation Data","authors":"Majid Hashempour,&nbsp;Morad Alizadeh,&nbsp;Haitham M. Yousof","doi":"10.1007/s40745-023-00485-1","DOIUrl":"10.1007/s40745-023-00485-1","url":null,"abstract":"<div><p>The objectives of this study are to propose a new two-parameter lifespan distribution and explain some of the most essential properties of that distribution. Through the course of this investigation, we will be able to achieve both of these objectives. For the aim of assessment, research is carried out that makes use of simulation, and for the same reason, a variety of various approaches are studied and taken into account for the purpose of evaluation. Making use of two separate data collections enables an analysis of the adaptability of the suggested distribution to a number of different contexts. The risk exposure in the context of asymmetric bimodal right-skewed precipitation data was further defined by using five essential risk indicators, such as value-at-risk, tail-value-at-risk, tail variance, tail mean–variance, and mean excess loss function. This was done in order to account for the right-skewed distribution of the data. In order to examine the data, several risk indicators were utilized. These risk indicators were used in order to achieve a more in-depth description of the risk exposure that was being faced.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"1919 - 1958"},"PeriodicalIF":0.0,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42711550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bayesian Analysis of Change Point Problems Using Conditionally Specified Priors
Q1 Decision Sciences Pub Date : 2023-08-08 DOI: 10.1007/s40745-023-00484-2
G. Shahtahmassebi, José María Sarabia

In data analysis, change point problems correspond to abrupt changes in the stochastic mechanisms generating data. The detection of change points is a relevant problem in the analysis and prediction of time series. In this paper, we consider a class of conjugate prior distributions obtained from the conditional specification methodology for solving this problem. We illustrate the application of such distributions in Bayesian change point detection analysis with Poisson processes. We obtain the posterior distribution of the model parameters using a general bivariate distribution with gamma conditionals. Simulation from the posterior is readily implemented using a Gibbs sampling algorithm, even when the conditional densities are incompatible or compatible only with an improper joint density. The application of these methods is demonstrated using examples of simulated and real data.
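
For the Poisson illustration, a minimal Gibbs sampler for a single change point with independent Gamma(a, b) priors on the two rates is sketched below. This is the standard baseline; the paper's conditionally specified (gamma-conditionals) prior would change the rate updates but not the overall sampling sweep.

```python
import numpy as np

rng = np.random.default_rng(6)
# Simulated counts: rate 2.0 for 40 periods, then rate 6.0 for 60 periods.
y = np.concatenate([rng.poisson(2.0, 40), rng.poisson(6.0, 60)])
n, a, b = len(y), 1.0, 1.0
cum = np.concatenate([[0.0], np.cumsum(y)])        # prefix sums S_0..S_n

tau, draws = n // 2, []
for it in range(5000):
    # Conjugate gamma full conditionals for the two rates given tau.
    lam1 = rng.gamma(a + cum[tau], 1.0 / (b + tau))
    lam2 = rng.gamma(a + cum[n] - cum[tau], 1.0 / (b + n - tau))
    # Discrete full conditional of tau on {1, ..., n-1}:
    # p(tau) proportional to (lam1/lam2)**S_tau * exp(tau*(lam2 - lam1))
    ts = np.arange(1, n)
    logp = cum[ts] * np.log(lam1 / lam2) + ts * (lam2 - lam1)
    p = np.exp(logp - logp.max())
    tau = rng.choice(ts, p=p / p.sum())
    if it >= 1000:                                 # discard burn-in
        draws.append(tau)
print("posterior mean change point:", np.mean(draws))
```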

{"title":"Bayesian Analysis of Change Point Problems Using Conditionally Specified Priors","authors":"G. Shahtahmassebi,&nbsp;José María Sarabia","doi":"10.1007/s40745-023-00484-2","DOIUrl":"10.1007/s40745-023-00484-2","url":null,"abstract":"<div><p>In data analysis, change point problems correspond to abrupt changes in stochastic mechanisms generating data. The detection of change points is a relevant problem in the analysis and prediction of time series. In this paper, we consider a class of conjugate prior distributions obtained from conditional specification methodology for solving this problem. We illustrate the application of such distributions in Bayesian change point detection analysis with Poisson processes. We obtain the posterior distribution of model parameters using general bivariate distribution with gamma conditionals. Simulation from the posterior are readily implemented using a Gibbs sampling algorithm. The Gibbs sampling is implemented even when using conditional densities that are incompatible or only compatible with an improper joint density. The application of such methods will be demonstrated using examples of simulated and real data.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 6","pages":"1899 - 1918"},"PeriodicalIF":0.0,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00484-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47402928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bayesian Learning of Personalized Longitudinal Biomarker Trajectory
Q1 Decision Sciences Pub Date : 2023-08-01 DOI: 10.1007/s40745-023-00486-0
Shouhao Zhou, Xuelin Huang, Chan Shen, Hagop M. Kantarjian

This work concerns the effective personalized prediction of longitudinal biomarker trajectories, motivated by a study of targeted cancer therapy for patients with chronic myeloid leukemia (CML). Continuous monitoring with a confirmed biomarker of residual disease is a key component of CML management for early prediction of disease relapse. However, the longitudinal biomarker measurements have highly heterogeneous trajectories across subjects (patients), with various shapes and patterns. The trajectory is believed to be clinically related to the development of treatment resistance, but knowledge about the underlying mechanism is limited. To address this challenge, we propose a novel Bayesian approach to modeling the distribution of subject-specific longitudinal trajectories. It exploits flexible Bayesian learning to accommodate complex changing patterns over time and non-linear covariate effects, and allows for real-time prediction for both in-sample and out-of-sample subjects. The generated information can help make clinical decisions and consequently enhance the personalized treatment management of precision medicine.
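
A toy version of personalized trajectory prediction: each subject's slope gets a conjugate normal posterior that blends the population prior with the subject's own measurements. The paper's model is far more flexible (non-linear covariate effects, real-time updating); here the trajectory is linear and all variances are assumed known.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(6, dtype=float)                  # common measurement times 0..5
mu_b, tau2, sigma2 = -0.5, 0.3**2, 0.4**2      # prior slope mean/var, noise var

def posterior_slope(y, t):
    # Model y_j = b*t_j + noise, prior b ~ N(mu_b, tau2), known sigma2:
    # posterior precision and mean follow the standard normal-normal update.
    prec = 1 / tau2 + np.sum(t**2) / sigma2
    mean = (mu_b / tau2 + np.sum(t * y) / sigma2) / prec
    return mean, 1 / prec

b_true = rng.normal(mu_b, np.sqrt(tau2))       # one subject's true slope
y = b_true * t + rng.normal(0, np.sqrt(sigma2), size=t.size)
m, v = posterior_slope(y, t)
print(f"true slope {b_true:.3f}, posterior mean {m:.3f} (sd {np.sqrt(v):.3f})")
print("personalized prediction at t=6:", m * 6)
```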

{"title":"Bayesian Learning of Personalized Longitudinal Biomarker Trajectory","authors":"Shouhao Zhou,&nbsp;Xuelin Huang,&nbsp;Chan Shen,&nbsp;Hagop M. Kantarjian","doi":"10.1007/s40745-023-00486-0","DOIUrl":"10.1007/s40745-023-00486-0","url":null,"abstract":"<div><p>This work concerns the effective personalized prediction of longitudinal biomarker trajectory, motivated by a study of cancer targeted therapy for patients with chronic myeloid leukemia (CML). Continuous monitoring with a confirmed biomarker of residual disease is a key component of CML management for early prediction of disease relapse. However, the longitudinal biomarker measurements have highly heterogeneous trajectories between subjects (patients) with various shapes and patterns. It is believed that the trajectory is clinically related to the development of treatment resistance, but there was limited knowledge about the underlying mechanism. To address the challenge, we propose a novel Bayesian approach to modeling the distribution of subject-specific longitudinal trajectories. It exploits flexible Bayesian learning to accommodate complex changing patterns over time and non-linear covariate effects, and allows for real-time prediction of both in-sample and out-of-sample subjects. The generated information can help make clinical decisions, and consequently enhance the personalized treatment management of precision medicine.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"1031 - 1050"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46463104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Applications of Reliability Test Plan for Logistic Rayleigh Distributed Quality Characteristic
Q1 Decision Sciences Pub Date : 2023-07-19 DOI: 10.1007/s40745-023-00473-5
Mahendra Saha, Harsh Tripathi, Anju Devi, Pratibha Pareek

In this article, a reliability test plan under a time-truncated life test is considered for the logistic Rayleigh distribution ($\mathcal{LRD}$). A brief discussion of the statistical properties and significance of the $\mathcal{LRD}$ is included. The lot median is taken as the quality characteristic for the proposed plan: the larger the median, the better the quality of the lot. Minimum sample sizes are tabulated for different settings of the specified consumer's risk, and operating characteristic ($\mathcal{OC}$) values are tabulated for the chosen set-ups, with a discussion of their pattern. A comparative analysis of the present study with some other reliability test plans is given on the basis of the sample sizes. As an illustration, the performance of the proposed plan for the $\mathcal{LRD}$ is shown through real-life examples.
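
The binomial machinery behind such time-truncated plans is easy to sketch: find the smallest n whose acceptance probability with at most c failures stays below the consumer's risk, then tabulate OC values. Here p0, the failure probability at the truncation time, is a placeholder; the paper obtains it from the logistic Rayleigh CDF at the specified median ratio.

```python
from scipy.stats import binom

def min_sample_size(p0, c, beta):
    # Smallest n with P(accept lot) = P(at most c failures in n) <= beta
    # when the true failure probability at the truncation time is p0.
    n = c + 1
    while binom.cdf(c, n, p0) > beta:
        n += 1
    return n

p0, c, beta = 0.25, 2, 0.10             # placeholder quality level and risk
n = min_sample_size(p0, c, beta)
print("minimum sample size:", n)
for p in (0.02, 0.05, 0.10, 0.15):      # OC values at better quality levels
    print(f"OC at p={p:.2f}: {binom.cdf(c, n, p):.4f}")
```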

{"title":"Applications of Reliability Test Plan for Logistic Rayleigh Distributed Quality Characteristic","authors":"Mahendra Saha,&nbsp;Harsh Tripathi,&nbsp;Anju Devi,&nbsp;Pratibha Pareek","doi":"10.1007/s40745-023-00473-5","DOIUrl":"10.1007/s40745-023-00473-5","url":null,"abstract":"<div><p>In this article, a reliability test plan under time truncated life test is considered for the logistic Rayleigh distribution (<span>(mathcal {LRD})</span>). A brief discussion over statistical properties and significance of the <span>(mathcal {LRD})</span> is placed in this present study. Larger the value of median—better is the quality of the lot is considered as quality characteristic for the proposed reliability test plan. Minimum sample sizes are placed in tabular form for different set up of specified consumer’s risk. Also operating characteristics (<span>(mathcal{O}mathcal{C})</span>) values are shown in tabular forms for the chosen set up and discussed the pattern of <span>(mathcal{O}mathcal{C})</span> values. A comparative analysis of the present study with some other reliability test plans is discussed based on the sample sizes. As an illustration, the performance of the proposed plan for the <span>(mathcal {LRD})</span> is shown through real-life examples.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1687 - 1703"},"PeriodicalIF":0.0,"publicationDate":"2023-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43684187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0