Annals of Data Science: Latest Articles

Utilization of Priori Information in the Estimation of Population Mean for Time-Based Surveys
Q1 Decision Sciences Pub Date : 2023-06-05 DOI: 10.1007/s40745-023-00472-6
Sanjay Kumar, Priyanka Chhaparwal

Using a priori information at the estimation stage to construct an estimator of a population parameter is very common, and prior information can make estimates more accurate and efficient. In this study, we combined information from past surveys with information available from the current survey, in the form of a hybrid exponentially weighted moving average, to propose an estimator of the population mean for time-based surveys that uses a known coefficient of variation of the study variable. We derived the mean square error of the proposed estimator and established the mathematical conditions under which it is more efficient. The results showed that using information from both past and current surveys improves the estimator's efficiency. A simulation study and a real-life example are provided to support the use of the proposed estimator.

Annals of Data Science, Volume 11, Issue 5, pages 1675-1685
Citations: 0
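The hybrid EWMA idea above can be illustrated with a minimal Python sketch. It assumes the common HEWMA form (an EWMA applied to another EWMA) with smoothing constants `lam1` and `lam2` and a starting value `start`, all illustrative names, and separately shows the classical Searls-type shrinkage of a simple random sample mean when the coefficient of variation is known. It is not the authors' estimator; its exact form and MSE expression are given in the paper.

```python
from typing import List


def hybrid_ewma(survey_means: List[float], lam1: float, lam2: float,
                start: float) -> List[float]:
    """Hybrid EWMA (an EWMA of an EWMA) of a sequence of survey means.

    E_t = lam2 * x_t + (1 - lam2) * E_{t-1}
    H_t = lam1 * E_t + (1 - lam1) * H_{t-1}
    Both recursions start at `start` (assumed, e.g. a prior guess of the mean).
    """
    if not (0 < lam1 <= 1 and 0 < lam2 <= 1):
        raise ValueError("smoothing constants must lie in (0, 1]")
    e = h = start
    out = []
    for x in survey_means:
        e = lam2 * x + (1 - lam2) * e  # inner EWMA of the current survey means
        h = lam1 * e + (1 - lam1) * h  # outer EWMA of the inner EWMA
        out.append(h)
    return out


def searls_type(sample_mean: float, cv: float, n: int) -> float:
    """Searls-type estimator k * ybar with k = 1 / (1 + cv**2 / n): the
    MSE-minimising multiple of a sample mean of size n when the coefficient
    of variation cv is known (finite-population correction ignored)."""
    return sample_mean / (1 + cv ** 2 / n)
```

With `lam1 = lam2 = 1` the recursion reduces to the current survey mean alone; smaller constants give more weight to past surveys.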
LADDERS: Log Based Anomaly Detection and Diagnosis for Enterprise Systems
Q1 Decision Sciences Pub Date : 2023-06-04 DOI: 10.1007/s40745-023-00471-7
Sakib A. Mondal, Prashanth Rv, Sagar Rao, Arun Menon

Enterprise software can fail not only because application servers malfunction, but also because other servers or middle layers degrade in performance or become unavailable. Consequently, valuable time and resources are wasted trying to identify the root cause of software failures. To address this, we have developed a framework called LADDERS. In LADDERS, anomalous incidents are detected from log events and KPIs (Key Performance Indicators) generated by various systems, using an ensemble of supervised and unsupervised models. Without transaction identifiers, events from different systems cannot be related directly, so LADDERS implements Recursive Parallel Causal Discovery (RPCD) to establish causal relationships among log events. The framework builds coresets using BICO to manage high volumes of log data during training and inference. Because one anomaly can trigger a number of other anomalies throughout the systems, LADDERS uses RPCD again to discover causal relationships among these anomalous events. Probable root causes are then revealed from the causal graph and the anomaly ratings of events using a k-shortest path algorithm. We evaluated LADDERS using live logs from an enterprise system, and the results demonstrate its effectiveness and efficiency for anomaly detection.

Annals of Data Science, Volume 11, Issue 4, pages 1165-1183
Citations: 0
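A k-shortest path search over a causal graph can be sketched as follows. This is a generic best-first enumeration of simple paths, not necessarily the algorithm LADDERS uses, and the event names and edge costs are hypothetical. One could, for instance, set each edge cost to 1 minus the anomaly rating of the event it leads to, so that paths through highly anomalous events are cheap.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical causal graph: node -> list of (next node, non-negative edge cost).
Graph = Dict[str, List[Tuple[str, float]]]


def k_shortest_paths(graph: Graph, source: str, target: str,
                     k: int) -> List[Tuple[float, List[str]]]:
    """Return up to k cheapest simple paths from source to target.

    Best-first search over partial paths: with non-negative costs, complete
    paths leave the heap in non-decreasing order of total cost. The search is
    exponential in the worst case, so this suits small graphs only.
    """
    heap: List[Tuple[float, List[str]]] = [(0.0, [source])]
    found: List[Tuple[float, List[str]]] = []
    while heap and len(found) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            found.append((cost, path))
            continue
        for nxt, w in graph.get(node, []):
            if nxt not in path:  # keep paths simple (no cycles)
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return found
```

If edges point from an observed anomaly toward candidate causes, the cheapest paths end at the most plausible root causes under the assumed costs.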
Jump-Drop Adjusted Prediction of Cumulative Infected Cases Using the Modified SIS Model
Q1 Decision Sciences Pub Date : 2023-05-15 DOI: 10.1007/s40745-023-00467-3
Rashi Mohta, Sravya Prathapani, Palash Ghosh

Accurate prediction of cumulative COVID-19 infected cases is essential for effectively managing the limited healthcare resources in India. Historically, epidemiological models have helped in controlling such epidemics. Models require accurate historical data to predict future outcomes. In our data, there were days exhibiting erratic, apparently anomalous jumps and drops in the number of daily reported COVID-19 infected cases that did not conform with the overall trend. Including those observations in the training data would most likely worsen model predictive accuracy. However, with existing epidemiological models it is not straightforward to determine, for a specific day, whether or not an outcome should be considered anomalous. In this work, we propose an algorithm to automatically identify anomalous ‘jump’ and ‘drop’ days, and then based upon the overall trend, the number of daily infected cases for those days is adjusted and the training data is amended using the adjusted observations. We applied the algorithm in conjunction with a recently proposed, modified Susceptible-Infected-Susceptible (SIS) model to demonstrate that prediction accuracy is improved after adjusting training data counts for apparent erratic anomalous jumps and drops.

Annals of Data Science, Volume 11, Issue 3, pages 959-978
Citations: 0
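Detecting and adjusting anomalous jump and drop days can be illustrated with a rolling-median rule based on the median absolute deviation (MAD). This is a generic robust-outlier substitute chosen for illustration, not the authors' algorithm. `window` and `threshold` are assumed tuning parameters.

```python
from statistics import median
from typing import List, Sequence, Tuple


def adjust_jumps_drops(daily: Sequence[float], window: int = 7,
                       threshold: float = 3.0) -> Tuple[List[float], List[bool]]:
    """Flag days whose count deviates from the median of the surrounding days
    by more than `threshold` robust standard deviations (1.4826 * MAD), and
    replace each flagged count with that local median."""
    n = len(daily)
    half = window // 2
    adjusted = list(daily)
    flags = [False] * n
    for i in range(n):
        # Neighbouring days inside the window, excluding day i itself.
        neighbours = [daily[j] for j in range(max(0, i - half), min(n, i + half + 1))
                      if j != i]
        if not neighbours:
            continue
        centre = median(neighbours)
        mad = median([abs(v - centre) for v in neighbours])
        scale = 1.4826 * mad if mad > 0 else 1.0  # fallback when neighbours are identical
        if abs(daily[i] - centre) > threshold * scale:
            flags[i] = True
            adjusted[i] = centre
    return adjusted, flags
```

Cumulative counts for model training could then be rebuilt from the adjusted daily series, for example with `itertools.accumulate`.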
A Novel Stock Trading Model based on Reinforcement Learning and Technical Analysis
Q1 Decision Sciences Pub Date : 2023-05-11 DOI: 10.1007/s40745-023-00469-1
Zahra Pourahmadi, Dariush Fareed, Hamid Reza Mirzaei

This study investigates the potential of using reinforcement learning (RL) to build a financial trading system (FTS), taking into account the main constraints imposed by the stock market, such as transaction costs. More specifically, this paper shows that a pure reinforcement learning model performs poorly when applied to a multi-dimensional, noisy stock market environment. To solve this problem and obtain a practical and reasonable trading strategy, a modified RL model is proposed based on the actor-critic method, in which the actor is amended by incorporating three metrics from technical analysis. The results show a significant improvement over traditional trading strategies. The reliability of the model is verified by experiments on financial data (the S&P500 index), including a fair comparison of the proposed method with pure RL and three benchmarks. Statistical analysis shows that combining a) technical analysis (role-based strategies), b) RL (machine learning strategies), and c) restrictions on the actions of the RL policy network based on a few realistic conditions results in trading decisions with higher investment return rates.

Annals of Data Science, Volume 11, Issue 5, pages 1653-1674
Citations: 0
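One way to restrict an actor's actions with a technical-analysis rule is to allow only the actions consistent with an indicator before choosing the action. The sketch below uses a single moving-average crossover as a hypothetical stand-in. The paper incorporates three technical-analysis metrics that this listing does not name, and its actor-critic training is not shown.

```python
from typing import Sequence

SELL, HOLD, BUY = 0, 1, 2


def sma(prices: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window


def restricted_action(policy_probs: Sequence[float], prices: Sequence[float],
                      short: int = 5, long: int = 20) -> int:
    """Greedy action from an actor's probabilities over (sell, hold, buy),
    restricted by a moving-average crossover rule (hypothetical restriction)."""
    if len(prices) < long:
        allowed = {HOLD}            # not enough history yet: only hold
    else:
        fast, slow = sma(prices, short), sma(prices, long)
        if fast > slow:
            allowed = {HOLD, BUY}   # uptrend: forbid selling
        elif fast < slow:
            allowed = {SELL, HOLD}  # downtrend: forbid buying
        else:
            allowed = {HOLD}
    # Pick the most probable action among the allowed ones only.
    return max(sorted(allowed), key=lambda a: policy_probs[a])
```

Choosing among the allowed actions, rather than zeroing out forbidden probabilities and taking an overall argmax, guarantees that a forbidden action is never returned, even when all allowed probabilities are zero.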
Platform Resource Scheduling Method Based on Branch-and-Bound and Genetic Algorithm
Q1 Decision Sciences Pub Date : 2023-05-11 DOI: 10.1007/s40745-023-00470-8
Yanfen Zhang, Jinyao Ma, Haibin Zhang, Bin Yue

Platform resource scheduling is an operations research optimization problem of matching tasks to platform resources, with important applications in production or marketing layout, combat task planning, and similar areas. Existing algorithms are inflexible in their task planning sequence and have poor stability. To address this shortcoming, this paper combines the branch-and-bound algorithm with a genetic algorithm. The branch-and-bound algorithm adaptively selects the next task to be planned and computes a variety of feasible task planning sequences, while the genetic algorithm assigns a platform combination to the selected task. In addition, we propose a new lower bound calculation method and pruning rule. When prioritizing tasks, the influence of their resource requirements is considered in addition to the processing time of their direct successor tasks. Numerical experiments show that the proposed algorithm performs well on the platform resource scheduling problem.

Annals of Data Science, Volume 10, Issue 5, pages 1421-1445
Citations: 0
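The branch-and-bound pattern (branch on a decision, compute a lower bound, prune branches that cannot beat the best solution found so far) can be illustrated on a simplified problem: assigning independent tasks to identical platforms to minimise makespan. This is a textbook sketch under those assumptions. It omits task precedence, resource requirements and the genetic-algorithm step, and its lower bound is not the one proposed in the paper.

```python
from typing import List


def bnb_min_makespan(durations: List[int], platforms: int) -> int:
    """Minimum makespan for independent tasks on identical platforms,
    found by depth-first branch-and-bound (illustrative, simplified problem)."""
    tasks = sorted(durations, reverse=True)  # longest first tightens bounds early
    loads = [0] * platforms
    best = sum(tasks)                        # feasible upper bound: one platform does everything
    suffix = [0] * (len(tasks) + 1)          # suffix[i] = total duration of tasks[i:]
    for i in range(len(tasks) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + tasks[i]

    def branch(i: int) -> None:
        nonlocal best
        if i == len(tasks):
            best = min(best, max(loads))
            return
        # Lower bound: the current max load, or perfectly balanced total work.
        bound = max(max(loads), (sum(loads) + suffix[i]) / platforms)
        if bound >= best:
            return                           # prune: cannot beat the incumbent
        tried = set()
        for p in range(platforms):
            if loads[p] in tried:            # identical platforms: skip symmetric branches
                continue
            tried.add(loads[p])
            loads[p] += tasks[i]
            branch(i + 1)
            loads[p] -= tasks[i]

    branch(0)
    return best
```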
Bayesian Estimation of Multiple Covariate of Autoregressive (MC-AR) Model
Q1 Decision Sciences Pub Date : 2023-05-04 DOI: 10.1007/s40745-023-00468-2
Jitendra Kumar, Ashok Kumar, Varun Agiwal

In the present scenario, handling covariates (explanatory variables) within a model is one of the most important aspects of model-based study. The main advantage of covariates is their dependence on past observations, so the study variable is modelled in terms of its own past as well as past and future observations of the covariates. The present paper deals with the estimation of the parameters of an autoregressive model with multiple covariates under a Bayesian approach. A simulation study and an empirical study are performed to check the applicability of the model, and the model recorded better results.

Annals of Data Science, Volume 11, Issue 4, pages 1291-1301
Citations: 0
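As a minimal illustration of Bayesian estimation in an autoregressive model with a covariate, the sketch below computes the posterior mean for y_t = phi * y_{t-1} + beta * x_{t-1} + e_t. It assumes a single lagged covariate, Gaussian errors with known variance `sigma2`, and independent normal priors with variance `tau2` on both coefficients. These assumptions are made for brevity; the paper's MC-AR model handles multiple covariates, and its priors may differ.

```python
from typing import Sequence, Tuple


def arx_posterior_mean(y: Sequence[float], x: Sequence[float],
                       sigma2: float = 1.0, tau2: float = 100.0) -> Tuple[float, float]:
    """Posterior mean of (phi, beta) in y_t = phi*y_{t-1} + beta*x_{t-1} + e_t,
    assuming e_t ~ N(0, sigma2) with sigma2 known and independent N(0, tau2)
    priors on phi and beta (conjugate normal linear model)."""
    a = b = d = 0.0  # X'X = [[a, b], [b, d]] with design rows (y_{t-1}, x_{t-1})
    u = v = 0.0      # X'y = (u, v)
    for t in range(1, len(y)):
        r1, r2 = y[t - 1], x[t - 1]
        a += r1 * r1
        b += r1 * r2
        d += r2 * r2
        u += r1 * y[t]
        v += r2 * y[t]
    # Posterior precision P = X'X / sigma2 + I / tau2; posterior mean = P^{-1} X'y / sigma2.
    p11, p12, p22 = a / sigma2 + 1 / tau2, b / sigma2, d / sigma2 + 1 / tau2
    g1, g2 = u / sigma2, v / sigma2
    det = p11 * p22 - p12 * p12
    phi = (p22 * g1 - p12 * g2) / det
    beta = (p11 * g2 - p12 * g1) / det
    return phi, beta
```

As `tau2` grows the prior becomes diffuse and the posterior mean approaches the least-squares estimate; a small `tau2` shrinks both coefficients toward zero.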
A Bayes Analysis of Random Walk Model Under Different Error Assumptions
Q1 Decision Sciences Pub Date : 2023-04-22 DOI: 10.1007/s40745-023-00465-5
Praveen Kumar Tripathi, Manika Agarwal

In this paper, Bayesian analyses of the random walk model are performed under three assumptions for the error component, taken one at a time: a normal distribution, a log-normal distribution, and a stochastic volatility model. For the various parameters in each model, suitable informative and non-informative priors are chosen and the posterior distributions are calculated. For the first two error distributions, posterior samples are easily obtained using the gamma generating routine in R. For the random walk model with stochastic volatility errors, Gibbs sampling with intermediate independent Metropolis–Hastings steps is employed to obtain the desired posterior samples. The whole procedure is illustrated numerically on a real data set of crude oil prices from April 2014 to March 2022. The models are then compared on the basis of their accuracy in forecasting the true values. Among the models considered, the random walk model with stochastic volatility errors performed best on the data at hand.

Annals of Data Science, Volume 11, Issue 5, pages 1635-1652
Citations: 0
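For the normal-error random walk, a conjugate gamma prior on the error precision gives a gamma posterior that can be sampled directly, which is consistent with the gamma generator mentioned above. The sketch assumes zero-mean errors, a Gamma(a0, b0) prior in shape/rate form, and Python's `random.gammavariate` in place of R's routine. The paper's exact priors are not reproduced here.

```python
import random
from typing import List, Sequence, Tuple


def rw_precision_posterior(prices: Sequence[float], a0: float = 1.0,
                           b0: float = 1.0) -> Tuple[float, float]:
    """Gamma(shape, rate) posterior for the error precision of a random walk
    y_t = y_{t-1} + e_t, e_t ~ N(0, 1/precision), under a Gamma(a0, b0) prior."""
    diffs = [prices[t] - prices[t - 1] for t in range(1, len(prices))]
    shape = a0 + len(diffs) / 2
    rate = b0 + sum(e * e for e in diffs) / 2
    return shape, rate


def sample_precision(shape: float, rate: float, n: int, seed: int = 0) -> List[float]:
    """Draw n posterior samples; random.gammavariate takes (shape, scale = 1/rate)."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]
```

If the log-normal assumption is read as normal errors on the log scale (an assumption), the same functions can be applied to log prices. The stochastic volatility model needs the Gibbs and Metropolis–Hastings scheme described in the abstract and is not covered by this sketch.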
Count Regression and Machine Learning Techniques for Zero-Inflated Overdispersed Count Data: Application to Ecological Data
Q1 Decision Sciences Pub Date : 2023-04-13 DOI: 10.1007/s40745-023-00464-6
Bonelwa Sidumo, Energy Sonono, Isaac Takaidza

The aim of this study is to investigate the overdispersion problem that is rampant in ecological count data. In order to explore this problem, we consider the most commonly used count regression models: the Poisson, the negative binomial, the zero-inflated Poisson and the zero-inflated negative binomial models. The performance of these count regression models is compared with the four proposed machine learning (ML) regression techniques: random forests, support vector machines, k-nearest neighbors and artificial neural networks. The mean absolute error was used to compare the performance of count regression models and ML regression models. The results suggest that ML regression models perform better compared to count regression models. The performance shown by ML regression techniques is a motivation for further research in improving methods and applications in ecological studies.

Annals of Data Science, Volume 11, Issue 3, pages 803-817
Open access PDF: https://link.springer.com/content/pdf/10.1007/s40745-023-00464-6.pdf
Citations: 0
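The zero-inflated Poisson (ZIP) model and the mean absolute error used for comparison can be written down in a few lines. This sketch only evaluates the ZIP probability mass function for given parameters and computes MAE. Fitting the models, and the negative binomial variants, is not shown.

```python
import math
from typing import Sequence


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a Poisson(lam) count, so P(0) = pi + (1 - pi) * exp(-lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1 - pi) * poisson


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

The ZIP mean is (1 - pi) * lam, while the extra mass at zero inflates the variance relative to the mean, which is one source of the overdispersion discussed above.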
Inferences Based on Correlated Randomly Censored Gumbel’s Type-I Bivariate Exponential Distribution
Q1 Decision Sciences Pub Date : 2023-01-31 DOI: 10.1007/s40745-023-00463-7
Hare Krishna, Rajni Goel

The formal random censoring plan has been studied extensively in the statistical literature as a way to deal with dropouts or unintentional random removals in life-testing experiments. All of these studies treated failure time and censoring time as independent. However, there are several situations in which the censoring time decreases as the failure time of an item increases. In medical studies, and especially in clinical trials, dropouts or unintentional removals are frequently observed in such a way that as the treatment (failure) time increases, the dropout (censoring) time decreases. No existing work appears to deal with such correlated failure and censoring times. Therefore, in this article, we assume that the failure time is negatively correlated with the censoring time and that the two follow Gumbel’s type-I bivariate exponential distribution. We compute maximum likelihood estimates of the model parameters, and Bayesian estimates of the parameters are calculated using Markov chain Monte Carlo methods. The expected experimental time is also evaluated. Finally, a numerical study and a real data set analysis are given for illustration.

Annals of Data Science, Volume 11, Issue 4, pages 1185-1207
Citations: 0
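Gumbel's type-I bivariate exponential distribution with unit-mean marginals has joint survival function exp(-x - y - delta*x*y) with 0 <= delta <= 1, which gives a non-positive correlation between the two times. The sketch below draws correlated failure and censoring times by conditional sampling and then forms randomly censored observations. The unit scales and the bisection-based sampler are assumptions made for illustration. The paper's parameterisation and estimators are not reproduced.

```python
import math
import random
from typing import List, Tuple


def gumbel1_survival(x: float, y: float, delta: float) -> float:
    """Joint survival P(X > x, Y > y) of Gumbel's type-I bivariate exponential
    with unit-mean marginals; 0 <= delta <= 1."""
    return math.exp(-x - y - delta * x * y)


def sample_gumbel1(n: int, delta: float, seed: int = 0) -> List[Tuple[float, float]]:
    """Conditional sampling: X ~ Exp(1), then Y | X = x has survival
    (1 + delta*y) * exp(-(1 + delta*x) * y), inverted by bisection."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        u = 1.0 - rng.random()  # uniform on (0, 1]

        def cond_surv(y: float) -> float:
            # P(Y > y | X = x) for this draw of x; decreasing in y.
            return (1 + delta * y) * math.exp(-(1 + delta * x) * y)

        lo, hi = 0.0, 1.0
        while cond_surv(hi) > u:  # expand the bracket until survival drops below u
            hi *= 2
        for _ in range(60):       # bisection: find y with cond_surv(y) = u
            mid = (lo + hi) / 2
            if cond_surv(mid) > u:
                lo = mid
            else:
                hi = mid
        pairs.append((x, (lo + hi) / 2))
    return pairs


def randomly_censor(pairs: List[Tuple[float, float]]) -> List[Tuple[float, int]]:
    """Observed data under random censoring: (min(T, C), 1 if the failure is observed)."""
    return [(min(t, c), int(t <= c)) for t, c in pairs]
```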
Bayesian Hierarchical Spatial Modeling of COVID-19 Cases in Bangladesh
Q1 Decision Sciences Pub Date : 2023-01-22 DOI: 10.1007/s40745-022-00461-1
Md. Rezaul Karim, Sefat-E-Barket

This research aimed to investigate the spatial autocorrelation and heterogeneity throughout Bangladesh’s 64 districts. Moran I and Geary C are used to measure spatial autocorrelation. Different conventional models, such as Poisson-Gamma and Poisson-Lognormal, and spatial models, such as Conditional Autoregressive (CAR) Model, Convolution Model, and modified CAR Model, have been employed to detect the spatial heterogeneity. Bayesian hierarchical methods via Gibbs sampling are used to implement these models. The best model is selected using the Deviance Information Criterion. Results revealed Dhaka has the highest relative risk due to the city’s high population density and growth rate. This study identifies which district has the highest relative risk and which districts adjacent to that district also have a high risk, which allows for the appropriate actions to be taken by the government agencies and communities to mitigate the risk effect.
{"title":"Bayesian Hierarchical Spatial Modeling of COVID-19 Cases in Bangladesh","authors":"Md. Rezaul Karim,&nbsp; Sefat-E-Barket","doi":"10.1007/s40745-022-00461-1","DOIUrl":"10.1007/s40745-022-00461-1","url":null,"abstract":"<div><p>This research aimed to investigate the spatial autocorrelation and heterogeneity throughout Bangladesh’s 64 districts. Moran <i>I</i> and Geary <i>C</i> are used to measure spatial autocorrelation. Different conventional models, such as Poisson-Gamma and Poisson-Lognormal, and spatial models, such as Conditional Autoregressive (CAR) Model, Convolution Model, and modified CAR Model, have been employed to detect the spatial heterogeneity. Bayesian hierarchical methods via Gibbs sampling are used to implement these models. The best model is selected using the Deviance Information Criterion. Results revealed Dhaka has the highest relative risk due to the city’s high population density and growth rate. This study identifies which district has the highest relative risk and which districts adjacent to that district also have a high risk, which allows for the appropriate actions to be taken by the government agencies and communities to mitigate the risk effect.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1581 - 1607"},"PeriodicalIF":0.0,"publicationDate":"2023-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47950849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
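The COVID-19 abstract names Moran's I and Geary's C as its autocorrelation measures. As a minimal sketch, the textbook global versions of both statistics can be computed from a spatial weight matrix w as follows. The binary contiguity weights, the four-region chain, and the function names are hypothetical toy choices for illustration, not the study's district data or its weighting scheme.

```python
def _moments(values, weights):
    """Return deviations from the mean, total weight W, and sum of squared deviations."""
    mean = sum(values) / len(values)
    d = [x - mean for x in values]
    w_total = sum(map(sum, weights))
    ss = sum(di * di for di in d)
    if w_total == 0 or ss == 0:
        raise ValueError("need nonzero weights and non-constant values")
    return d, w_total, ss


def morans_i(values, weights):
    """Global Moran's I = (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i = x_i - mean(x) and W = sum_ij w_ij."""
    n = len(values)
    d, w_total, ss = _moments(values, weights)
    cross = sum(weights[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / ss


def gearys_c(values, weights):
    """Global Geary's C = ((n - 1) / (2W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i d_i^2."""
    n = len(values)
    _, w_total, ss = _moments(values, weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2 * w_total)) * sq_diff / ss


# Hypothetical chain of four regions, 0-1-2-3, with binary contiguity weights.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [1, 2, 3, 4]       # neighbors have similar values
alternating = [1, 0, 1, 0]  # neighbors have dissimilar values
print(morans_i(smooth, chain), gearys_c(smooth, chain))            # about 0.333 and 0.3
print(morans_i(alternating, chain), gearys_c(alternating, chain))  # about -1.0 and 1.5
```

Under no spatial autocorrelation, Moran's I has expectation -1/(n - 1) and Geary's C has expectation 1, so a Moran's I above -1/(n - 1) or a Geary's C below 1 points to similar values clustering among neighbors; the alternating pattern reverses both signals.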