Pub Date: 2023-06-05 | DOI: 10.1007/s40745-023-00472-6
Sanjay Kumar, Priyanka Chhaparwal
The use of a priori information at the estimation stage to form an estimator of a population parameter is very common, and prior information can lead to more accurate and efficient estimates. In this study, we utilized information from past surveys along with information available from the current survey, in the form of a hybrid exponentially weighted moving average, to suggest an estimator of the population mean for time-based surveys that uses a known coefficient of variation of the study variable. We derived the expression for the mean square error of the suggested estimator and established the mathematical conditions that prove its efficiency. The results showed that utilizing information from both past and current surveys improves the estimator's efficiency. A simulation study and a real-life example are provided to support the use of the suggested estimator.
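As a rough illustration of the idea above, the following Python sketch combines a Searls-type adjustment of the sample mean (one common way to exploit a known coefficient of variation; the paper's exact estimator may differ) with a hybrid EWMA over successive survey occasions. The smoothing constants and toy data are assumptions.

```python
import numpy as np

def searls_mean(y, cv):
    """Searls-type adjustment of the sample mean using a known
    coefficient of variation (illustrative assumption, not
    necessarily the paper's exact form)."""
    n = len(y)
    return np.mean(y) / (1.0 + cv**2 / n)

def hybrid_ewma(estimates, lam1=0.3, lam2=0.3):
    """Hybrid EWMA: an EWMA applied to the EWMA of past survey
    estimates, so both recent and older occasions contribute."""
    ewma = hewma = estimates[0]
    for x in estimates[1:]:
        ewma = lam2 * x + (1 - lam2) * ewma        # first smoothing
        hewma = lam1 * ewma + (1 - lam1) * hewma   # second smoothing
    return hewma

# occasion-wise samples from repeated (time-based) surveys: toy data
rng = np.random.default_rng(0)
occasions = [rng.normal(50, 5, size=30) for _ in range(6)]
cv = 0.1  # assumed known coefficient of variation of the study variable
estimates = [searls_mean(y, cv) for y in occasions]
print("hybrid EWMA estimate of the population mean:", hybrid_ewma(estimates))
```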
{"title":"Utilization of Priori Information in the Estimation of Population Mean for Time-Based Surveys","authors":"Sanjay Kumar, Priyanka Chhaparwal","doi":"10.1007/s40745-023-00472-6","DOIUrl":"10.1007/s40745-023-00472-6","url":null,"abstract":"<div><p>Use of a priori information is very common at an estimation stage to form an estimator of a population parameter. Estimation problems can lead to more accurate and efficient estimates using prior information. In this study, we utilized the information from the past surveys along with the information available from the current surveys in the form of a hybrid exponentially weighted moving average to suggest the estimator of the population mean using a known coefficient of variation of the study variable for time-based surveys. We derived the expression of the mean square error of the suggested estimator and established the mathematical conditions to prove the efficiency of the suggested estimator. The results showed that the utilization of information from past surveys and current surveys excels the estimator's efficiency. A simulation study and a real-life example are provided to support using the suggested estimator.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1675 - 1685"},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45425769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-06-04 | DOI: 10.1007/s40745-023-00471-7
Sakib A. Mondal, Prashanth Rv, Sagar Rao, Arun Menon
Enterprise software can fail not only due to malfunction of application servers but also due to performance degradation or non-availability of other servers or middle layers. Consequently, valuable time and resources are wasted in trying to identify the root cause of software failures. To address this, we have developed a framework called LADDERS. In LADDERS, anomalous incidents are detected from log events generated by various systems and KPIs (Key Performance Indicators) through an ensemble of supervised and unsupervised models. Without transaction identifiers, it is not possible to relate events from different systems, so LADDERS implements Recursive Parallel Causal Discovery (RPCD) to establish causal relationships among log events. The framework builds coresets using BICO to manage high volumes of log data during training and inference. A single anomaly can trigger a number of further anomalies throughout the systems, and LADDERS uses RPCD again to discover causal relationships among these anomalous events. Probable root causes are revealed from the causal graph and the anomaly ratings of events using a k-shortest-path algorithm. We evaluated LADDERS using live logs from an enterprise system. The results demonstrate its effectiveness and efficiency for anomaly detection.
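The sketch below illustrates only the final root-cause ranking step, using a k-shortest-path search over a toy causal graph with networkx. The event names, edge weights, and scoring rule are illustrative assumptions and do not reproduce LADDERS' anomaly ratings, RPCD, or BICO coresets.

```python
import networkx as nx
from itertools import islice

# Toy causal graph over anomalous log events (edges point cause -> effect).
# Edge weights are illustrative "surprise" costs, not the paper's scores.
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("db_latency", "app_timeout", 0.2),
    ("gc_pause", "app_timeout", 0.7),
    ("app_timeout", "http_500", 0.1),
    ("db_latency", "http_500", 0.9),
])

def probable_root_causes(graph, symptom, k=3):
    """Rank candidate roots by the cheapest of their k shortest
    cause->symptom paths (a stand-in for the k-shortest-path step)."""
    scores = {}
    for node in graph.nodes:
        if node == symptom or not nx.has_path(graph, node, symptom):
            continue
        paths = islice(
            nx.shortest_simple_paths(graph, node, symptom, weight="weight"), k)
        scores[node] = min(nx.path_weight(graph, p, weight="weight")
                           for p in paths)
    return sorted(scores.items(), key=lambda kv: kv[1])

print(probable_root_causes(G, "http_500"))
```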
{"title":"LADDERS: Log Based Anomaly Detection and Diagnosis for Enterprise Systems","authors":"Sakib A. Mondal, Prashanth Rv, Sagar Rao, Arun Menon","doi":"10.1007/s40745-023-00471-7","DOIUrl":"10.1007/s40745-023-00471-7","url":null,"abstract":"<div><p>Enterprise software can fail due to not only malfunction of application servers, but also due to performance degradation or non-availability of other servers or middle layers. Consequently, valuable time and resources are wasted in trying to identify the root cause of software failures. To address this, we have developed a framework called LADDERS. In LADDERS, anomalous incidents are detected from log events generated by various systems and KPIs (Key Performance Indicators) through an ensemble of supervised and unsupervised models. Without transaction identifiers, it is not possible to relate various events from different systems. LADDERS implements Recursive Parallel Causal Discovery (RPCD) to establish causal relationships among log events. The framework builds coresets using BICO to manage high volumes of log data during training and inferencing. An anomaly can cause a number of anomalies throughout the systems. LADDERS makes use of RPCD again to discover causal relationships among these anomalous events. Probable root causes are revealed from the causal graph and anomaly rating of events using a k-shortest path algorithm. We evaluated LADDERS using live logs from an enterprise system. The results demonstrate its effectiveness and efficiency for anomaly detection.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1165 - 1183"},"PeriodicalIF":0.0,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46232475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-15 | DOI: 10.1007/s40745-023-00467-3
Rashi Mohta, Sravya Prathapani, Palash Ghosh
Accurate prediction of cumulative COVID-19 infected cases is essential for effectively managing the limited healthcare resources in India. Historically, epidemiological models have helped in controlling such epidemics, but models require accurate historical data to predict future outcomes. In our data, there were days exhibiting erratic, apparently anomalous jumps and drops in the number of daily reported COVID-19 infected cases that did not conform with the overall trend. Including those observations in the training data would most likely worsen the model's predictive accuracy. However, with existing epidemiological models it is not straightforward to determine, for a specific day, whether or not an outcome should be considered anomalous. In this work, we propose an algorithm to automatically identify anomalous ‘jump’ and ‘drop’ days; based upon the overall trend, the number of daily infected cases for those days is then adjusted and the training data are amended using the adjusted observations. We applied the algorithm in conjunction with a recently proposed, modified Susceptible-Infected-Susceptible (SIS) model to demonstrate that prediction accuracy is improved after adjusting the training-data counts for apparently anomalous jumps and drops.
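A minimal sketch of the kind of jump/drop adjustment described above, assuming a rolling-median trend and an MAD-based threshold; the paper's actual identification rule and adjustment may differ.

```python
import numpy as np
import pandas as pd

def adjust_jump_drop(daily, window=7, k=3.0):
    """Flag days whose reported count deviates from a centred rolling-median
    trend by more than k rolling MADs, and replace them with the trend
    (an illustrative stand-in for the paper's jump/drop adjustment)."""
    s = pd.Series(daily, dtype=float)
    trend = s.rolling(window, center=True, min_periods=1).median()
    resid = s - trend
    mad = resid.abs().rolling(window, center=True, min_periods=1).median()
    anomalous = resid.abs() > k * mad.replace(0, resid.abs().median())
    adjusted = s.where(~anomalous, trend)
    return adjusted, anomalous

daily = [100, 110, 105, 400, 95, 102, 5, 108, 111]   # toy daily counts
adj, flags = adjust_jump_drop(daily)
print("flagged days:", list(np.where(flags)[0]))
print("adjusted cumulative counts:", adj.cumsum().round(1).tolist())
```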
{"title":"Jump-Drop Adjusted Prediction of Cumulative Infected Cases Using the Modified SIS Model","authors":"Rashi Mohta, Sravya Prathapani, Palash Ghosh","doi":"10.1007/s40745-023-00467-3","DOIUrl":"10.1007/s40745-023-00467-3","url":null,"abstract":"<div><p>Accurate prediction of cumulative COVID-19 infected cases is essential for effectively managing the limited healthcare resources in India. Historically, epidemiological models have helped in controlling such epidemics. Models require accurate historical data to predict future outcomes. In our data, there were days exhibiting erratic, apparently anomalous jumps and drops in the number of daily reported COVID-19 infected cases that did not conform with the overall trend. Including those observations in the training data would most likely worsen model predictive accuracy. However, with existing epidemiological models it is not straightforward to determine, for a specific day, whether or not an outcome should be considered anomalous. In this work, we propose an algorithm to automatically identify anomalous ‘jump’ and ‘drop’ days, and then based upon the overall trend, the number of daily infected cases for those days is adjusted and the training data is amended using the adjusted observations. We applied the algorithm in conjunction with a recently proposed, modified Susceptible-Infected-Susceptible (SIS) model to demonstrate that prediction accuracy is improved after adjusting training data counts for apparent erratic anomalous jumps and drops.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"959 - 978"},"PeriodicalIF":0.0,"publicationDate":"2023-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135086225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-11 | DOI: 10.1007/s40745-023-00469-1
Zahra Pourahmadi, Dariush Fareed, Hamid Reza Mirzaei
This study investigates the potential of using reinforcement learning (RL) to establish a financial trading system (FTS), taking into account the main constraint imposed by the stock market, such as transaction costs. More specifically, this paper shows the inferior performance of a pure reinforcement learning model when it is applied in a multi-dimensional and noisy stock market environment. To solve this problem and to obtain a practical and reasonable trading-strategy process, a modified RL model is proposed based on the actor-critic method, in which the actor is amended by incorporating three metrics from technical analysis. The results show significant improvement compared with traditional trading strategies. The reliability of the model is verified by experimental results on financial data (the S&P 500 index), and a fair comparison of the proposed method with pure RL and three benchmarks is presented. Statistical analysis shows that combining (a) technical analysis (rule-based strategies), (b) RL (machine-learning strategies), and (c) restricting the actions of the RL policy network with a few realistic conditions results in trading decisions with higher investment return rates.
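A toy illustration of constraining an actor's proposed action with technical-analysis rules. The specific indicators (an SMA crossover and RSI) and thresholds are assumptions for illustration, not necessarily the three metrics used in the paper.

```python
import numpy as np

def sma(prices, n):
    """Simple moving average of the last n prices."""
    return np.convolve(prices, np.ones(n) / n, mode="valid")[-1]

def rsi(prices, n=14):
    """Relative strength index over the last n price changes."""
    diffs = np.diff(prices[-(n + 1):])
    gains, losses = diffs.clip(min=0).sum(), (-diffs).clip(min=0).sum()
    return 100.0 if losses == 0 else 100 - 100 / (1 + gains / losses)

def constrain_action(rl_action, prices):
    """Restrict the actor's output with rule-based signals: block buys when
    the market looks overbought or below trend, block sells when it looks
    oversold or above trend (thresholds are illustrative)."""
    trend_up = sma(prices, 10) > sma(prices, 50)
    r = rsi(prices)
    if rl_action == "buy" and (r > 70 or not trend_up):
        return "hold"
    if rl_action == "sell" and (r < 30 or trend_up):
        return "hold"
    return rl_action

rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, 200))   # toy price path
print(constrain_action("buy", prices))
```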
{"title":"A Novel Stock Trading Model based on Reinforcement Learning and Technical Analysis","authors":"Zahra Pourahmadi, Dariush Fareed, Hamid Reza Mirzaei","doi":"10.1007/s40745-023-00469-1","DOIUrl":"10.1007/s40745-023-00469-1","url":null,"abstract":"<div><p>This study investigates the potential of using reinforcement learning (RL) to establish a financial trading system (FTS), taking into account the main constraint imposed by the stock market, e.g., transaction costs. More specifically, this paper shows the inferior performance of the pure reinforcement learning model when it is applied in a multi-dimensional and noisy stock market environment. To solve this problem and to get a practical and reasonable trading strategies process, a modified RL model is proposed based on the actor-critic method where we have amended the actor by incorporating three metrics from technical analysis. The results show significant improvement compared with traditional trading strategies. The reliability of the model is verified by experimental results on financial data (S&P500 index) and a fair evaluation of the proposed method and pure RL and three benchmarks is demonstrated. Statistical analysis proves that a combination of a) technical analysis (role-based strategies) and b) RL (machine learning strategies) and c) restricting the action of the RL policy network with a few realistic conditions results in trading decisions with higher investment return rates.\u0000</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1653 - 1674"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49174695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-11 | DOI: 10.1007/s40745-023-00470-8
Yanfen Zhang, Jinyao Ma, Haibin Zhang, Bin Yue
Platform resource scheduling is an operational research optimization problem of matching tasks and platform resources, with important applications in production and marketing arrangement layout, combat task planning, and similar settings. Existing algorithms are inflexible in the task-planning sequence and have poor stability. To address this defect, this paper combines the branch-and-bound algorithm with a genetic algorithm. The branch-and-bound algorithm adaptively adjusts the next task to be planned and computes a variety of feasible task-planning sequences, while the genetic algorithm assigns a platform combination to the selected task. In addition, we put forward a new lower-bound calculation method and pruning rule. On the basis of the processing time of the direct successor tasks, the influence of the resource requirements of tasks on task priority is considered. Numerical experiments show that the proposed algorithm performs well on the platform resource scheduling problem.
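A small sketch of a task-priority rule in the spirit described above, where a task becomes more urgent when its direct successors take long to process and when it demands more resources. The weighting and the toy task set are assumptions; the paper's actual lower bound and pruning rule are not reproduced.

```python
# Toy precedence graph: each task lists its direct successors,
# its processing time, and its resource requirement.
tasks = {
    "t1": {"succ": ["t3"], "proc": 4, "resources": 2},
    "t2": {"succ": ["t3", "t4"], "proc": 3, "resources": 5},
    "t3": {"succ": [], "proc": 6, "resources": 1},
    "t4": {"succ": [], "proc": 2, "resources": 3},
}

def priority(name, alpha=0.5):
    """Priority = total processing time of direct successors plus a
    resource-requirement penalty (alpha is an illustrative weight)."""
    t = tasks[name]
    succ_time = sum(tasks[s]["proc"] for s in t["succ"])
    return succ_time + alpha * t["resources"]

# A branch-and-bound layer would expand unplanned tasks in this order,
# while a genetic algorithm assigns platform combinations to each task.
order = sorted(tasks, key=priority, reverse=True)
print([(n, priority(n)) for n in order])
```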
{"title":"Platform Resource Scheduling Method Based on Branch-and-Bound and Genetic Algorithm","authors":"Yanfen Zhang, Jinyao Ma, Haibin Zhang, Bin Yue","doi":"10.1007/s40745-023-00470-8","DOIUrl":"10.1007/s40745-023-00470-8","url":null,"abstract":"<div><p>Platform resource scheduling is an operational research optimization problem of matching tasks and platform resources, which has important applications in production or marketing arrangement layout, combat task planning, etc. The existing algorithms are inflexible in task planning sequence and have poor stability. Aiming at this defect, the branch-and-bound algorithm is combined with the genetic algorithm in this paper. Branch-and-bound algorithm can adaptively adjust the next task to be planned and calculate a variety of feasible task planning sequences. Genetic algorithm is used to assign a platform combination to the selected task. Besides, we put forward a new lower bound calculation method and pruning rule. On the basis of the processing time of the direct successor tasks, the influence of the resource requirements of tasks on the priority of tasks is considered. Numerical experiments show that the proposed algorithm has good performance in platform resource scheduling problem.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"10 5","pages":"1421 - 1445"},"PeriodicalIF":0.0,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43159033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-05-04 | DOI: 10.1007/s40745-023-00468-2
Jitendra Kumar, Ashok Kumar, Varun Agiwal
In the present scenario, handling a covariate (explanatory) variable within the model is one of the most important aspects of model building. The main advantage of a covariate is its dependency on past observations, so the study variable is modelled in terms of both its own past and the past and future observations of the covariates. The present paper deals with the estimation of the parameters of an autoregressive model with multiple covariates under the Bayesian approach. A simulation and an empirical study are performed to check the applicability of the model, and better results are recorded.
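A minimal sketch of the model structure, assuming a single lag, a known error scale, and a conjugate normal prior purely for illustration; the paper's priors, lag order, and estimation details are not reproduced.

```python
import numpy as np

# Illustrative covariate-augmented AR(1) model:
#   y_t = phi * y_{t-1} + x_t' beta + eps_t,  eps_t ~ N(0, sigma^2).
rng = np.random.default_rng(2)
T, p = 300, 2
X = rng.normal(size=(T, p))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + X[t] @ np.array([1.0, -0.5]) + rng.normal(0, 0.3)

Z = np.column_stack([y[:-1], X[1:]])      # regressors: own lag + covariates
target = y[1:]
prior_prec = np.eye(p + 1) * 1e-2         # vague normal prior precision
post_cov = np.linalg.inv(prior_prec + Z.T @ Z)
post_mean = post_cov @ (Z.T @ target)     # posterior mean (sigma^2 = 1 scale)
print("posterior mean of (phi, beta):", post_mean.round(3))
```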
{"title":"Bayesian Estimation of Multiple Covariate of Autoregressive (MC-AR) Model","authors":"Jitendra Kumar, Ashok Kumar, Varun Agiwal","doi":"10.1007/s40745-023-00468-2","DOIUrl":"10.1007/s40745-023-00468-2","url":null,"abstract":"<div><p>In present scenario, handling covariate/explanatory variable with the model is one of most important factor to study with the models. The main advantages of covariate are it’s dependency on past observations. So, study variable is modelled after explaining both on own past and past and future observation of covariates. Present paper deals estimation of parameters of autoregressive model with multiple covariates under Bayesian approach. A simulation and empirical study is performed to check the applicability of the model and recorded the better results.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1291 - 1301"},"PeriodicalIF":0.0,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47960675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-22 | DOI: 10.1007/s40745-023-00465-5
Praveen Kumar Tripathi, Manika Agarwal
In this paper, Bayesian analyses of random walk models have been performed under the assumptions of a normal distribution, a log-normal distribution, and a stochastic volatility model for the error component, one by one. For the various parameters in each model, suitable choices of informative and non-informative priors have been made and the posterior distributions are calculated. For the first two choices of error distribution, the posterior samples are easily obtained by using the gamma generating routine in the R software. For the random walk model with stochastic volatility errors, Gibbs sampling with intermediate independent Metropolis–Hastings steps is employed to obtain the desired posterior samples. The whole procedure is numerically illustrated through a real data set of crude oil prices from April 2014 to March 2022. The models are then compared on the basis of their accuracy in forecasting the true values. Among the considered choices, the random walk model with stochastic volatility errors performed best for the data at hand.
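For the normal-error case, the posterior of the error precision is gamma under an inverse-gamma prior, so posterior draws can indeed come from a gamma generator; the following sketch (in Python rather than R, with illustrative prior values and simulated data) shows that step only.

```python
import numpy as np

# Random walk with normal errors: y_t = y_{t-1} + eps_t, eps_t ~ N(0, sigma^2).
# With an inverse-gamma(a, b) prior on sigma^2, the posterior of 1/sigma^2 is
# Gamma(a + n/2, rate = b + SSE/2), so draws use a gamma routine.
rng = np.random.default_rng(3)
y = np.cumsum(rng.normal(0, 2.0, size=500))   # simulated random walk
eps = np.diff(y)
a, b = 0.01, 0.01                             # illustrative vague prior
a_post = a + len(eps) / 2
b_post = b + 0.5 * np.sum(eps**2)
sigma2_draws = 1.0 / rng.gamma(a_post, 1.0 / b_post, size=5000)
print("posterior mean of sigma^2:", sigma2_draws.mean().round(3))
```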
{"title":"A Bayes Analysis of Random Walk Model Under Different Error Assumptions","authors":"Praveen Kumar Tripathi, Manika Agarwal","doi":"10.1007/s40745-023-00465-5","DOIUrl":"10.1007/s40745-023-00465-5","url":null,"abstract":"<div><p>In this paper, the Bayesian analyses for the random walk models have been performed under the assumptions of normal distribution, log-normal distribution and the stochastic volatility model, for the error component, one by one. For the various parameters, in each model, some suitable choices of informative and non-informative priors have been made and the posterior distributions are calculated. For the first two choices of error distribution, the posterior samples are easily obtained by using the gamma generating routine in R software. For a random walk model, having stochastic volatility error, the Gibbs sampling with intermediate independent Metropolis–Hastings steps is employed to obtain the desired posterior samples. The whole procedure is numerically illustrated through a real data set of crude oil prices from April 2014 to March 2022. The models are, then, compared on the basis of their accuracies in forecasting the true values. Among the other choices, the random walk model with stochastic volatile errors outperformed for the data in hand.\u0000</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1635 - 1652"},"PeriodicalIF":0.0,"publicationDate":"2023-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47611888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-04-13 | DOI: 10.1007/s40745-023-00464-6
Bonelwa Sidumo, Energy Sonono, Isaac Takaidza
The aim of this study is to investigate the overdispersion problem that is rampant in ecological count data. In order to explore this problem, we consider the most commonly used count regression models: the Poisson, the negative binomial, the zero-inflated Poisson and the zero-inflated negative binomial models. The performance of these count regression models is compared with the four proposed machine learning (ML) regression techniques: random forests, support vector machines, k-nearest neighbors and artificial neural networks. The mean absolute error was used to compare the performance of count regression models and ML regression models. The results suggest that ML regression models perform better compared to count regression models. The performance shown by ML regression techniques is a motivation for further research in improving methods and applications in ecological studies.
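A minimal sketch of this kind of comparison on simulated zero-inflated, overdispersed counts, using a Poisson GLM and a random forest scored by mean absolute error; the data, models, and settings are illustrative and are not the ecological data set of the paper.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Simulated zero-inflated, overdispersed counts (toy data).
rng = np.random.default_rng(4)
X = rng.normal(size=(600, 3))
mu = np.exp(0.4 * X[:, 0] - 0.3 * X[:, 1])
counts = rng.negative_binomial(n=1, p=1 / (1 + mu))
counts[rng.random(600) < 0.3] = 0            # inject extra zeros
train, test = slice(0, 400), slice(400, 600)

# Count regression baseline: Poisson GLM.
poisson = sm.GLM(counts[train], sm.add_constant(X[train]),
                 family=sm.families.Poisson()).fit()
pred_glm = poisson.predict(sm.add_constant(X[test]))

# ML regression: random forest.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[train], counts[train])
pred_rf = rf.predict(X[test])

print(f"MAE Poisson GLM : {mean_absolute_error(counts[test], pred_glm):.3f}")
print(f"MAE random forest: {mean_absolute_error(counts[test], pred_rf):.3f}")
```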
{"title":"Count Regression and Machine Learning Techniques for Zero-Inflated Overdispersed Count Data: Application to Ecological Data","authors":"Bonelwa Sidumo, Energy Sonono, Isaac Takaidza","doi":"10.1007/s40745-023-00464-6","DOIUrl":"10.1007/s40745-023-00464-6","url":null,"abstract":"<div><p>The aim of this study is to investigate the overdispersion problem that is rampant in ecological count data. In order to explore this problem, we consider the most commonly used count regression models: the Poisson, the negative binomial, the zero-inflated Poisson and the zero-inflated negative binomial models. The performance of these count regression models is compared with the four proposed machine learning (ML) regression techniques: random forests, support vector machines, <span>(k-)</span>nearest neighbors and artificial neural networks. The mean absolute error was used to compare the performance of count regression models and ML regression models. The results suggest that ML regression models perform better compared to count regression models. The performance shown by ML regression techniques is a motivation for further research in improving methods and applications in ecological studies.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 3","pages":"803 - 817"},"PeriodicalIF":0.0,"publicationDate":"2023-04-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-023-00464-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43264905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-31 | DOI: 10.1007/s40745-023-00463-7
Hare Krishna, Rajni Goel
The formal random censoring plan has been extensively studied earlier in the statistical literature by numerous researchers to deal with dropouts or unintentional random removals in life-testing experiments. All of them considered failure time and censoring time to be independent. But there are several situations in which one observes that as the failure time of an item increases, the censoring time decreases. In medical studies, and especially in clinical trials, dropouts or unintentional removals frequently occur in such a way that as the treatment (failure) time increases, the dropout (censoring) time decreases. No work has yet been found that deals with such correlated failure and censoring times. Therefore, in this article, we assume that the failure time is negatively correlated with the censoring time and that they follow Gumbel’s type-I bivariate exponential distribution. We compute the maximum likelihood estimates of the model parameters. Using Markov chain Monte Carlo methods, the Bayesian estimators of the parameters are calculated. The expected experimental time is also evaluated. Finally, for illustrative purposes, a numerical study and a real data set analysis are given.
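A sketch of maximum likelihood estimation under the commonly quoted density of Gumbel's type-I bivariate exponential distribution with marginal rates; the data are toy draws, and the article's random-censoring structure and expected-experiment-time calculations are not included.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, x, y):
    """Negative log-likelihood of Gumbel's type-I bivariate exponential,
    f(x, y) = l1*l2*exp(-l1*x - l2*y - th*l1*l2*x*y)
              * ((1 + th*l1*x)*(1 + th*l2*y) - th),  0 <= th <= 1."""
    l1, l2, th = params
    u, v = l1 * x, l2 * y
    dens = l1 * l2 * np.exp(-u - v - th * u * v) * ((1 + th * u) * (1 + th * v) - th)
    return -np.sum(np.log(dens))

# toy failure/censoring times (illustrative only, not the article's data)
rng = np.random.default_rng(5)
x = rng.exponential(1.0, 200)
y = rng.exponential(0.5, 200)

fit = minimize(neg_log_lik, x0=[1.0, 1.0, 0.1], args=(x, y),
               bounds=[(1e-6, None), (1e-6, None), (0.0, 1.0)])
print("MLE (lambda1, lambda2, theta):", fit.x.round(3))
```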
{"title":"Inferences Based on Correlated Randomly Censored Gumbel’s Type-I Bivariate Exponential Distribution","authors":"Hare Krishna, Rajni Goel","doi":"10.1007/s40745-023-00463-7","DOIUrl":"10.1007/s40745-023-00463-7","url":null,"abstract":"<div><p>The formal random censoring plan has been extensively studied earlier in statistical literature by numerous researchers to deal with dropouts or unintentional random removals in life-testing experiments. All of them considered failure time and censoring time to be independent. But there are several situations in which one observes that as the failure time of an item increases, the censoring time decreases. In medical studies or especially in clinical trials, the occurrence of dropouts or unintentional removals is frequently observed in such a way that as the treatment (failure) time increases, the dropout (censoring) time decreases. No work has yet been found that deals with such correlated failure and censoring times. Therefore, in this article, we assume that the failure time is negatively correlated with censoring time, and they follow Gumbel’s type-I bivariate exponential distribution. We compute the maximum likelihood estimates of the model parameters. Using the Monte Carlo Markov chain methods, the Bayesian estimators of the parameters are calculated. The expected experimental time is also evaluated. Finally, for illustrative purposes, a numerical study and a real data set analysis are given.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 4","pages":"1185 - 1207"},"PeriodicalIF":0.0,"publicationDate":"2023-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49343107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-01-22 | DOI: 10.1007/s40745-022-00461-1
Md. Rezaul Karim, Sefat-E-Barket
This research aimed to investigate the spatial autocorrelation and heterogeneity throughout Bangladesh’s 64 districts. Moran’s I and Geary’s C are used to measure spatial autocorrelation. Different conventional models, such as Poisson-Gamma and Poisson-Lognormal, and spatial models, such as the Conditional Autoregressive (CAR) Model, the Convolution Model, and a modified CAR Model, have been employed to detect spatial heterogeneity. Bayesian hierarchical methods via Gibbs sampling are used to implement these models, and the best model is selected using the Deviance Information Criterion. Results revealed that Dhaka has the highest relative risk due to the city’s high population density and growth rate. This study identifies which district has the highest relative risk and which districts adjacent to that district also have a high risk, allowing government agencies and communities to take appropriate actions to mitigate the risk.
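A small helper for Moran's I on a toy adjacency matrix, illustrating the spatial-autocorrelation step only; the CAR, convolution, and modified CAR models themselves would be fitted with MCMC software and are not sketched here.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2,
    where z is the mean-centred variable and S0 is the sum of weights."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    n, s0 = len(x), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# toy district counts and a binary adjacency matrix for 4 areas in a row
counts = [10, 12, 30, 33]
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print("Moran's I:", round(morans_i(counts, W), 3))
```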
{"title":"Bayesian Hierarchical Spatial Modeling of COVID-19 Cases in Bangladesh","authors":"Md. Rezaul Karim, Sefat-E-Barket","doi":"10.1007/s40745-022-00461-1","DOIUrl":"10.1007/s40745-022-00461-1","url":null,"abstract":"<div><p>This research aimed to investigate the spatial autocorrelation and heterogeneity throughout Bangladesh’s 64 districts. Moran <i>I</i> and Geary <i>C</i> are used to measure spatial autocorrelation. Different conventional models, such as Poisson-Gamma and Poisson-Lognormal, and spatial models, such as Conditional Autoregressive (CAR) Model, Convolution Model, and modified CAR Model, have been employed to detect the spatial heterogeneity. Bayesian hierarchical methods via Gibbs sampling are used to implement these models. The best model is selected using the Deviance Information Criterion. Results revealed Dhaka has the highest relative risk due to the city’s high population density and growth rate. This study identifies which district has the highest relative risk and which districts adjacent to that district also have a high risk, which allows for the appropriate actions to be taken by the government agencies and communities to mitigate the risk effect.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"11 5","pages":"1581 - 1607"},"PeriodicalIF":0.0,"publicationDate":"2023-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47950849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}