Pub Date: 2024-09-17. DOI: 10.1007/s10182-024-00512-3
Huiqiao Wang, Christian H. Weiß, Mingming Zhang
A common choice for the marginal distribution of a bivariate count time series is the bivariate Poisson distribution. In practice, however, the count data may exhibit zero inflation, overdispersion, or non-stationarity, such that a marginal bivariate Poisson distribution is not suitable. To test for a discrepancy between the actual count data and the bivariate Poisson distribution, we propose a new goodness-of-fit test based on a bivariate dispersion index. The asymptotic distribution of the test statistic under the null hypothesis of a first-order bivariate integer-valued autoregressive model with marginal bivariate Poisson distribution is derived, and the finite-sample performance of the goodness-of-fit test is analyzed by simulations. A real-data example illustrates the application and usefulness of the test in practice.
Title: Goodness-of-fit testing in bivariate count time series based on a bivariate dispersion index. Journal: Asta-Advances in Statistical Analysis. Volume: 54 1.
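As a rough illustration of the idea behind dispersion-based testing, the sketch below computes the classical univariate index of dispersion (variance divided by mean), which equals 1 for a Poisson distribution. It is not the bivariate dispersion index or the test statistic proposed in the paper; the function name `dispersion_index` and the sample series are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pvariance


def dispersion_index(counts):
    """Univariate index of dispersion: population variance / mean.

    Illustrative only; the paper's bivariate index is not reproduced here.
    """
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("index of dispersion is undefined for a zero mean")
    return pvariance(counts) / mean


# Hypothetical count series: many zeros plus a few large values.
zero_inflated = [0, 0, 0, 0, 6, 0, 0, 7, 0, 0]
print(dispersion_index(zero_inflated))  # well above 1 suggests overdispersion
```

A value close to 1 is compatible with a Poisson margin, while values clearly above 1 point to overdispersion; a formal test would additionally need the sampling distribution of the statistic, which is what the paper derives for its bivariate index.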
Pub Date: 2024-08-20. DOI: 10.1007/s10182-024-00509-y
YuZhu Tian, ChunHo Wu, ManLai Tang, MaoZai Tian
In this paper, we propose a Bayesian quantile regression (QR) approach to jointly model multivariate ordinal data. Firstly, a multivariate latent variable model is used to link the multivariate ordinal data to latent continuous responses, and the multivariate asymmetric Laplace (MAL) distribution is employed to construct the joint QR-based working likelihood for the considered model. Secondly, adaptive \(L_{1/2}\) penalization priors on the regression parameters are incorporated into the working likelihood to implement high-dimensional Bayesian joint QR inference. A Markov chain Monte Carlo (MCMC) algorithm is built from the full conditional posterior distributions of all parameters. Thirdly, a Bayesian joint relatively QR estimation approach is recommended to obtain more efficient estimates. Finally, Monte Carlo simulation studies and a real-data analysis of multirater agreement data are presented to illustrate the performance of the proposed Bayesian joint relatively QR approach.
Title: Bayesian joint relatively quantile regression of latent ordinal multivariate linear models with application to multirater agreement analysis. Journal: Asta-Advances in Statistical Analysis. Volume: 14 1.
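The link between quantile regression and the asymmetric Laplace working likelihood can be sketched in the univariate case: the asymmetric Laplace density is proportional to exp(-rho_tau((y - mu) / sigma)), where rho_tau is the check loss, so maximising that likelihood in mu amounts to minimising the summed check loss. The snippet below is a minimal sketch of this univariate fact only; it does not implement the multivariate ordinal model, the penalization priors, or the MCMC sampler of the paper, and the helper names `check_loss` and `best_constant` are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(data, tau):
    """Data point minimising the summed check loss (a sample tau-quantile)."""
    return min(data, key=lambda q: sum(check_loss(y - q, tau) for y in data))


data = [1, 2, 3, 4, 5]
print(best_constant(data, 0.5))  # the sample median of the toy data
```

Restricting the search to the observed data points is a simplification that suffices here because a minimiser of the summed check loss can always be found among the observations.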
Pub Date: 2024-08-14. DOI: 10.1007/s10182-024-00510-5
Ali Al-Sharadqah, Karine Bagdasaryan, Ola Nusierat
This paper focuses on the general linear measurement error model, in which some or all predictors are measured with error, while others are measured precisely. We propose a semi-parametric estimator that works under general mechanisms of measurement error, including differential and non-differential errors. Other popular methods, such as the corrected score and conditional score methods, only work for non-differential measurement error models, but our estimator works in all scenarios. We develop our estimator by considering a family of objective functions that depend on an unspecified weight function. Using statistical error analysis and perturbation theory, we derive the optimal weight function under the small-sigma regime. The resulting estimator is statistically optimal in all senses. Even though we develop it under the small-sigma regime, we also establish its consistency and asymptotic normality under the large sample regime. Finally, we conduct a series of numerical experiments to confirm that the proposed estimator outperforms other existing methods.
Title: A Finite-sample bias correction method for general linear model in the presence of differential measurement errors. Journal: Asta-Advances in Statistical Analysis. Volume: 29 1.
Pub Date: 2024-08-08. DOI: 10.1007/s10182-024-00505-2
Roy Cerqueti, Mario Maggi
Benford’s law is a particular discrete probability distribution that is often satisfied by the significant digits of a dataset. Nonconformity with Benford’s law suggests the possible presence of data manipulation. This paper introduces two novel generalized versions of Benford’s law that are less restrictive than the original law and hence make conformity of a given dataset more probable. These generalizations are grounded in the existing mathematical relations between the elements of the Benford’s law probability distribution. Moreover, one of them leads to a set of probability distributions that is a proper subset of that of the other. We show that the considered versions of Benford’s law have a geometric representation in three-dimensional Euclidean space. Through suitable optimization models, we show that all the probability distributions satisfying the more restrictive generalization exhibit at least acceptable conformity with Benford’s law, according to the most popular distance measures. We also present some examples to highlight the practical usefulness of the introduced devices.
Title: Classes of probability measures built on the properties of Benford’s law. Journal: Asta-Advances in Statistical Analysis. Volume: 2013 1.
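For readers unfamiliar with the baseline, the original Benford’s law assigns probability log10(1 + 1/d) to a first significant digit d = 1, ..., 9. The sketch below computes these probabilities and measures how far an observed first-digit distribution is from them using the mean absolute deviation. The generalized versions proposed in the paper are not reproduced; treating the mean absolute deviation as one of the distance measures the authors consider is an assumption, and the function names and the powers-of-two example are chosen here for illustration.

```python
import math
from collections import Counter

# Benford's law for the first significant digit d = 1, ..., 9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Relative frequencies of leading digits; assumes positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def mean_absolute_deviation(freqs):
    """Mean absolute deviation between observed frequencies and Benford's law."""
    return sum(abs(freqs[d] - BENFORD[d]) for d in range(1, 10)) / 9


powers_of_two = [2 ** n for n in range(100)]  # a classic near-Benford sequence
mad = mean_absolute_deviation(first_digit_frequencies(powers_of_two))
print(round(mad, 4))
```

Extracting the leading digit from the decimal string keeps the example exact for integers; real-valued data would need a more careful significand extraction.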
Pub Date: 2024-07-30. DOI: 10.1007/s10182-024-00508-z
Alexander Gerharz, Andreas Groll, Gunther Schauberger
Title: Publisher Correction: Deducing neighborhoods of classes from a fitted model. Journal: Asta-Advances in Statistical Analysis. Volume: 108 4. Pages: 915 - 915. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00508-z.pdf
Pub Date: 2024-07-16. DOI: 10.1007/s10182-024-00506-1
Susanna Levantesi, Andrea Nigri, Paolo Pagnottoni, Alessandro Spelta
We investigate the joint dynamics of regional gross domestic product and life expectancy in Italy through Wasserstein barycenter regression, which is derived from optimal transport theory. Wasserstein barycenter regression is flexible in modeling complex data distributions, given its ability to capture multimodal relationships, while still allowing uncertainty and priors to be incorporated and yielding interpretable results. The main findings reveal that regional clusters tend to emerge, highlighting inequalities among Italian regions in economic and life expectancy terms. This suggests that targeted policy actions at a regional level fostering equitable development, especially from an economic viewpoint, might reduce regional inequality. Our results are validated by a robustness check on a human mobility dataset and by an illustrative forecasting exercise, which confirms the model’s ability to estimate and predict joint distributions and produce novel empirical evidence.
Title: Wasserstein barycenter regression: application to the joint dynamics of regional GDP and life expectancy in Italy. Journal: Asta-Advances in Statistical Analysis. Volume: 38 1.
Pub Date: 2024-07-08. DOI: 10.1007/s10182-024-00507-0
Anagh Chattopadhyay, Soudeep Deb
It is often of primary interest to analyze and forecast the levels of a continuous phenomenon as a categorical variable. In this paper, we propose a new spatio-temporal model to deal with this problem in a binary setting, with an application to the COVID-19 pandemic, a phenomenon that depends on both spatial proximity and temporal autocorrelation. Our model is defined through a hierarchical structure for the latent variable, which corresponds to the probit link function. The mean of the latent variable in the proposed model is designed to capture the trend and the seasonal pattern as well as the lagged effects of relevant regressors. The covariance structure of the model is defined as an additive combination of a zero-mean spatio-temporally correlated process and a white noise process. The parameters associated with the space-time process enable us to analyze the effect of the proximity of two points in space or time and its influence on the overall process. For estimation and prediction, we adopt a fully Bayesian framework with suitable prior specifications and use Gibbs sampling. Using county-level data from the state of New York, we show that the proposed methodology performs better than benchmark techniques. We also use our model to devise a novel mechanism for predictive clustering, which can be leveraged to develop localized policies.
Title: A spatio-temporal model for binary data and its application in analyzing the direction of COVID-19 spread. Journal: Asta-Advances in Statistical Analysis. Volume: 108 4. Pages: 823 - 851.
Pub Date: 2024-07-02. DOI: 10.1007/s10182-024-00504-3
Jinsu Park, Yoonjin Lee, Daewon Yang, Jongho Park, Hohyun Jung
Considerable research has been devoted to understanding the popularity effect in art market dynamics, whereby artworks by popular artists tend to have high prices. The hedonic pricing model has employed artists’ reputation attributes, such as survey results, to understand the popularity effect, but these reputation attributes are constant and not properly defined at the point of artwork sales. Moreover, the artist’s ability has been measured via a random effect in the hedonic model, which fails to reflect changes in ability. To remedy these problems, we present a method to define the popularity measure using the artwork sales dataset without relying on the artist’s reputation attributes. We also propose a novel pricing model to appropriately infer artists’ time-dependent abilities using the presented popularity measure. An inference algorithm based on the EM algorithm and Gibbs sampling is presented to estimate model parameters and artist abilities. We use the Artnet dataset to investigate the size of the rich-get-richer effect and the variables affecting artwork prices in real-world art market dynamics. We further conduct inferences about artists’ abilities under the popularity effect and examine how ability changes over time for various artists with remarkable interpretations.
Title: Artwork pricing model integrating the popularity and ability of artists. Journal: Asta-Advances in Statistical Analysis. Volume: 108 4. Pages: 889 - 913. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00504-3.pdf
Pub Date: 2024-06-21. DOI: 10.1007/s10182-024-00503-4
Benjamin Säfken, David Rügamer
Title: Editorial special issue: Bridging the gap between AI and Statistics. Journal: Asta-Advances in Statistical Analysis. Volume: 108 2. Pages: 225 - 229.
Pub Date: 2024-05-29. DOI: 10.1007/s10182-024-00501-6
Timo Adam, Marius Ötting, Rouven Michels
Decision trees constitute a simple yet powerful and interpretable machine learning tool. While tree-based methods are designed only for cross-sectional data, we propose an approach that combines decision trees with time series modeling and thereby bridges the gap between machine learning and statistics. In particular, we combine decision trees with hidden Markov models where, for any time point, an underlying (hidden) Markov chain selects the tree that generates the corresponding observation. We propose an estimation approach that is based on the expectation-maximisation algorithm and assess its feasibility in simulation experiments. In our real-data application, we use eight seasons of National Football League (NFL) data to predict play calls conditional on covariates, such as the current quarter and the score, where the model’s states can be linked to the teams’ strategies. R code that implements the proposed method is available on GitHub.
Title: Markov-switching decision trees. Journal: Asta-Advances in Statistical Analysis. Volume: 108 2. Pages: 461 - 476. Open-access PDF: https://link.springer.com/content/pdf/10.1007/s10182-024-00501-6.pdf
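The likelihood of a hidden Markov model in which the active state determines the observation distribution can be evaluated with the standard forward recursion, which is also the building block of the E-step in expectation-maximisation. The sketch below shows that generic recursion only. In the setting described in the abstract, the state-dependent probabilities would come from state-specific decision trees; here they are supplied as plain hypothetical numbers, and the code is not the authors' R implementation.

```python
def forward_likelihood(init, trans, emis):
    """Likelihood of an observation sequence under a K-state HMM.

    init  : list of K initial state probabilities
    trans : K x K transition matrix, trans[j][k] = P(next state k | state j)
    emis  : T x K table, emis[t][k] = P(observation at time t | state k)
    No scaling is applied, so this is suitable for short sequences only.
    """
    n_states = len(init)
    # Forward variables at the first time point.
    alpha = [init[k] * emis[0][k] for k in range(n_states)]
    for t in range(1, len(emis)):
        # Propagate through the Markov chain, then weight by the emission.
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n_states)) * emis[t][k]
            for k in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state example with three observations.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.7, 0.2], [0.6, 0.3], [0.1, 0.8]]
print(forward_likelihood(init, trans, emis))
```

For long series the forward variables underflow, so a practical implementation would rescale them at each step or work on the log scale.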