Multiclass Forecasting on Panel Data Using Autoregressive Multinomial Logit and C5.0 Decision Tree

Pakistan Journal of Statistics and Operation Research (IF 1.1; Q3, Statistics & Probability). Pub Date: 2023-03-06. DOI: 10.18187/pjsor.v19i1.4053
Muhlis Ardiansyah, Hari Wijayanto, Anang Kurnia, A. Djuraidah
Citations: 1

Abstract

Panel data methods are commonly applied to numerical response variables, whereas literature on forecasting categorical variables in a panel data structure remains scarce. Such forecasts matter because they can inform government policy. This study aimed to forecast a multiclass (categorical) variable on panel data. The proposed forecasting models were the autoregressive multinomial logit and autoregressive C5.0. To make the two models usable for forecasting, autoregressive effects were added alongside fixed predictor variables such as location, time, strata, and month of observation. The autoregressive effect was assumed to be a fixed effect and treated as a dummy variable. The data were land-condition categories from the Area Sampling Frame (ASF) survey conducted by BPS-Statistics Indonesia. Both models were evaluated on classification and forecasting performance. Classification performance was obtained by splitting the dataset into 75% training data for modeling and 25% test data for validation, repeated 200 times. In classification, autoregressive C5.0 reached an accuracy of 86.48%, compared with 83.97% for the autoregressive multinomial logit. Forecasting performance was compared by splitting the data into training and test sets according to time sequence. Forecasting performance was worse than classification performance: autoregressive C5.0 had an accuracy of 77.43%, while the autoregressive multinomial logit had 77.77%.
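The abstract describes two mechanics that a small sketch may make more concrete: turning the previous period's category into dummy predictors, and the difference between a random 75/25 split and a time-ordered split. The Python below is only an illustration under stated assumptions, not the authors' code. The category names, the record layout, and the function names (add_lag_dummies, random_split, split_by_time) are hypothetical, and it assumes each location is observed at consecutive time periods.

```python
import random
from collections import defaultdict

# Hypothetical land-condition categories; the actual ASF categories
# are not listed in the abstract.
CATEGORIES = ["vegetative", "generative", "harvest"]


def add_lag_dummies(records, categories=CATEGORIES):
    """Attach one-hot dummies of the previous period's category.

    records: list of dicts with keys "location", "time", "category".
    Assumes consecutive periods per location; the first observation of
    each location has no lag and is dropped.
    """
    by_location = defaultdict(list)
    for rec in records:
        by_location[rec["location"]].append(rec)

    rows = []
    for series in by_location.values():
        series.sort(key=lambda r: r["time"])
        for prev, curr in zip(series, series[1:]):
            row = dict(curr)
            for cat in categories:
                row[f"lag_{cat}"] = 1 if prev["category"] == cat else 0
            rows.append(row)
    return rows


def random_split(rows, train_frac=0.75, seed=0):
    """Random split, as used for the classification evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def split_by_time(rows, cutoff):
    """Time-ordered split: train on periods <= cutoff, test on later ones."""
    train = [r for r in rows if r["time"] <= cutoff]
    test = [r for r in rows if r["time"] > cutoff]
    return train, test


# Toy panel: two locations observed over three periods.
panel = [
    {"location": "A", "time": 1, "category": "vegetative"},
    {"location": "A", "time": 2, "category": "generative"},
    {"location": "A", "time": 3, "category": "harvest"},
    {"location": "B", "time": 1, "category": "generative"},
    {"location": "B", "time": 2, "category": "harvest"},
    {"location": "B", "time": 3, "category": "vegetative"},
]

rows = add_lag_dummies(panel)
rand_train, rand_test = random_split(rows, train_frac=0.75, seed=1)
time_train, time_test = split_by_time(rows, cutoff=2)
```

Calling random_split with 200 different seeds would mirror the repeated 75/25 evaluation described above. In the time-ordered split the test rows all come from periods after the cutoff, so the model never sees the periods it is asked to predict, which is one plausible reason forecasting accuracy comes out lower than classification accuracy.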
Source journal
CiteScore: 3.30
Self-citation rate: 26.70%
Articles published: 53
Journal description: Pakistan Journal of Statistics and Operation Research (PJSOR) is a peer-reviewed journal published four times a year. It publishes refereed research articles and studies describing the latest research and developments in statistics, operations research, and actuarial statistics.