A Bayesian Analysis of the Spherical Distribution in Presence of Covariates
Pub Date: 2021-07-30 | DOI: 10.6339/jds.201310_11(4).0008
J. Achcar, Gian Franco Napa, Roberto Molina de Souza
In this paper we introduce a Bayesian analysis of a spherical distribution applied to rock joint orientation data, with and without a vector of covariates, where the response variable is the angle from the mean direction and the covariates are the components of the normal upwards vector. Standard MCMC (Markov Chain Monte Carlo) simulation methods, implemented in the WinBUGS software, are used to obtain the posterior summaries of interest. Illustrations of the proposed methodology are given using a simulated data set and a real rock spherical data set from a hydroelectric site.
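As a rough illustration of the kind of posterior computation involved, the sketch below fits the concentration parameter of a Fisher-type angular density to simulated angles from the mean direction with a random-walk Metropolis sampler. The angular density, the Gamma prior, the simulated data and all tuning settings are assumptions made for illustration; they are not taken from the paper, which works through WinBUGS and also handles covariates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated angles from the mean direction (radians); illustration only.
theta = np.abs(rng.vonmises(mu=0.0, kappa=20.0, size=200))

def log_post(kappa):
    """Log posterior for a Fisher-type concentration parameter kappa (assumed
    angular density f(t) = kappa*sin(t)*exp(kappa*cos(t)) / (2*sinh(kappa)))
    with a vague Gamma(0.01, 0.01) prior."""
    if kappa <= 0:
        return -np.inf
    loglik = np.sum(np.log(kappa) + np.log(np.sin(theta)) + kappa * np.cos(theta)
                    - np.log(2.0 * np.sinh(kappa)))
    logprior = (0.01 - 1.0) * np.log(kappa) - 0.01 * kappa
    return loglik + logprior

# Random-walk Metropolis: the same generic kind of MCMC that WinBUGS automates.
kappa, draws = 1.0, []
for _ in range(5000):
    prop = kappa + rng.normal(scale=0.5)
    if np.log(rng.uniform()) < log_post(prop) - log_post(kappa):
        kappa = prop
    draws.append(kappa)

print("posterior mean of the concentration parameter:", np.mean(draws[1000:]))
```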
Robust Methods in Event Studies: Empirical Evidence and Theoretical Implications
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(3).1166
N. Sorokina, David E. Booth, John H. Thornton
We apply methodology robust to outliers to an existing event study of the effect of U.S. financial reform on the stock markets of the 10 largest world economies, and obtain results that differ from the original OLS results in important ways. This finding underlines the importance of handling outliers in event studies. We further review closely the population of outliers identified using Cook's distance and find that many of the outliers lie within the event windows. We acknowledge that those data points lead to inaccurate regression fitting; however, we cannot remove them since they carry valuable information regarding the event effect. We study further the residuals of the outliers within event windows and find that the residuals change with application of M-estimators and MM-estimators; in most cases they become larger, meaning the main prediction equation is pulled back towards the main data population and away from the outliers, indicating a more proper fit. We support our empirical results by pseudo-simulation experiments and find significant improvement in determination of both types of event effect: abnormal returns and change in systematic risk. We conclude that robust methods are important for obtaining accurate measurement of event effects in event studies.
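The sketch below reproduces the mechanics on toy data: an OLS market-model fit, Cook's distance to locate influential points inside a hypothetical event window, and a Huber M-estimator refit whose residuals at the shocked observations grow because the fitted line is no longer dragged toward them. The data, the event window and the choice of the Huber norm are assumptions for illustration; the paper also uses MM-estimators.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Toy market-model data with a few shocked observations standing in for an
# event window (illustration only).
x = rng.normal(size=250)                        # market return
y = 0.8 * x + rng.normal(scale=0.5, size=250)   # stock return
y[100:105] += 4.0                               # shock inside the hypothetical window

X = sm.add_constant(x)

# 1. OLS fit and Cook's distance to locate influential points.
ols = sm.OLS(y, X).fit()
cooks_d, _ = ols.get_influence().cooks_distance
print("largest Cook's distance at observations:", np.argsort(cooks_d)[-5:])

# 2. Robust M-estimation with the Huber norm; MM-estimation plays an analogous
#    role in the paper but is not refit here.
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print("OLS slope:", ols.params[1], " robust slope:", rlm.params[1])

# Residuals of the shocked points tend to grow under the robust fit, because the
# fitted line is pulled back toward the bulk of the data and away from the outliers.
print("OLS residuals at shocks:   ", ols.resid[100:105].round(2))
print("robust residuals at shocks:", rlm.resid[100:105].round(2))
```

The comparison of the two residual vectors at the shocked observations is exactly the diagnostic the abstract describes: larger robust residuals indicate the fit is anchored to the bulk of the data rather than to the event-window outliers.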
The Log-Kumaraswamy Generalized Gamma Regression Model with Application to Chemical Dependency Data
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(4).1131
Marcelino A. R. Pascoa, Claudia M. M. de Paiva, G. Cordeiro, E. Ortega
The five-parameter Kumaraswamy generalized gamma model (Pascoa et al., 2011) includes some important distributions as special cases and is very useful for modeling lifetime data. We propose an extended version of this distribution by assuming that a shape parameter can take negative values. The new distribution can accommodate increasing, decreasing, bathtub-shaped and unimodal hazard functions. A second advantage is that it also includes as special models reciprocal distributions such as the reciprocal gamma and reciprocal Weibull distributions. A third advantage is that it can represent the error distribution for the log-Kumaraswamy generalized gamma regression model. We provide a mathematical treatment of the new distribution including explicit expressions for moments, generating function, mean deviations and order statistics. We obtain the moments of the log-transformed distribution. The new regression model can be used more effectively in the analysis of survival data since it includes as submodels several widely-known regression models. The method of maximum likelihood and a Bayesian procedure are used for estimating the model parameters for censored data. Overall, the new regression model is very useful for the analysis of real data.
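As a small, hedged illustration of the generalized gamma core of this family (without the Kumaraswamy layer or the negative-shape extension discussed in the paper), the sketch below fits SciPy's gengamma distribution to synthetic lifetimes by maximum likelihood and evaluates the fitted survival function; the data and settings are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Illustrative lifetime data; the paper's application is chemical dependency data.
lifetimes = rng.gamma(shape=2.0, scale=3.0, size=500)

# Maximum-likelihood fit of SciPy's generalized gamma (starting guesses for the
# two shape parameters, location fixed at zero).
a, c, loc, scale = stats.gengamma.fit(lifetimes, 2.0, 1.0, floc=0)
print("shape a:", round(a, 3), "power c:", round(c, 3), "scale:", round(scale, 3))

# Fitted survival function at a few time points, as used in lifetime analysis.
t = np.array([1.0, 5.0, 10.0])
print("S(t) at t = 1, 5, 10:", stats.gengamma.sf(t, a, c, loc=loc, scale=scale).round(3))
```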
A Heteroscedastic Method for Comparing Regression Lines at Specified Design Points When Using a Robust Regression Estimator
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(2).1146 | pp. 281-291
R. Wilcox
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within group and between group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator, and the method suggested here, can make a practical difference.
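A minimal sketch of the general idea, not the exact procedure in the paper: compute the Theil-Sen fitted value for each group at a specified design point and form a percentile-bootstrap confidence interval for the difference, resampling pairs within each group so that both within-group and between-group heteroscedasticity are accommodated. The toy data, the bootstrap size and the resampling scheme are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def theil_sen_at(x, y, x0):
    """Theil-Sen fitted value at the design point x0."""
    slope, intercept, _, _ = stats.theilslopes(y, x)
    return intercept + slope * x0

def diff_ci(x1, y1, x2, y2, x0, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap CI for theta_1(x0) - theta_2(x0); resampling pairs
    within each group accommodates within- and between-group heteroscedasticity."""
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        i1 = rng.integers(0, len(x1), len(x1))
        i2 = rng.integers(0, len(x2), len(x2))
        diffs[b] = (theil_sen_at(x1[i1], y1[i1], x0)
                    - theil_sen_at(x2[i2], y2[i2], x0))
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# Two illustrative groups with unequal, x-dependent error variances.
x1 = rng.uniform(0, 10, 60)
y1 = 1.0 + 0.5 * x1 + rng.normal(0, 1 + 0.2 * x1)
x2 = rng.uniform(0, 10, 60)
y2 = 0.5 + 0.8 * x2 + rng.normal(0, 0.5, 60)
print("95% CI for theta_1(5) - theta_2(5):", diff_ci(x1, y1, x2, y2, x0=5.0))
```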
Adapted Autoregressive Model and Volatility Model with Application
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(4).1165
Naisheng Wang, Yan Lu
Price limits are applied to control risks in various futures markets. In this research, we propose an adapted autoregressive model for the observed futures return by introducing dummy variables that represent limit moves. We also propose a stochastic volatility model with dummy variables. These two models are used to investigate the existence of the price delayed-discovery effect and the volatility spillover effect arising from price limits. We give an empirical study of the impact of price limits on copper and natural rubber futures in the Shanghai Futures Exchange (SHFE) by using the MCMC method. It is found that price limits are efficient in controlling the copper futures price, but the rubber futures price is distorted significantly. This implies that the effects of price limits are significant for products with large fluctuation and frequent limit hits.
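The sketch below shows the flavor of the adapted autoregressive specification on synthetic data: observed returns are censored at an assumed price limit, limit-move dummies are built, and an AR(1) with those dummies as extra regressors is estimated with statsmodels. The limit level, the data-generating process and the AR order are assumptions, and the companion stochastic volatility model is not sketched.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(3)

# Synthetic daily futures returns with an assumed +/-3% price limit (illustration only).
n = 500
eps = rng.normal(0, 0.02, n)
true_ret = np.zeros(n)
for t in range(1, n):
    true_ret[t] = 0.2 * true_ret[t - 1] + eps[t]
limit = 0.03
obs = np.clip(true_ret, -limit, limit)          # observed (limit-censored) returns

# Dummy variables marking limit-move days, as in the adapted AR specification.
exog = pd.DataFrame({
    "limit_up": (obs >= limit).astype(float),
    "limit_down": (obs <= -limit).astype(float),
})

# AR(1) for the observed return with the limit dummies as additional regressors.
res = AutoReg(pd.Series(obs), lags=1, exog=exog).fit()
print(res.params)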
A New Procedure of Clustering Based on Multivariate Outlier Detection
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(1).1091
Grégory David, S. Jayakumar, B. Thomas
Clustering is an extremely important task in a wide variety of application domains, especially in management and social science research. In this paper, an iterative clustering procedure based on multivariate outlier detection is proposed using the well-known Mahalanobis distance. First, the Mahalanobis distance is calculated for the entire sample, and a UCL (upper control limit) is fixed using the T²-statistic. Observations above the UCL are treated as outliers and grouped into an outlier cluster, and the same procedure is repeated for the remaining inliers until the variance-covariance matrix of the variables in the last cluster becomes singular. At each iteration, a multivariate test of means is used to check the discrimination between the outlier clusters and the inliers. Moreover, multivariate control charts are also used to graphically visualize the iterations and the outlier clustering process. Finally, the multivariate test of means helps to firmly establish the cluster discrimination and validity. This paper employs this procedure for clustering 275 customers of a famous two-wheeler brand in India based on 19 different attributes of the two-wheeler and its company. The result of the proposed technique confirms that there exist 5 and 7 outlier clusters of customers in the entire sample at the 5% and 1% significance levels respectively.
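A compact sketch of the iterative peeling idea follows. It uses a chi-square quantile as the UCL (an assumption standing in for the paper's T²-based limit), flags points whose squared Mahalanobis distance exceeds it, stores them as an outlier cluster, and repeats on the inliers until nothing is flagged or the covariance matrix becomes singular; the multivariate tests of means and the control-chart graphics are omitted.

```python
import numpy as np
from scipy import stats

def mahalanobis_outlier_clusters(X, alpha=0.001, max_iter=20):
    """Iteratively peel Mahalanobis-distance outliers into clusters.
    The UCL is a chi-square quantile here (an assumption standing in for the
    paper's T^2-based limit); iteration stops when nothing is flagged or the
    covariance matrix of the remaining points becomes singular."""
    clusters, remaining = [], X.copy()
    p = X.shape[1]
    ucl = stats.chi2.ppf(1 - alpha, df=p)
    for _ in range(max_iter):
        if len(remaining) <= p:
            break
        mu = remaining.mean(axis=0)
        cov = np.cov(remaining, rowvar=False)
        if np.linalg.matrix_rank(cov) < p:       # singular covariance: stop
            break
        inv = np.linalg.inv(cov)
        d2 = np.einsum("ij,jk,ik->i", remaining - mu, inv, remaining - mu)
        flagged = d2 > ucl
        if not flagged.any():
            break
        clusters.append(remaining[flagged])
        remaining = remaining[~flagged]
    return clusters, remaining

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(6, 1, (15, 3))])
clusters, inliers = mahalanobis_outlier_clusters(X)
print("outlier clusters found:", len(clusters), " inliers remaining:", len(inliers))
```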
An Inference Model for Online Media Users
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.201301_11(1).0008
N. Nananukul
Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient ad server that can match advertisements to targeted users is needed. In general, the users' demographics are provided to an ad server by an inference engine which infers users' demographics based on a profile reasoning technique. Rich media streaming through broadband networks has had a significant impact on how online television user profile reasoning can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account genre preferences and viewing time from users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed. Also, experimental results are provided to show how effective the proposed technique is.
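The sketch below shows what such a logistic-regression profile model looks like in code: genre-share and viewing-time features predict a demographic group. The synthetic data, the feature construction and the use of scikit-learn are assumptions for illustration; they are not the paper's data or implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)

# Synthetic viewing histories (illustration only): share of viewing time in each
# genre plus total daily viewing minutes; the target is a demographic group.
n, n_genres = 2000, 5
genre_share = rng.dirichlet(np.ones(n_genres), size=n)
minutes = rng.gamma(2.0, 60.0, size=n)
X = np.column_stack([genre_share, minutes])
# Toy labelling rule so the model has a signal to learn.
group = (genre_share[:, 0] + 0.002 * minutes + rng.normal(0, 0.1, n) > 0.45).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, group, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
print("P(group = 1) for the first test user:", round(clf.predict_proba(X_te[:1])[0, 1], 3))
```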
modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(2).1133
T. Dey
We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The model chosen is the best model with the minimum final prediction error (FPE) value among all other models. This is implemented via the R package modelSampler. However, model selection based on the FPE criterion can be questionable, as the criterion can be sensitive to perturbations in the data. This R package can be used for empirical assessment of the stability of the FPE criterion. A stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on the bootstrapped data. The heart of the method is the notion of model averaging for stable variable selection and for studying the behavior of variables over the entire model space, a concept invaluable in high dimensional situations.
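For readers outside R, here is a conceptual Python sketch of the bootstrap-wrapper idea only, not of the modelSampler API or its spike-and-slab/FPE machinery: a stand-in variable selector (cross-validated lasso) is refit on bootstrap resamples and the selection frequency of each variable is reported, which is the kind of stability summary the wrapper is after.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)

# Synthetic regression data with a few truly informative predictors.
X, y = make_regression(n_samples=150, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Bootstrap wrapper: refit the selector on resampled data and record how often
# each variable is selected; high-frequency variables are the "stable" ones.
n_boot = 100
counts = np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.integers(0, len(y), len(y))
    model = LassoCV(cv=5).fit(X[idx], y[idx])
    counts += np.abs(model.coef_) > 1e-8

print("selection frequency per variable:", (counts / n_boot).round(2))
```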
Use of Serial Weight and Length Measurements in Children from Birth to Two Years of Age to Predict Obesity at Five Years of Age
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(3).1154
H. Haller, T. Dey, L. Gittner, S. Ludington-Hoe
Childhood obesity is a major health concern. The associated health risks dramatically reduce lifespan and increase healthcare costs. The goal was to develop methodology to identify as early in life as possible whether or not a child would become obese at age five. This diagnostic tool would facilitate clinical monitoring to prevent and/or minimize obesity. Obesity is measured by Body Mass Index (BMI), but an improved metric, the ratio of weight to height (or length) (WOH), is proposed from this research for detecting early obesity. Results of this research demonstrate that WOH performs better than BMI for early detection of obesity in individuals using a longitudinal decision analysis (LDA), which is essentially an individuals-type control chart analysis about a trend line. Utilizing LDA, the odds of obesity of a child at age five are indicated before the second birthday with 95% sensitivity and 97% specificity. Further, obesity at age five is indicated with 75% specificity before two months and with 84% specificity before three months of age. These results warrant expanding this study to larger cohorts of normal, overweight, and obese children at age five from different healthcare facilities to test the applicability of this novel diagnostic tool.
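The sketch below gives a rough picture of an individuals-type control chart about a trend line applied to serial WOH measurements: fit a straight-line trend of WOH against age, estimate sigma from the moving range of the residuals, and flag measurements outside k-sigma limits. The trend form, the k = 2 limit, the hypothetical helper name and the toy measurements are assumptions; the paper's LDA and its cut-offs are not reproduced here.

```python
import numpy as np

def woh_chart_flags(ages_months, weight_kg, length_cm, k=2.0):
    """Hypothetical helper: individuals-type control chart about a trend line
    for the weight-over-height ratio (WOH); k-sigma limits are an assumption."""
    woh = weight_kg / length_cm
    # Straight-line trend of WOH versus age.
    slope, intercept = np.polyfit(ages_months, woh, deg=1)
    resid = woh - (intercept + slope * ages_months)
    # Sigma estimated from the moving range, as in individuals control charts.
    sigma = np.mean(np.abs(np.diff(resid))) / 1.128
    return woh, np.abs(resid) > k * sigma

# Toy serial measurements from birth to two years (not study data).
ages = np.array([0.5, 2, 4, 6, 9, 12, 18, 24])                 # months
weight = np.array([3.4, 5.2, 6.8, 7.9, 9.0, 9.8, 11.4, 13.9])  # kg
length = np.array([50.0, 57, 62, 66, 70, 74, 80, 86])          # cm
woh, flags = woh_chart_flags(ages, weight, length)
print("WOH values:", woh.round(3))
print("flagged measurement indices:", np.where(flags)[0])
```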
Bayesian Behavior Scoring Model
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(3).1145
Ling-Jing Kao, F. Lin, C. Yu
Although many scoring models have been developed in the literature to offer financial institutions guidance in credit-granting decisions, the purpose of most scoring models is to improve their discrimination ability, not their explanatory ability. Therefore, the conventional scoring models can only provide limited information on the relationship among customer demographics, default risk, and credit card attributes, such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scoring model is proposed to help financial institutions identify factors which truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing the APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. High-income, female, or more highly educated cardholders are more likely to have good repayment ability.
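To make the Bayesian-scoring idea concrete, the sketch below runs a random-walk Metropolis sampler for a Bernoulli-logit model of default on synthetic cardholder features (APR, income, gender, education) and reports the posterior probability that the APR coefficient is positive. The data-generating model, the priors and the sampler settings are all assumptions for illustration and are not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic cardholder records (illustration only): APR in percent, income in
# arbitrary units, a female dummy and a higher-education dummy; default is
# generated from a known logit model so the sampler has a target to recover.
n = 1000
X = np.column_stack([
    np.ones(n),                       # intercept
    rng.uniform(5.0, 20.0, n),        # APR (%)
    rng.gamma(3.0, 2.0, n),           # income
    rng.integers(0, 2, n),            # female
    rng.integers(0, 2, n),            # higher education
])
true_beta = np.array([-3.0, 0.12, -0.15, -0.4, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

def log_post(beta):
    """Bernoulli-logit log likelihood plus independent N(0, 10^2) priors."""
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta))) - np.sum(beta ** 2) / 200.0

# Random-walk Metropolis over the coefficient vector (a generic stand-in for
# whatever MCMC scheme the paper's Bayesian scoring model actually uses).
beta, draws = np.zeros(X.shape[1]), []
for _ in range(20000):
    prop = beta + rng.normal(scale=0.03, size=beta.size)
    if np.log(rng.uniform()) < log_post(prop) - log_post(beta):
        beta = prop
    draws.append(beta)
draws = np.array(draws[5000:])

print("posterior mean coefficients:", draws.mean(axis=0).round(3))
print("posterior P(APR coefficient > 0):", np.mean(draws[:, 1] > 0))
```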