
Journal of data science : JDS, latest articles

A Bayesian Analysis of the Spherical Distribution in Presence of Covariates
Pub Date : 2021-07-30 DOI: 10.6339/jds.201310_11(4).0008
J. Achcar, Gian Franco Napa, Roberto Molina de Souza
In this paper we introduce a Bayesian analysis of a spherical distribution applied to rock joint orientation data, with or without a vector of covariates, where the response variable is the angle from the mean and the covariates are the components of the upward normal vector. Standard MCMC (Markov chain Monte Carlo) simulation methods, run in WinBUGS, are used to obtain the posterior summaries of interest. The proposed methodology is illustrated using a simulated data set and a real spherical rock data set from a hydroelectric site.
Citations: 0
Robust Methods in Event Studies: Empirical Evidence and Theoretical Implications
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(3).1166
N. Sorokina, David E. Booth, John H. Thornton
We apply methodology robust to outliers to an existing event study of the effect of U.S. financial reform on the stock markets of the 10 largest world economies, and obtain results that differ from the original OLS results in important ways. This finding underlines the importance of handling outliers in event studies. We further review closely the population of outliers identified using Cook's distance and find that many of the outliers lie within the event windows. We acknowledge that those data points lead to inaccurate regression fitting; however, we cannot remove them since they carry valuable information regarding the event effect. We study further the residuals of the outliers within event windows and find that the residuals change with application of M-estimators and MM-estimators; in most cases they became larger, meaning the main prediction equation is pulled back towards the main data population and further from the outliers and indicating more proper fitting. We support our empirical results by pseudo-simulation experiments and find significant improvement in determination of both types of the event effect: abnormal returns and change in systematic risk. We conclude that robust methods are important for obtaining accurate measurement of event effects in event studies.
Citations: 23
The Log-Kumaraswamy Generalized Gamma Regression Model with Application to Chemical Dependency Data
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(4).1131
Marcelino A. R. Pascoa, Claudia M. M. de Paiva, G. Cordeiro, E. Ortega
The five-parameter Kumaraswamy generalized gamma model (Pascoa et al., 2011) includes some important distributions as special cases and it is very useful for modeling lifetime data. We propose an extended version of this distribution by assuming that a shape parameter can take negative values. The new distribution can accommodate increasing, decreasing, bathtub and unimodal shaped hazard functions. A second advantage is that it also includes as special models reciprocal distributions such as the reciprocal gamma and reciprocal Weibull distributions. A third advantage is that it can represent the error distribution for the log-Kumaraswamy generalized gamma regression model. We provide a mathematical treatment of the new distribution including explicit expressions for moments, generating function, mean deviations and order statistics. We obtain the moments of the log-transformed distribution. The new regression model can be used more effectively in the analysis of survival data since it includes as sub-models several widely-known regression models. The method of maximum likelihood and a Bayesian procedure are used for estimating the model parameters for censored data. Overall, the new regression model is very useful to the analysis of real data.
Citations: 5
A HETEROSCEDASTIC METHOD FOR COMPARING REGRESSION LINES AT SPECIFIED DESIGN POINTS WHEN USING A ROBUST REGRESSION ESTIMATOR.
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1146
R. Wilcox
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within-group and between-group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator, and the method suggested here, can make a practical difference.
Citations: 14
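The Theil-Sen estimator named in the Wilcox abstract above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the paper's method: it fits one line per group as the median of pairwise slopes and reports the point estimate of θ1(x) - θ2(x) at a single design point. It does not attempt the heteroscedastic confidence interval that the paper studies. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) of the Theil-Sen line through (x, y)."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with tied x values
    ]
    slope = median(slopes)
    # Intercept taken as the median residual after removing the slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def difference_at(x0, group1, group2):
    """Point estimate of theta_1(x0) - theta_2(x0) from two fitted lines."""
    b1, a1 = theil_sen(*group1)
    b2, a2 = theil_sen(*group2)
    return (a1 + b1 * x0) - (a2 + b2 * x0)


# Toy data (hypothetical): group 1 follows y = 2x + 1 with one gross outlier.
g1 = ([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
g2 = ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
slope1, intercept1 = theil_sen(*g1)
diff_at_2 = difference_at(2, g1, g2)
```

In the toy data the single outlier in group 1 does not move the fitted line, which is the robustness property that motivates using such estimators in place of OLS.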
Adapted Autoregressive Model and Volatility Model with Application
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(4).1165
Naisheng Wang, Yan Lu
Price limits are applied to control risks in various futures markets. In this research, we proposed an adapted autoregressive model for the observed futures return by introducing dummy variables that represent limit moves. We also proposed a stochastic volatility model with dummy variables. These two models are used to investigate the existence of a delayed price discovery effect and a volatility spillover effect from price limits. We give an empirical study of the impact of price limits on copper and natural rubber futures in the Shanghai Futures Exchange (SHFE) by using the MCMC method. It is found that price limits are efficient in controlling the copper futures price, but the rubber futures price is distorted significantly. This implies that the effects of price limits are significant for products with large fluctuation and frequent limit hits.
Citations: 1
A New Procedure of Clustering Based on Multivariate Outlier Detection
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(1).1091
Grégory David, S. Jayakumar, B. Thomas
Clustering is an extremely important task in a wide variety of application domains, especially in management and social science research. In this paper, an iterative clustering procedure based on multivariate outlier detection was proposed by using the famous Mahalanobis distance. At first, the Mahalanobis distance should be calculated for the entire sample, then the T²-statistic is used to fix a UCL. Observations above the UCL are treated as outliers, which are grouped as an outlier cluster, and the same procedure is repeated for the remaining inliers until the variance-covariance matrix for the variables in the last cluster achieves singularity. At each iteration, a multivariate test of means is used to check the discrimination between the outlier clusters and the inliers. Moreover, multivariate control charts are also used to graphically visualize the iterations and the outlier clustering process. Finally, a multivariate test of means helps to firmly establish the cluster discrimination and validity. This paper employed this procedure for clustering 275 customers of a famous two-wheeler in India based on 19 different attributes of the two-wheeler and its company. The result of the proposed technique confirms there exist 5 and 7 outlier clusters of customers in the entire sample at the 5% and 1% significance levels respectively.
Citations: 22
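The iterative procedure described in the clustering abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation, and it rests on several assumptions: the data are two-dimensional so the covariance matrix can be inverted by hand, the UCL is passed in as a fixed number instead of being derived from the T² statistic at a chosen significance level, and the loop also stops when no observation exceeds the UCL. The multivariate test of means and the control charts are omitted. All function names are hypothetical.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean.

    Returns None when the sample covariance matrix is singular.
    """
    n = len(points)
    if n < 2:
        return None
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def outlier_clusters(points, ucl):
    """Peel off successive outlier clusters; return (clusters, final inliers)."""
    inliers = list(range(len(points)))
    clusters = []
    while True:
        d2 = mahalanobis_sq_2d([points[i] for i in inliers])
        if d2 is None:  # covariance matrix became singular
            break
        flagged = [i for i, d in zip(inliers, d2) if d > ucl]
        if not flagged:  # assumed extra stopping rule: nothing above the UCL
            break
        clusters.append(flagged)
        inliers = [i for i in inliers if i not in flagged]
    return clusters, inliers


# Toy data (hypothetical): a 3x3 grid plus one far-away point at index 9.
data = [(x, y) for x in range(3) for y in range(3)] + [(50, 50)]
clusters, remaining = outlier_clusters(data, ucl=5.0)
```

With the toy data, the far-away point forms the only outlier cluster and the nine grid points remain as inliers.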
An Inference Model for Online Media Users
Pub Date : 2021-07-30 DOI: 10.6339/JDS.201301_11(1).0008
N. Nananukul
Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient Ad server that can match advertisements to targeted users is needed. In general, the users’ demographics are provided to an Ad server by an inference engine which infers users’ demographics based on a profile reasoning technique. Rich media streaming through broadband networks has made a significant impact on how profile reasoning for online television users can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account genre preferences and viewing time from users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed. Also, experimental results are provided to show how effective the proposed technique is.
Citations: 0
modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1133
T. Dey
We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The model chosen is the best model with minimum final prediction error (FPE) values among all other models. This is implemented via the R package modelSampler. However, model selection based on FPE criteria is dubious and questionable as FPE criteria can be sensitive to perturbations in the data. This R package can be used for empirical assessment of the stability of FPE criteria. A stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on the bootstrapped data. The heart of the method is the notion of model averaging for stable variable selection and to study the behavior of variables over the entire model space, a concept invaluable in high dimensional situations.
Citations: 2
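The modelSampler abstract above selects the model with minimum final prediction error. As a rough illustration of that criterion only, the sketch below uses Akaike's classical form, FPE = (RSS / n) * (n + p) / (n - p), for a linear model with p coefficients fitted to n observations. Whether the R package uses exactly this form is an assumption here, and the spike and slab sampler and the bootstrap wrapper are not reproduced. The function names, model labels and residual sums of squares are hypothetical.

```python
def fpe(rss, n, p):
    """Akaike's final prediction error for a linear model with p coefficients."""
    if n <= p:
        raise ValueError("need more observations than coefficients")
    return (rss / n) * (n + p) / (n - p)


def best_by_fpe(candidates, n):
    """Return the label with minimum FPE; candidates maps label -> (rss, p)."""
    def score(label):
        rss, p = candidates[label]
        return fpe(rss, n, p)
    return min(candidates, key=score)


# Toy candidates (hypothetical RSS values) fitted to n = 20 observations.
models = {
    "x1": (40.0, 2),
    "x1+x2": (30.0, 3),
    "x1+x2+x3": (29.5, 4),
}
chosen = best_by_fpe(models, n=20)
```

The third predictor lowers the residual sum of squares slightly but not enough to offset the penalty term, so the two-predictor model is chosen. Repeating such a selection on resampled data is one way to see how sensitive the choice is to perturbations.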
Use of Serial Weight and Length Measurements in Children from Birth to Two Years of Age to Predict Obesity at Five Years of Age
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(3).1154
H. Haller, T. Dey, L. Gittner, S. Ludington-Hoe
Childhood obesity is a major health concern. The associated health risks dramatically reduce lifespan and increase healthcare costs. The goal was to develop methodology to identify as early in life as possible whether or not a child would become obese at age five. This diagnostic tool would facilitate clinical monitoring to prevent and/or minimize obesity. Obesity is measured by Body Mass Index (BMI), but an improved metric, the ratio of weight to height (or length) (WOH), is proposed from this research for detecting early obesity. Results of this research demonstrate that WOH performs better than BMI for early detection of obesity in individuals using a longitudinal decision analysis (LDA), which is essentially an individuals-type control chart analysis about a trend line. Utilizing LDA, the odds of obesity of a child at age five are indicated before the second birthday with 95% sensitivity and 97% specificity. Further, obesity at age five is indicated with 75% specificity before two months and with 84% specificity before three months of age. These results warrant expanding this study to larger cohorts of normal, overweight, and obese children at age five from different healthcare facilities to test the applicability of this novel diagnostic tool.
Citations: 1
Bayesian Behavior Scoring Model
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(3).1145
Ling-Jing Kao, F. Lin, C. Yu
Although many scoring models have been developed in the literature to offer financial institutions guidance in credit-granting decisions, the purpose of most scoring models is to improve their discrimination ability, not their explanatory ability. Therefore, conventional scoring models can provide only limited information on the relationship among customer demographics, default risk, and credit card attributes, such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scoring model is proposed to help financial institutions identify factors that truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. High-income, female, or more highly educated cardholders are more likely to have good repayment ability.
Citations: 2