
Journal of data science : JDS, latest articles

A HETEROSCEDASTIC METHOD FOR COMPARING REGRESSION LINES AT SPECIFIED DESIGN POINTS WHEN USING A ROBUST REGRESSION ESTIMATOR.
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1146
R. Wilcox
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed, and inferential methods based on these estimators have been derived. For two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within-group and between-group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator and the method suggested here can make a practical difference.
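For readers unfamiliar with the Theil-Sen estimator named in this abstract, a minimal sketch follows. It takes the slope as the median of all pairwise slopes and the intercept as the median of y - slope·x (one common convention, assumed here), then evaluates the estimated difference θ1(x) - θ2(x) at a design point. The function names and the toy data are illustrative assumptions, and the confidence-interval construction studied in the paper is not reproduced.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) from the Theil-Sen estimator."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # tied x values give no slope, so skip them
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def difference_at(x0, group1, group2):
    """Estimate theta_1(x0) - theta_2(x0) from two (x, y) samples."""
    a1, b1 = theil_sen(*group1)
    a2, b2 = theil_sen(*group2)
    return (a1 + b1 * x0) - (a2 + b2 * x0)


# Toy data (hypothetical): group 1 lies on y = 1 + 2x except for one gross outlier.
g1 = ([0, 1, 2, 3, 4], [1, 3, 5, 7, 100])
g2 = ([0, 1, 2, 3, 4], [0, 1, 2, 3, 4])
print(theil_sen(*g1))
print(difference_at(2.0, g1, g2))
```

In the toy data the single outlier leaves the fitted line for group 1 unchanged, which is the robustness property that motivates replacing OLS.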
Citations: 14
An Inference Model for Online Media Users
Pub Date : 2021-07-30 DOI: 10.6339/JDS.201301_11(1).0008
N. Nananukul
Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient Ad server that can match advertisements to targeted users is needed. In general, the users’ demographics are provided to an Ad server by an inference engine, which infers them using a profile reasoning technique. Rich media streaming through broadband networks has had a significant impact on how profile reasoning for online television users can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account genre preferences and viewing time from users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed. Experimental results are provided to show how effective the proposed technique is.
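As a rough illustration of the kind of model this abstract describes, the sketch below fits a binary logistic regression by batch gradient descent and scores two viewers. The two features, the toy viewing data, the learning rate and the number of epochs are all hypothetical; the paper's actual inputs and fitting procedure are not given in the abstract.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of the positive class; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = predict_proba(w, x) - y
            grad[0] += err
            for k, xk in enumerate(x):
                grad[k + 1] += err * xk
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


# Hypothetical features: [share of viewing time spent on sports, hours watched per day]
rows = [[0.9, 2.0], [0.8, 1.5], [0.7, 2.5], [0.2, 1.0], [0.1, 0.5], [0.3, 1.2]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = viewer belongs to the target demographic group
w = fit_logistic(rows, labels)
print(predict_proba(w, [0.85, 2.0]), predict_proba(w, [0.15, 0.8]))
```

An Ad server could threshold such probabilities to decide which demographic segment a viewer most likely belongs to.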
Citations: 0
modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1133
T. Dey
We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The model chosen is the one with the minimum final prediction error (FPE) among all candidate models. This is implemented via the R package modelSampler. However, model selection based on the FPE criterion is questionable, as FPE can be sensitive to perturbations in the data. This R package can be used for empirical assessment of the stability of the FPE criterion. A stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on the bootstrapped data. The heart of the method is the notion of model averaging, used both for stable variable selection and to study the behavior of variables over the entire model space, a concept invaluable in high dimensional situations.
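To make the selection rule concrete, the sketch below scores candidate models with Akaike's classical final prediction error, FPE = (RSS/n)(n + k)/(n - k), and picks the minimum. This form of FPE is an assumption: the abstract does not say which variant modelSampler uses, and the package's own interface is not shown here. The candidate models and their RSS values are hypothetical.

```python
def fpe(rss, n, k):
    """Akaike's final prediction error for a linear model.

    rss: residual sum of squares, n: sample size, k: number of fitted coefficients.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return (rss / n) * (n + k) / (n - k)


# Hypothetical candidates: model name -> (RSS, number of coefficients), with n = 50.
candidates = {"x1": (120.0, 2), "x1+x2": (80.0, 3), "x1+x2+x3": (79.5, 4)}
n = 50
scores = {name: fpe(rss, n, k) for name, (rss, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Because a small change in RSS can change which model attains the minimum, repeating this selection over bootstrap resamples is one way to see how stable the choice is.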
Citations: 2
Use of Serial Weight and Length Measurements in Children from Birth to Two Years of Age to Predict Obesity at Five Years of Age
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(3).1154
H. Haller, T. Dey, L. Gittner, S. Ludington-Hoe
Childhood obesity is a major health concern. The associated health risks dramatically reduce lifespan and increase healthcare costs. The goal was to develop a methodology to identify, as early in life as possible, whether a child would become obese at age five. This diagnostic tool would facilitate clinical monitoring to prevent and/or minimize obesity. Obesity is measured by Body Mass Index (BMI), but this research proposes an improved metric, the ratio of weight to height (or length) (WOH), for detecting early obesity. The results demonstrate that WOH performs better than BMI for early detection of obesity in individuals using a longitudinal decision analysis (LDA), which is essentially an individuals-type control chart analysis about a trend line. Using LDA, obesity of a child at age five is indicated before the second birthday with 95% sensitivity and 97% specificity. Further, obesity at age five is indicated with 75% specificity before two months and with 84% specificity before three months of age. These results warrant expanding this study to larger cohorts of normal, overweight, and obese children at age five from different healthcare facilities to test the applicability of this novel diagnostic tool.
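The quantities named in this abstract can be written down directly. The sketch below computes BMI and the weight-over-height ratio (WOH), and derives sensitivity and specificity from predicted and actual outcomes. The units (kilograms and metres) and the toy predictions are assumptions, and the LDA control-chart rule itself is not reproduced.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight divided by height squared."""
    return weight_kg / height_m ** 2


def woh(weight_kg, height_m):
    """Weight over height (or length), the ratio proposed in the abstract."""
    return weight_kg / height_m


def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for boolean predictions and outcomes."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical child at about 18 months: 12 kg, 0.85 m.
print(bmi(12.0, 0.85), woh(12.0, 0.85))

# Hypothetical flags raised before age two versus obesity observed at age five.
predicted = [True, True, False, False, True]
actual = [True, True, True, False, False]
print(sensitivity_specificity(predicted, actual))
```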
Citations: 1
Bayesian Behavior Scoring Model
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(3).1145
Ling-Jing Kao, F. Lin, C. Yu
Although many scoring models have been developed in the literature to offer financial institutions guidance in credit granting decisions, the purpose of most scoring models is to improve their discrimination ability, not their explanatory ability. Therefore, conventional scoring models can provide only limited information on the relationship among customer demographics, default risk, and credit card attributes such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scoring model is proposed to help financial institutions identify factors which truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. Cardholders who have high income, are female, or have higher education are more likely to have good repayment ability.
Citations: 2
On Estimation of Rayleigh Scale Parameter under Doubly Type-II Censoring from Imprecise Data
Pub Date : 2021-07-30 DOI: 10.6339/JDS.2013.11(2).1144
Abbas Pak, G. Parham, M. Saraj
The scheme of doubly type-II censored sampling is an important method of obtaining data in lifetime studies. Statistical analysis of lifetime distributions under this censoring scheme is based on precise lifetime data. However, some collected lifetime data might be imprecise and are represented in the form of fuzzy numbers. This paper deals with the problem of estimating the scale parameter of the Rayleigh distribution under a doubly type-II censoring scheme when the lifetime observations are fuzzy and are assumed to be related to the underlying crisp realization of a random sample. We propose a new method to determine the maximum likelihood estimate of the parameter of interest. The asymptotic variance of the ML estimate is then derived by using the missing information principle. The performance of the estimate is then assessed through Monte Carlo simulations. Finally, an illustrative example with real data concerning 25 ball bearings in a life test is presented.
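As background for the precise-data case that this abstract starts from, the sketch below writes the Rayleigh log-likelihood when the r smallest and s largest of n lifetimes are censored, and maximises it numerically. It handles crisp observations only; the fuzzy-data likelihood and the estimation method proposed in the paper are not reproduced. The golden-section search, its default bracket, and the toy lifetimes are assumptions made for illustration.

```python
import math


def loglik(sigma, observed, r, s):
    """Rayleigh log-likelihood (up to a constant) under doubly type-II censoring.

    observed: the uncensored lifetimes; r and s count the censored smallest
    and largest lifetimes, so the full sample size is r + len(observed) + s.
    """
    xs = sorted(observed)
    two_var = 2.0 * sigma ** 2
    ll = sum(math.log(x) - 2.0 * math.log(sigma) - x * x / two_var for x in xs)
    if r:
        # r lifetimes are only known to lie below the smallest observed value
        ll += r * math.log(-math.expm1(-xs[0] ** 2 / two_var))
    if s:
        # s lifetimes are only known to exceed the largest observed value
        ll += s * (-xs[-1] ** 2 / two_var)
    return ll


def mle_sigma(observed, r, s, lo=1e-3, hi=1e3, tol=1e-8):
    """Maximise loglik over sigma by golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if loglik(c, observed, r, s) > loglik(d, observed, r, s):
            b = d
        else:
            a = c
    return (a + b) / 2.0


# Hypothetical lifetimes; one early and two late failures were not recorded.
data = [0.8, 1.1, 1.7, 2.3, 2.9]
print(mle_sigma(data, r=1, s=2))
```

With r = s = 0 the numerical maximiser should agree with the closed-form complete-sample estimate, the square root of Σx²/(2n), which gives a quick sanity check on the likelihood.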
Citations: 9
Modelling Progression of HIV/AIDS Disease Stages Using Semi-Markov Processes
Pub Date : 2021-07-30 DOI: 10.6339/jds.201304_11(2).0004
A. Goshu, Zelalem G. Dessie
The aim of this study is to model the progression of HIV/AIDS in an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who were under ART follow-up from June 2005 to August 2009. States of the Markov process are defined by the seriousness of the sickness based on CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (Death). The first four states are referred to as good or alive states. The findings are as follows. Within the good states, the transition probability from a given state to the next worse state increases with time, reaches a maximum at some time, and then decreases. This means that there is a period during which a patient is most likely to move to a worse state of the disease. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of being in the same state decreases over time. Within the good states, the probability of being in a better state is non-zero but less than the probability of being in a worse state. At any time in the process, a patient is more likely to be in a worse state than in a better one. The conditional probability of staying in the same state until a given number of months decreases with increasing time. The reliability analysis also revealed that the survival probabilities all decline over time. This implies that patient conditions should be improved with ART to improve the survival probability.
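The state definitions in this abstract translate directly into code. The sketch below maps a CD4 count to one of the five states and lists the state changes in one patient's visit history. The function names and the visit sequence are hypothetical, and the sojourn-time distributions that a semi-Markov model would also need are not estimated here.

```python
def cd4_state(cd4_count=None, dead=False):
    """Map a CD4 count (cells/microliter) to the state number from the abstract."""
    if dead:
        return 5  # state five: Death
    if cd4_count > 500:
        return 1
    if cd4_count > 350:
        return 2
    if cd4_count > 200:
        return 3
    return 4


def state_changes(states):
    """Return the (from, to) pairs where the state actually changes."""
    return [(a, b) for a, b in zip(states, states[1:]) if a != b]


# Hypothetical CD4 counts for one patient at successive follow-up visits.
visits = [180, 260, 340, 420, 510]
states = [cd4_state(c) for c in visits]
print(states)
print(state_changes(states))
```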
Citations: 22
Two-Level Factorial Design with Circular Response: Model and Analysis
Pub Date : 2021-07-30 DOI: 10.6339/jds.201307_11(3).0003
A. Zahran
Since the late thirties, factorial analysis of a response measured on the real line has been well established and documented in the literature. No such analysis, however, is available for a response measured on the circle (or on the sphere in general), despite the fact that many designed experiments in industry, medicine, psychology and biology could result in an angular response. In this paper, a full factorial analysis is presented for a circular response using the Spherical Projected Multivariate Linear model. Main and interaction effects are defined, estimated and tested. By analogy with the linear response case, two new effect plots, the Circular-Main Effect and Circular-Interaction Effect plots, are proposed to visualize main and interaction effects on circular responses.
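A small example shows why a response on the circle needs its own analysis. The sketch below compares the ordinary arithmetic mean of two angles with their circular mean direction, computed from summed sines and cosines. This is standard background on angular data, not the Spherical Projected Multivariate Linear model of the paper, and the function name and angles are illustrative assumptions.

```python
import math


def circular_mean_deg(angles_deg):
    """Mean direction in degrees, in [0, 360), of angles given in degrees.

    The result is not meaningful when the summed sine and cosine are both
    (near) zero, for example for two exactly opposite directions.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


angles = [340.0, 30.0]  # two directions on either side of 0 degrees
print(sum(angles) / len(angles))  # arithmetic mean points the opposite way
print(circular_mean_deg(angles))  # close to 5 degrees, between the two
```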
Citations: 2
The Design Aspect of the Bruceton Test for Pyrotechnics Sensitivity Analysis
Pub Date : 2021-07-21 DOI: 10.6339/JDS.2003.01(1).119
C. Fuh, J. S. Lee, C. M. Liaw
We start with a data set obtained from a study of the CS-M-3 ignitor in a military experiment, which is based on the classical up-and-down method of Dixon and Mood (1948). Since Bruceton tests are actively employed in pyrotechnical sensitivity studies, we reexamine this method from the view that it is a design for data collection. Two different aspects are addressed: as a design for parameter estimation and as a design for giving clues about the goodness of fit. Two sets of data are employed to illustrate our points. For the estimation of (µ, σ), the location and scale parameters, we show that a properly selected up-and-down design is quite informative; for the estimation of xp, the 100p%-th quantile, however, the best selected up-and-down method is only about 50% as effective as the corresponding c-optimal design. Although not particularly useful, the up-and-down method does judge the proper selection of the underlying model. In any case, all the quantal response models are rather poor in terms of goodness of fit.
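The up-and-down rule behind the Bruceton test is simple to state: after a response the next stimulus level is lowered by one step, and after a non-response it is raised by one step. The sketch below applies that rule to a hypothetical, deterministic test item so that the resulting sequence is easy to follow; the function name, starting level, step size and threshold are assumptions, and in a real test the responses are random.

```python
def up_and_down(start, step, responds, n_trials):
    """Run the up-and-down rule and return the (level, response) history.

    responds: callable taking a stimulus level and returning True or False.
    """
    level, history = start, []
    for _ in range(n_trials):
        fired = bool(responds(level))
        history.append((level, fired))
        # step down after a response, step up after a non-response
        level = level - step if fired else level + step
    return history


# Hypothetical item that fires whenever the stimulus level is at least 3.2.
trials = up_and_down(start=1.0, step=1.0, responds=lambda x: x >= 3.2, n_trials=8)
for level, fired in trials:
    print(level, fired)
```

The levels climb until the first response and then oscillate around the threshold, which is why the design concentrates observations near the median response level.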
Citations: 8
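The up-and-down rule referred to in the abstract above can be sketched as a short simulation: after a response the next stimulus level is lowered by one step, and after a non-response it is raised by one step. This is only an illustration under stated assumptions. The normal response model, the function names, and all numeric values (start level, step size, µ, σ, number of trials) are hypothetical and are not taken from the paper or its CS-M-3 data, and no estimation of (µ, σ) is attempted.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """Probability of a response at stimulus level x under an assumed normal model."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def up_and_down(start, step, n_trials, mu, sigma, rng):
    """Simulate an up-and-down test sequence.

    After a response the next level is lowered by one step; after a
    non-response it is raised by one step.
    Returns a list of (level, responded) pairs.
    """
    level = start
    trials = []
    for _ in range(n_trials):
        # Draw a binary (quantal) response at the current level.
        responded = rng.random() < normal_cdf(level, mu, sigma)
        trials.append((level, responded))
        level = level - step if responded else level + step
    return trials


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    rng = random.Random(0)
    trials = up_and_down(start=10.0, step=1.0, n_trials=50,
                         mu=12.0, sigma=1.5, rng=rng)
    mean_level = sum(level for level, _ in trials) / len(trials)
    print(f"mean tested level: {mean_level:.2f}")
```

Because each level depends on the previous outcome, the tested levels tend to concentrate around the median of the response distribution, which is consistent with the abstract's remark that the design is informative for location and scale but less so for other quantiles.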
Building an Honest Tree for Mass Spectra Classification Based on Prior Logarithm Normal Distribution
Pub Date : 2021-07-21 DOI: 10.6339/JDS.2003.01(4).179
Cheng-Jian Xu, Ping He, Yizeng Liang
Structure elucidation is one of the major tasks for analytical researchers, and it often requires an efficient classifier. The decision tree is especially attractive because it is easy to understand and intuitive to represent. However, a small change in the data set caused by experimental error can often result in a very different series of splits. In this paper, a prior logarithm normal (log-normal) distribution is adopted to weight the original mass spectra. This helps to build an honest tree for later structure elucidation.
Citations: 0