A Heteroscedastic Method for Comparing Regression Lines at Specified Design Points When Using a Robust Regression Estimator
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(2).1146 | Pages: 281-291
R. Wilcox
It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj(X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within-group and between-group heteroscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations focus primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator and the method suggested here can make a practical difference.
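The paper's own confidence-interval construction is not reproduced in this listing, but the following minimal sketch shows the kind of quantity involved: a Theil-Sen fit per group (via scipy's theilslopes) and a percentile bootstrap, resampling within each group, for θ1(X) - θ2(X) at a chosen design point. The bootstrap scheme, function names, and toy data are illustrative assumptions, not Wilcox's published method.

```python
# Sketch: compare two robust regression lines at a design point x0 via a
# percentile bootstrap (illustrative only, not the paper's exact procedure).
import numpy as np
from scipy.stats import theilslopes

def theil_sen_predict(x, y, x0):
    slope, intercept, _, _ = theilslopes(y, x)   # robust slope and intercept
    return intercept + slope * x0

def boot_ci_diff(x1, y1, x2, y2, x0, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        i1 = rng.integers(0, len(x1), len(x1))   # resample within each group,
        i2 = rng.integers(0, len(x2), len(x2))   # which allows heteroscedasticity
        diffs[b] = (theil_sen_predict(x1[i1], y1[i1], x0)
                    - theil_sen_predict(x2[i2], y2[i2], x0))
    return np.quantile(diffs, [alpha / 2, 1 - alpha / 2])

# toy data with within-group and between-group heteroscedasticity
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0, 1, 40), rng.uniform(0, 1, 40)
y1 = 1 + 2 * x1 + rng.normal(0, 0.5 + x1, 40)
y2 = 1 + 1 * x2 + rng.normal(0, 0.2, 40)
print(boot_ci_diff(x1, y1, x2, y2, x0=0.5))
```

Resampling each group separately is what lets both within-group and between-group heteroscedasticity show up in the interval.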
{"title":"A HETEROSCEDASTIC METHOD FOR COMPARING REGRESSION LINES AT SPECIFIED DESIGN POINTS WHEN USING A ROBUST REGRESSION ESTIMATOR.","authors":"R. Wilcox","doi":"10.6339/JDS.2013.11(2).1146","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1146","url":null,"abstract":"It is well known that the ordinary least squares (OLS) regression estimator is not robust. Many robust regression estimators have been proposed and inferential methods based on these estimators have been derived. However, for two independent groups, let θj (X) be some conditional measure of location for the jth group, given X, based on some robust regression estimator. An issue that has not been addressed is computing a 1 - α confidence interval for θ1(X) - θ2(X) in a manner that allows both within group and between group hetereoscedasticity. The paper reports the finite sample properties of a simple method for accomplishing this goal. Simulations indicate that, in terms of controlling the probability of a Type I error, the method performs very well for a wide range of situations, even with a relatively small sample size. In principle, any robust regression estimator can be used. The simulations are focused primarily on the Theil-Sen estimator, but some results using Yohai's MM-estimator, as well as the Koenker and Bassett quantile regression estimator, are noted. Data from the Well Elderly II study, dealing with measures of meaningful activity using the cortisol awakening response as a covariate, are used to illustrate that the choice between an extant method based on a nonparametric regression estimator, and the method suggested here, can make a practical difference.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"73 1","pages":"281-291"},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73846635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Inference Model for Online Media Users
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.201301_11(1).0008
N. Nananukul
Watching videos online has become a popular activity for people around the world. To manage revenue from online advertising, an efficient ad server that can match advertisements to targeted users is needed. In general, the users' demographics are provided to an ad server by an inference engine, which infers users' demographics based on a profile reasoning technique. Rich media streaming through broadband networks has had a significant impact on how online television user profile reasoning can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account genre preferences and viewing time from users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed, and experimental results are provided to show how effective the proposed technique is.
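The abstract does not give the model's exact inputs, so the sketch below only illustrates the general setup: a logistic regression mapping genre-preference shares and viewing time to an age/gender group, using scikit-learn on synthetic data. All features, labels, and sample sizes are hypothetical.

```python
# Sketch: infer an age/gender group from genre preferences and viewing time
# with logistic regression (illustrative features and labels only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# hypothetical features: share of viewing time in 4 genres + total hours/week
X = np.column_stack([rng.dirichlet(np.ones(4), n), rng.gamma(2.0, 5.0, n)[:, None]])
y = rng.integers(0, 4, n)            # hypothetical age/gender group labels (4 groups)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
print("group probabilities for first user:", clf.predict_proba(X_te[:1]))
```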
{"title":"An Inference Model for Online Media Users","authors":"N. Nananukul","doi":"10.6339/JDS.201301_11(1).0008","DOIUrl":"https://doi.org/10.6339/JDS.201301_11(1).0008","url":null,"abstract":"Watching videos online has become a popular activity for people around the world. To be able to manage revenue from online advertising an efficient Ad server that can match advertisement to targeted users is needed. In general the users’ demographics are provided to an Ad server by an inference engine which infers users’ demographics based on a profile reasoning technique. Rich media streaming through broadband networks has made significant impact on how online television users’ profiles reasoning can be implemented. Compared to traditional broadcasting services such as satellite and cable, broadcasting through broadband networks enables bidirectional communication between users and content providers. In this paper, a user profile reasoning technique based on a logistic regression model is introduced. The inference model takes into account genre preferences and viewing time from users in different age/gender groups. Historical viewing data were used to train and build the model. Different input data processing and model building strategies are discussed. Also, experimental results are provided to show how effective the proposed technique is.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41594284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(2).1133
T. Dey
We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The chosen model is the one with the minimum final prediction error (FPE) among all candidate models. This is implemented via the R package modelSampler. However, model selection based on the FPE criterion is questionable, as FPE can be sensitive to perturbations in the data. This R package can be used for empirical assessment of the stability of the FPE criterion. Stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on the bootstrapped data. The heart of the method is the notion of model averaging, used both for stable variable selection and to study the behavior of variables over the entire model space, a concept invaluable in high-dimensional situations.
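modelSampler itself is an R package and its API is not reproduced here; the Python sketch below only illustrates the bootstrap-wrapper idea, recording how often each variable is selected across bootstrap resamples, with a lasso standing in for the spike-and-slab sampler.

```python
# Sketch: stability of variable selection via a bootstrap wrapper.
# The lasso is a stand-in for the spike-and-slab sampler; this is not modelSampler.
import numpy as np
from sklearn.linear_model import LassoCV

def inclusion_frequencies(X, y, n_boot=50, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))        # bootstrap resample
        coef = LassoCV(cv=5).fit(X[idx], y[idx]).coef_
        counts += coef != 0                          # which variables were kept
    return counts / n_boot                           # selection frequency per variable

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 8))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=120)   # only x0 and x3 matter
print(inclusion_frequencies(X, y))
```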
{"title":"modelSampler: An R Tool for Variable Selection and Model Exploration in Linear Regression","authors":"T. Dey","doi":"10.6339/JDS.2013.11(2).1133","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1133","url":null,"abstract":"We have developed a tool for model space exploration and variable selection in linear regression models based on a simple spike and slab model (Dey, 2012). The model chosen is the best model with minimum nal prediction error (FPE) values among all other models. This is implemented via the R package modelSampler. However, model selection based on FPE criteria is dubious and questionable as FPE criteria can be sensitive to perturbations in the data. This R package can be used for empirical assessment of the stability of FPE criteria. A stable model selection is accomplished by using a bootstrap wrapper that calls the primary function of the package several times on the bootstrapped data. The heart of the method is the notion of model averaging for stable variable selection and to study the behavior of variables over the entire model space, a concept invaluable in high dimensional situations.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49303089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Use of Serial Weight and Length Measurements in Children from Birth to Two Years of Age to Predict Obesity at Five Years of Age
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(3).1154
H. Haller, T. Dey, L. Gittner, S. Ludington-Hoe
Childhood obesity is a major health concern. The associated health risks dramatically reduce lifespan and increase healthcare costs. The goal was to develop methodology to identify, as early in life as possible, whether or not a child would become obese at age five. This diagnostic tool would facilitate clinical monitoring to prevent and/or minimize obesity. Obesity is measured by Body Mass Index (BMI), but an improved metric, the ratio of weight to height (or length) (WOH), is proposed from this research for detecting early obesity. Results of this research demonstrate that WOH performs better than BMI for early detection of obesity in individuals using a longitudinal decision analysis (LDA), which is essentially an individuals-type control chart analysis about a trend line. Utilizing LDA, the odds of obesity at age five are indicated before the second birthday with 95% sensitivity and 97% specificity. Further, obesity at age five is indicated with 75% specificity before two months of age and with 84% specificity before three months of age. These results warrant expanding this study to larger cohorts of normal, overweight, and obese children at age five from different healthcare facilities to test the applicability of this novel diagnostic tool.
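The paper's LDA decision thresholds are not given in the abstract; the sketch below only illustrates the underlying idea, computing the WOH ratio from serial measurements and flagging points on an individuals-type control chart about a child's fitted trend line, using generic moving-range limits and made-up visit data.

```python
# Sketch: weight-over-length (WOH) series with an individuals-type control chart
# about a fitted trend line (generic control limits, not the paper's LDA rules).
import numpy as np

def woh_chart(age_months, weight_kg, length_cm, k=2.66):
    woh = weight_kg / length_cm                       # ratio of weight to length
    slope, intercept = np.polyfit(age_months, woh, 1) # linear trend over age
    resid = woh - (intercept + slope * age_months)
    mr_bar = np.mean(np.abs(np.diff(resid)))          # average moving range
    limit = k * mr_bar                                # usual individuals-chart width
    flags = np.abs(resid) > limit                     # visits off the child's trend
    return woh, resid, limit, flags

age = np.array([0.5, 2, 4, 6, 9, 12, 18, 24])          # months (hypothetical visits)
wt  = np.array([3.4, 5.1, 6.5, 7.5, 8.6, 9.5, 11.0, 12.4])   # kg
ln  = np.array([50, 57, 62, 66, 70, 74, 80, 86])        # cm
print(woh_chart(age, wt, ln))
```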
{"title":"Use of Serial Weight and Length Measurements in Children from Birth to Two Years of Age to Predict Obesity at Five Years of Age","authors":"H. Haller, T. Dey, L. Gittner, S. Ludington-Hoe","doi":"10.6339/JDS.2013.11(3).1154","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(3).1154","url":null,"abstract":"Childhood obesity is a major health concern. The associated health risks dramatically reduce lifespan and increase healthcare costs. The goal was to develop methodology to identify as early in life as possible whether or not a child would become obese at age five. This diagnostic tool would facilitate clinical monitoring to prevent and or minimize obesity. Obesity is measured by Body Mass Index (BMI), but an improved metric, the ratio of weight to height (or length) (WOH), is proposed from this research for detecting early obesity. Results of this research demonstrate that WOH performs better than BMI for early detection of obesity in individuals using a longitudinal decision analysis (LDA), which is essentially an individuals type control chart analysis about a trend line. Utilizing LDA, the odds of obesity of a child at age five is indicated before the second birthday with 95% sensitivity and 97% specificity. Further, obesity at age five is indicated with 75% specificity before two months and with 84% specificity before three months of age. These results warrant expanding this study to larger cohorts of normal, overweight, and obese children at age five from different healthcare facilities to test the applicability of this novel diagnostic tool.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48169099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Behavior Scoring Model
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(3).1145
Ling-Jing Kao, F. Lin, C. Yu
Although many scoring models have been developed in the literature to offer financial institutions guidance in credit-granting decisions, the purpose of most scoring models is to improve their discrimination ability, not their explanatory ability. Therefore, conventional scoring models can only provide limited information on the relationship among customer demographics, default risk, and credit card attributes such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scoring model is proposed to help financial institutions identify factors which truly reflect customer value and can affect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. High-income, female, or more highly educated cardholders are more likely to have good repayment ability.
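The paper's Bayesian specification is not detailed in the abstract, so the sketch below is a generic stand-in: a Bayesian logistic model for default probability with APR and income as covariates, fitted with a self-contained random-walk Metropolis sampler on simulated data. Priors, covariates, and data are all assumptions.

```python
# Sketch: Bayesian logistic model for default risk via random-walk Metropolis
# (self-contained illustration; not the paper's scoring model or data).
import numpy as np

def log_post(beta, X, y):
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Bernoulli-logit likelihood
    logprior = -0.5 * np.sum(beta ** 2) / 10.0          # N(0, 10) prior on coefficients
    return loglik + logprior

def metropolis(X, y, n_iter=5000, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    lp = log_post(beta, X, y)
    draws = []
    for _ in range(n_iter):
        prop = beta + rng.normal(0, step, beta.shape)
        lp_prop = log_post(prop, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:         # accept/reject step
            beta, lp = prop, lp_prop
        draws.append(beta)
    return np.array(draws)

rng = np.random.default_rng(1)
n = 500
apr = rng.uniform(0.05, 0.25, n)                         # annual percentage rate
income = rng.normal(0, 1, n)                             # standardized income
X = np.column_stack([np.ones(n), apr, income])
y = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 12 * apr - 0.5 * income))))
draws = metropolis(X, y)[1000:]                           # drop burn-in
print("posterior mean of APR effect:", draws[:, 1].mean())
```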
{"title":"Bayesian Behavior Scoring Model","authors":"Ling-Jing Kao, F. Lin, C. Yu","doi":"10.6339/JDS.2013.11(3).1145","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(3).1145","url":null,"abstract":"Although many scoring models have been developed in literature to oer nancial institutions guidance in credit granting decision, the pur- pose of most scoring models are to improve their discrimination ability, not their explanatory ability. Therefore, the conventional scoring models can only provide limited information in the relationship among customer de- mographics, default risk, and credit card attributes, such as APR (annual percentage rate) and credit limits. In this paper, a Bayesian behavior scor- ing model is proposed to help nancial institutions identify factors which truly reect customer value and can aect default risk. To illustrate the proposed model, we applied it to the credit cardholder database provided by one major bank in Taiwan. The empirical results show that increasing APR will raise the default probability greatly. Single cardholders are less accountable for credit card repayment. High income, female, or cardholders with higher education are more likely to have good repayment ability.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44982024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Estimation of Rayleigh Scale Parameter under Doubly Type-II Censoring from Imprecise Data
Pub Date: 2021-07-30 | DOI: 10.6339/JDS.2013.11(2).1144
Abbas Pak, G. Parham, M. Saraj
The scheme of doubly type-II censored sampling is an important method of obtaining data in lifetime studies. Statistical analysis of lifetime distributions under this censoring scheme is based on precise lifetime data. However, some collected lifetime data might be imprecise and are represented in the form of fuzzy numbers. This paper deals with the problem of estimating the scale parameter of the Rayleigh distribution under the doubly type-II censoring scheme when the lifetime observations are fuzzy and are assumed to be related to an underlying crisp realization of a random sample. We propose a new method to determine the maximum likelihood estimate of the parameter of interest. The asymptotic variance of the ML estimate is then derived by using the missing information principle. The performance of the estimate is then assessed through Monte Carlo simulations. Finally, an illustrative example with real data concerning 25 ball bearings in a life test is presented.
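The fuzzy-data likelihood and the missing-information variance are the paper's contribution and are not reconstructed here; the sketch below only covers the crisp-data building block, maximizing the doubly type-II censored Rayleigh log-likelihood numerically with scipy.

```python
# Sketch: ML estimation of the Rayleigh scale under doubly type-II censoring
# with crisp (non-fuzzy) observations; the fuzzy-data extension and the
# missing-information variance from the paper are not reproduced here.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(sigma, x_obs, r, s):
    # x_obs: ordered middle observations x_(r+1) <= ... <= x_(n-s);
    # the smallest r and largest s order statistics are censored.
    z = x_obs ** 2 / (2 * sigma ** 2)
    log_f = np.log(x_obs) - 2 * np.log(sigma) - z                 # Rayleigh log-density
    log_F_low = np.log1p(-np.exp(-x_obs[0] ** 2 / (2 * sigma ** 2)))   # log F at smallest obs
    log_S_high = -x_obs[-1] ** 2 / (2 * sigma ** 2)                    # log survival at largest obs
    return -(r * log_F_low + s * log_S_high + log_f.sum())

rng = np.random.default_rng(0)
n, r, s = 30, 3, 4
x = np.sort(rng.rayleigh(scale=2.0, size=n))[r:n - s]     # keep the middle order statistics
res = minimize_scalar(neg_loglik, bounds=(1e-3, 20), args=(x, r, s), method="bounded")
print("MLE of sigma:", res.x)
```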
{"title":"On Estimation of Rayleigh Scale Parameter under Doubly Type-II Censoring from Imprecise Data","authors":"Abbas Pak, G. Parham, M. Saraj","doi":"10.6339/JDS.2013.11(2).1144","DOIUrl":"https://doi.org/10.6339/JDS.2013.11(2).1144","url":null,"abstract":"The scheme of doubly type-II censored sampling is an important method of obtaining data in lifetime studies. Statistical analysis of life- time distributions under this censoring scheme is based on precise lifetime data. However, some collected lifetime data might be imprecise and are represented in the form of fuzzy numbers. This paper deals with the prob- lem of estimating the scale parameter of Rayleigh distribution under doubly type-II censoring scheme when the lifetime observations are fuzzy and are assumed to be related to underlying crisp realization of a random sample. We propose a new method to determine the maximum likelihood estimate of the parameter of interest. The asymptotic variance of the ML estimate is then derived by using the missing information principle. Their performance is then assessed through Monte Carlo simulations. Finally, an illustrative example with real data concerning 25 ball bearings in a life test is presented.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46969078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modelling Progression of HIV/AIDS Disease Stages Using Semi-Markov Processes
Pub Date: 2021-07-30 | DOI: 10.6339/jds.201304_11(2).0004
A. Goshu, Zelalem G. Dessie
The aim of this study is to model the progression of HIV/AIDS disease for an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who had been under ART follow-up from June 2005 to August 2009. States of the Markov process are defined by the seriousness of the sickness based on CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (death). The first four states are referred to as good or alive states. The findings of the current study are as follows: within the good states, the transition probability from a given state to the next worse state increases with time, reaches a maximum, and then decreases with increasing time. This means that there is some period of time when the probability of a patient transiting to a worse state of the disease is highest. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of remaining in the same state decreases over time. Within the good states, the results show that the probability of being in a better state is non-zero, but less than the probability of being in a worse state. At any time of the process, a patient is more likely to be in a worse state than in a better one. The conditional probability of staying in the same state for a given number of months decreases with increasing time. The reliability analysis also revealed that the survival probabilities all decline over time. This implies that patient conditions should be improved with ART to improve the survival probability.
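The semi-Markov sojourn-time modelling is not reconstructed here; the sketch below shows only the simpler, time-homogeneous ingredient, estimating a transition-probability matrix over the five CD4-based states from observed patient paths, using hypothetical state sequences.

```python
# Sketch: empirical transition-probability matrix over the five CD4-based states
# (a plain Markov count estimate; the paper's semi-Markov sojourn-time model
# with time-varying transition probabilities is not reproduced here).
import numpy as np

STATES = ["CD4>500", "350<CD4<=500", "200<CD4<=350", "CD4<=200", "Death"]

def transition_matrix(sequences, n_states=5):
    counts = np.zeros((n_states, n_states))
    for seq in sequences:                      # each seq is one patient's state path
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# hypothetical follow-up paths (state indices 0..4, where 4 = death is absorbing)
paths = [[0, 1, 1, 2, 3, 4], [1, 1, 2, 2, 1, 2], [2, 3, 3, 4], [0, 0, 1, 1, 1]]
print(np.round(transition_matrix(paths), 2))
```

Row i of the estimated matrix gives the probabilities of moving from state i to each other state in one follow-up step.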
{"title":"Modelling Progression of HIV/AIDS Disease Stages Using Semi-Markov Processes","authors":"A. Goshu, Zelalem G. Dessie","doi":"10.6339/jds.201304_11(2).0004","DOIUrl":"https://doi.org/10.6339/jds.201304_11(2).0004","url":null,"abstract":"The aim of this study is to model the progression of HIV/AIDS disease of an individual patient under ART follow-up using semi-Markov processes. Recorded hospital data were obtained for a cohort of 710 patients at Felege-Hiwot referral hospital, Ethiopia, who have been under ART followup from June 2005 to August 2009. States of the Markov process are defined by the seriousness of the sickness based on the CD4 counts in cells/microliter. The five states considered are: state one (CD4 count > 500); state two (350 < CD4 count ≤ 500); state three (200 < CD4 count ≤ 350); state four (CD4 count ≤ 200); and state five (Death). The first four states are named as good or alive states. The findings obtained from the current study are as follows: within the good states, the transition probability from a given state to the next worse state increases with time, gets optimum at a time and then decreases with increasing time. This means that there is some period of time when such probability is highest for a patient to transit to a worse state of the disease. Moreover, the probability of dying decreases with increasing CD4 counts over time. For an HIV/AIDS patient in a specific state of the disease, the probability of being in same state decreases over time. Within the good states, the results show that probability of being in a better state is non-zero, but less than the probability of being in worse state. At any time of the process, there is more likely to be in worse state than to be in better one. The conditional probability of staying in same state until a given number of month decreases with increasing time. The reliability analysis also revealed that the survival probabilities are all declining over time. This implies that patient conditions should be improved with ART to improve the survival probability.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46846149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-Level Factorial Design with Circular Response: Model and Analysis
Pub Date: 2021-07-30 | DOI: 10.6339/jds.201307_11(3).0003
A. Zahran
Since the late 1930s, factorial analysis of a response measured on the real line has been well established and documented in the literature. No such analysis, however, is available for a response measured on the circle (or sphere in general), despite the fact that many designed experiments in industry, medicine, psychology and biology could result in an angular response. In this paper a full factorial analysis is presented for a circular response using the Spherical Projected Multivariate Linear model. Main and interaction effects are defined, estimated and tested. By analogy with the linear response case, two new effect plots, the Circular Main-Effect and Circular Interaction-Effect plots, are proposed to visualize main and interaction effects on circular responses.
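The Spherical Projected Multivariate Linear model itself is not implemented here; the sketch below shows only the kind of summary behind a circular main-effect plot, the circular mean of an angular response at each level of a two-level factor, on simulated von Mises data.

```python
# Sketch: circular means of an angular response by factor level, the sort of
# summary a circular main-effect plot displays (the paper's model is not implemented).
import numpy as np

def circular_mean(angles_rad):
    # mean direction of angles on the circle
    return np.arctan2(np.mean(np.sin(angles_rad)), np.mean(np.cos(angles_rad)))

rng = np.random.default_rng(0)
factor = np.repeat([0, 1], 20)                        # two-level factor
theta = np.where(factor == 0,
                 rng.vonmises(mu=0.3, kappa=4, size=40),
                 rng.vonmises(mu=1.2, kappa=4, size=40))
for level in (0, 1):
    m = circular_mean(theta[factor == level])
    print(f"level {level}: circular mean = {np.degrees(m):.1f} deg")
```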
{"title":"Two-Level Factorial Design with Circular Response: Model and Analysis","authors":"A. Zahran","doi":"10.6339/jds.201307_11(3).0003","DOIUrl":"https://doi.org/10.6339/jds.201307_11(3).0003","url":null,"abstract":"Since late thirties, factorial analysis of a response measured on the real line has been well established and documented in the literature. No such analysis, however, is available for a response measured on the circle (or sphere in general), despite the fact that many designed experiments in industry, medicine, psychology and biology could result in an angular response. In this paper a full factorial analysis is presented for a circular response using the Spherical Projected Multivariate Linear model. Main and interaction effects are defined, estimated and tested. Analogy to the linear response case, two new effect plots: Circular-Main Effect and CircularInteraction Effect plots are proposed to visualize main and interaction effects on circular responses.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42105829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Design Aspect of the Bruceton Test for Pyrotechnics Sensitivity Analysis
Pub Date: 2021-07-21 | DOI: 10.6339/JDS.2003.01(1).119
C. Fuh, J. S. Lee, C. M. Liaw
We start with a data set obtained from a study of the CS-M-3 ignitor in a military experiment, collected using the classical up-and-down method of Dixon and Mood (1948). Since Bruceton tests are actively employed in pyrotechnical sensitivity studies, we reexamine this method from the view that it is designed for data collection. Two different aspects are addressed: as a design for parameter estimation and as a design for giving clues about the goodness of fit. Two sets of data are employed to illustrate our points. For the estimation of (µ, σ), the location and scale parameters, we show that a properly selected up-and-down design is quite informative; for the estimation of xp, the 100p%-th quantile, however, the best selected up-and-down method is only about 50% as effective as the corresponding c-optimal design. Although not particularly useful for that purpose, the up-and-down method does give clues about the proper selection of the underlying model. In any case, all the quantal response models are rather poor in terms of goodness of fit.
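The design-efficiency comparisons in the paper are not reproduced here; the sketch below simply simulates the Dixon and Mood up-and-down data-collection rule under an assumed probit threshold model, stepping the stimulus down after a response and up after a non-response.

```python
# Sketch: simulating the Dixon-Mood up-and-down (Bruceton) data-collection rule
# under a probit threshold model (efficiency comparisons with c-optimal designs
# are not reproduced here).
import numpy as np
from scipy.stats import norm

def up_and_down(mu, sigma, start, step, n_trials, seed=0):
    rng = np.random.default_rng(seed)
    level, records = start, []
    for _ in range(n_trials):
        p_fire = norm.cdf((level - mu) / sigma)              # probit response curve
        fired = rng.uniform() < p_fire
        records.append((level, int(fired)))
        level = level - step if fired else level + step      # down after a "go", up after a "no-go"
    return records

data = up_and_down(mu=10.0, sigma=1.0, start=9.0, step=0.5, n_trials=30)
print(data[:10])
```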
{"title":"The Design Aspect of the Bruceton Test for Pyrotechnics Sensitivity Analysis","authors":"C. Fuh, J. S. Lee, C. M. Liaw","doi":"10.6339/JDS.2003.01(1).119","DOIUrl":"https://doi.org/10.6339/JDS.2003.01(1).119","url":null,"abstract":"We start with a data set obtained from a study of the CS-M-3 ignitor in a military experiment and is based on the classical up-and-down method of Dixon and Mood (1948). Since the Bruce- ton tests are actively employed in pyrotechnical sensitivity studies, we reexamine this method based on the view that it is designed for data-collection. Two different aspects are addressed: as a design for parameter estimation and as a design for giving clues about the good- ness of fit. Two sets of data are employed to illustrate our points. For the estimation of (µ, σ), the location and the scale parameters, we show that a properly selected up-and-down design is quite infor- mative; for the estimation of xp, the 100p%-th quantile, however, the best selected up-and-down method is only about 50% effective as compared with the corresponding c-optimal design. Although not particularly useful, the up-and-down method does judge the proper selection of underlying model. In any case, all the quantal response models are rather poor in terms goodness of fit.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":"84 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"71321265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Building an Honest Tree for Mass Spectra Classification Based on Prior Logarithm Normal Distribution
Pub Date: 2021-07-21 | DOI: 10.6339/JDS.2003.01(4).179
Cheng-Jian Xu, Ping He, Yizeng Liang
Structure elucidation is one of the major tasks for analytical researchers, and it often requires an efficient classifier. The decision tree is especially attractive for its easy understanding and intuitive representation. However, a small change in the data set due to experimental error can often result in a very different series of splits. In this paper, a prior lognormal distribution is adopted to weight the original mass spectra, which helps to build an honest tree for later structure elucidation.
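The exact prior-based weighting used in the paper is not reconstructed here; the sketch below only shows the mechanics of training a decision tree with per-spectrum weights drawn from a lognormal distribution, using scikit-learn's sample_weight argument and made-up intensity data.

```python
# Sketch: training a decision tree with per-spectrum lognormal weights
# (illustrative weighting; not the paper's exact prior-based scheme).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.poisson(5, size=(n, p)).astype(float)        # stand-in for peak intensities
y = (X[:, 0] + X[:, 3] > 10).astype(int)              # hypothetical structural class
w = rng.lognormal(mean=0.0, sigma=0.5, size=n)        # lognormal sample weights

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y, sample_weight=w)                       # weights downplay unreliable spectra
print("training accuracy:", tree.score(X, y))
```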
{"title":"Building an Honest Tree for Mass Spectra Classification Based on Prior Logarithm Normal Distribution","authors":"Cheng-Jian Xu, Ping He, Yizeng Liang","doi":"10.6339/JDS.2003.01(4).179","DOIUrl":"https://doi.org/10.6339/JDS.2003.01(4).179","url":null,"abstract":"Structure elucidation is one of big tasks for analytical researcher and it often needs an efficient classifier. The decision tree is especially attractive for easy understanding and intuitive represen- tation. However, small change in the data set due to the experiment error can often result in a very different series of split. In this pa- per, a prior logarithm normal distribution is adopted to weight the original mass spectra. It helps to building an honest tree for later structure elucidation.","PeriodicalId":73699,"journal":{"name":"Journal of data science : JDS","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2021-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46490228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}