Identification of taxa can be significantly assisted by statistical classification based on trait measurements, either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, and any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. Decision rules based on supervised learning are presented that estimate model parameters through blocked Gibbs sampling. The resulting decision regions allow for uncertainty (partial rejection), so that the output for a new subject is not necessarily one specific category (taxon) but rather a set of categories including the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and it is illustrated on an original ornithological data set. We also incorporate model selection through cross-validation, exemplified on another original data set of birds.
{"title":"Identification of taxon through classification with partial reject options","authors":"Måns Karlsson, Ola Hössjer","doi":"10.1093/jrsssc/qlad036","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad036","url":null,"abstract":"Identification of taxa can significantly be assisted by statistical classification based on trait measurements either individually or by phylogenetic (clustering) methods. In this article, we present a general Bayesian approach for classifying species individually based on measurements of a mixture of continuous and ordinal traits, and any type of covariates. The trait vector is derived from a latent variable with a multivariate Gaussian distribution. Decision rules based on supervised learning are presented that estimate model parameters through blocked Gibbs sampling. These decision regions allow for uncertainty (partial rejection), so that not necessarily one specific category (taxon) is output when new subjects are classified, but rather a set of categories including the most probable taxa. This type of discriminant analysis employs reward functions with a set-valued input argument, so that an optimal Bayes classifier can be defined. We also present a way of safeguarding against outlying new observations, using an analogue of a p-value within our Bayesian setting. We refer to our Bayesian set-valued classifier as the Karlsson–Hössjer method, and it is illustrated on an original ornithological data set of birds. We also incorporate model selection through cross-validation, exemplified on another original data set of birds.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136260255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
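The set-valued decision rule described in the abstract on taxon identification can be illustrated with a minimal sketch. It assumes the posterior category probabilities have already been computed (the Gibbs sampling step is not shown) and uses an illustrative reward that is not taken from the paper: a reward of 1 if the true category lies in the output set, minus a hypothetical penalty `cost` for each category in the set beyond the first. The function name `set_valued_classify` is also an assumption.

```python
import numpy as np

def set_valued_classify(posterior, cost):
    """Return the indices of the category set maximising the expected reward.

    Assumed (illustrative) reward for a set I when the true category is i:
    1 if i is in I, minus `cost` for every category in I beyond the first.
    Expected reward: sum_{i in I} posterior[i] - cost * (len(I) - 1).
    """
    posterior = np.asarray(posterior, dtype=float)
    best = int(np.argmax(posterior))
    # Adding category i to the set pays off exactly when posterior[i] > cost;
    # the most probable category is always kept so the set is never empty.
    chosen = {best} | {int(i) for i in np.flatnonzero(posterior > cost)}
    return sorted(chosen)

# One clearly identified subject and one ambiguous subject.
clear_case = set_valued_classify([0.90, 0.07, 0.03], cost=0.2)
ambiguous_case = set_valued_classify([0.48, 0.42, 0.10], cost=0.2)
print(clear_case)      # [0]
print(ambiguous_case)  # [0, 1]
```

Under this assumed reward, a larger `cost` pushes the classifier towards single-category answers, while a smaller `cost` yields larger sets whenever several taxa are plausible.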
Spatial preferential sampling occurs when the choice of sampling locations depends stochastically on the process of interest. Ignoring this dependence leads to inaccurate inferences. Our framework models experimenter preferences jointly with the spatial process to adjust for this. We dispense with the unrealistic assumption (required by existing methods) of conditional independence of sampling locations by defining a whole design distribution proportional to a utility function on the space of designs. The proposed model likelihood is generally intractable. We provide fitting techniques based on noisy Markov chain Monte Carlo and demonstrate their use on a data set of spatially distributed ammonia concentrations.
{"title":"A design utility approach for preferentially sampled spatial data","authors":"Elizabeth J Gray, E. Evangelou","doi":"10.1093/jrsssc/qlad040","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad040","url":null,"abstract":"Spatial preferential sampling occurs when the choice of sampling locations depends stochastically on the process of interest. Ignoring this dependence leads to inaccurate inferences. Our framework models experimenter preferences jointly with the spatial process to adjust for this. We dispense with the unrealistic assumption (required by existing methods) of conditional independence of sampling locations by defining a whole design distribution proportional to a utility function on the space of designs. The proposed model likelihood is generally intractable. We provide fitting techniques based on the noisy Markov chain Monte Carlo and demonstrate their usage on a data set of spatially distributed ammonia concentrations.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"17 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81990291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
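The claim in the preferential sampling abstract that ignoring the dependence between sampling locations and the process leads to inaccurate inferences can be illustrated with a small simulation. This sketch is not the design utility model of the paper: it assumes the simpler setting in which sites are drawn independently with probability proportional to `exp(beta * S_i)`, and the field, the grid size and `beta` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent field S on a 1-D grid: a moving average of Gaussian noise.
n_sites = 200
noise = rng.normal(size=n_sites + 20)
field = np.convolve(noise, np.ones(21) / 21, mode="valid")  # length n_sites

# Assumed preference model: site i is chosen with probability proportional
# to exp(beta * S_i), independently across draws.
beta = 3.0
weights = np.exp(beta * field)
probs = weights / weights.sum()

true_mean = field.mean()
# Expected value of a single preferentially sampled observation.
expected_preferential = float(np.sum(probs * field))
# One realised preferential sample and its naive (unadjusted) mean.
sampled_sites = rng.choice(n_sites, size=30, replace=True, p=probs)
naive_mean = field[sampled_sites].mean()

print("field mean:", true_mean)
print("expected preferentially sampled value:", expected_preferential)
print("naive mean of one preferential sample:", naive_mean)
```

With a positive `beta`, high values of the field are over-represented, so the expected sampled value exceeds the field mean; an analysis that treats the locations as chosen independently of the field inherits that bias.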
{"title":"Daniel Clarkson, Emma Eastoe and Amber Leeson's (Lancaster University) reply to the Discussion of ‘Statistical aspects of climate change’","authors":"D. Clarkson, E. Eastoe, A. Leeson","doi":"10.1093/jrsssc/qlad059","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad059","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"34 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75588026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Anna Choi and Tze Leung Lai’s contribution to the Discussion of ‘The First Discussion Meeting on Statistical aspects of climate change’","authors":"Anna Choi, T. Lai","doi":"10.1093/jrsssc/qlad050","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad050","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"96 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85300919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Richard L Smith’s contribution to the Discussion of ‘The First Discussion Meeting on Statistical aspects of climate change’","authors":"Richard L. Smith","doi":"10.1093/jrsssc/qlad046","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad046","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"31 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74066061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present the two-dimensional targeted fused ridge estimator of the linear and logistic regression models. The estimator (i) handles both unpenalised and penalised covariates, (ii) accommodates possible relations among the covariates’ coefficients through a fusion penalty, and (iii) incorporates prior information on the regression parameter through a non-zero shrinkage target. In this work, the aforementioned relations are similarities among the covariates’ coefficients due to spatial proximity in a two-dimensional grid. In an extensive re-analysis of an epidemiological and an image analysis study, we illustrate the use of the estimator’s aforementioned features that result in a tangibly interpretable predictor.
{"title":"Two-dimensional fused targeted ridge regression for health indicator prediction from accelerometer data","authors":"A. Lettink, M. Chinapaw, W. V. van Wieringen","doi":"10.1093/jrsssc/qlad041","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad041","url":null,"abstract":"We present the two-dimensional targeted fused ridge estimator of the linear and logistic regression models. The estimator (i) handles both unpenalised and penalised covariates, (ii) accommodates possible relations among the covariates’ coefficients through a fusion penalty, and (iii) incorporates prior information on the regression parameter through a non-zero shrinkage target. In this work, the aforementioned relations are similarities among the covariates’ coefficients due to spatial proximity in a two-dimensional grid. In an extensive re-analysis of an epidemiological and an image analysis study, we illustrate the use of the estimator’s aforementioned features that result in a tangibly interpretable predictor.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"1 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90622524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
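The combination of a shrinkage target and a spatial fusion penalty described in the fused ridge abstract can be sketched for the linear model only. The exact penalty of the paper is not given in the abstract, so the objective below is an assumption: squared error, plus `lam` times the squared distance to the target, plus `lam_fuse` times the sum of squared differences between coefficients of neighbouring grid cells. Unpenalised covariates and the logistic case are omitted, and all function and parameter names are hypothetical.

```python
import numpy as np

def grid_laplacian(n_rows, n_cols):
    """Graph Laplacian of the 4-neighbour lattice on an n_rows x n_cols grid."""
    p = n_rows * n_cols
    lap = np.zeros((p, p))
    for r in range(n_rows):
        for c in range(n_cols):
            j = r * n_cols + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbours
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    k = rr * n_cols + cc
                    lap[j, j] += 1
                    lap[k, k] += 1
                    lap[j, k] -= 1
                    lap[k, j] -= 1
    return lap

def fused_targeted_ridge(X, y, target, lam, lam_fuse, laplacian):
    """Minimise ||y - X b||^2 + lam * ||b - target||^2 + lam_fuse * b' L b.

    b' L b equals the sum of squared differences between coefficients of
    neighbouring grid cells, so the minimiser has a closed form.
    """
    p = X.shape[1]
    lhs = X.T @ X + lam * np.eye(p) + lam_fuse * laplacian
    rhs = X.T @ y + lam * target
    return np.linalg.solve(lhs, rhs)

# Simulated example: 12 covariates arranged on a 3 x 4 grid.
rng = np.random.default_rng(1)
L = grid_laplacian(3, 4)
X = rng.normal(size=(50, 12))
y = X @ np.ones(12) + rng.normal(scale=0.1, size=50)
target = np.full(12, 0.5)

beta_hat = fused_targeted_ridge(X, y, target, lam=1.0, lam_fuse=5.0, laplacian=L)
# A very large ridge penalty pulls the estimate onto the target.
beta_shrunk = fused_targeted_ridge(X, y, target, lam=1e8, lam_fuse=0.0, laplacian=L)
print(beta_hat.reshape(3, 4))
```

Writing the fusion penalty through the grid Laplacian keeps the estimator a single linear solve; other neighbourhood structures would only change how `laplacian` is built.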
{"title":"Christine P Chai’s contribution to the Discussion of ‘The First Discussion Meeting on Statistical aspects of climate change’","authors":"Christine P Chai","doi":"10.1093/jrsssc/qlad049","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad049","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"7 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75429227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Christian Rohrbeck’s contribution to the Discussion of ‘The First Discussion Meeting on Statistical aspects of climate change’","authors":"C. Rohrbeck","doi":"10.1093/jrsssc/qlad047","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad047","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"119 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79409526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proposer of the vote of thanks and contribution to the Discussion of ‘The First Discussion Meeting on Statistical aspects of climate change’","authors":"A. Raftery","doi":"10.1093/jrsssc/qlad044","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad044","url":null,"abstract":"","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"15 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89650540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sometimes classification tasks have to be based on multivariate time series data collected for each class. In these situations the data for each class might exhibit non-stationary behaviour together with complex dependence structures. We propose a vine copula-based approach to capture these features in each class before applying a Bayesian classifier. Vine copulas have been very successful in modelling asymmetric tail dependence among variables and are coupled with non-stationary univariate time series to model the multivariate time series data for each class. We illustrate this classification approach with data from an electroencephalography experiment on neural activity, where the aim is to classify the eye state. The level of neural activity was recorded over time at multiple locations on the scalp. Our approach is able to identify relevant locations and allows for a model-based interpretation of the data generating process. A cross-validation study with comparison to competitor classifiers for this data set shows good performance of the proposed classifier.
{"title":"Vine copula-based Bayesian classification for multivariate time series of electroencephalography eye states","authors":"Chunfang Zhang, C. Czado","doi":"10.1093/jrsssc/qlad038","DOIUrl":"https://doi.org/10.1093/jrsssc/qlad038","url":null,"abstract":"Sometimes classification tasks have to be based on multivariate time series data collected for each class. In these situations the data for each class might exhibit non-stationary behaviour together with complex dependence structures. We propose a vine copula-based approach to capture these features in each class before applying a Bayesian classifier. Vine copulas have been very successful in modelling asymmetric tail dependence among variables and are coupled with non-stationary univariate time series to model the multivariate time series data for each class. We illustrate this classification approach using data from a neural activity experiment using electroencephalography, where we want to classify the eye state. The level of neural activity was collected over time for multiple locations on the scalp. Our approach is able to identify relevant locations and allows for a model-based interpretation of the data generating process. A cross-validation study with comparison to competitor classifiers for this data set shows good performance of the proposed classifier.","PeriodicalId":49981,"journal":{"name":"Journal of the Royal Statistical Society Series C-Applied Statistics","volume":"264 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2023-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86546868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}