Sketched Stochastic Dictionary Learning for large-scale data and application to high-throughput mass spectrometry
O. Permiakova, T. Burger. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-08-20. DOI: 10.1002/sam.11542

Factorization of large data corpora has emerged as an essential technique for extracting dictionaries (sets of patterns that are meaningful for sparse encoding). Following this line, we present a novel algorithm based on compressive learning theory. In this framework, the (arbitrarily large) dataset of interest is replaced by a fixed-size sketch obtained by randomly sampling the characteristic function of the data distribution. We apply our algorithm to the extraction of chromatographic elution profiles in mass spectrometry data, where it demonstrates its efficiency and advantages over related algorithms.
Weighted validation of heteroscedastic regression models for better selection
Yoonsuh Jung, Hayoung Kim. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-08-17. DOI: 10.1002/sam.11544

In this paper, we suggest a method for improving model selection in the presence of heteroscedasticity. For this purpose, we measure the heteroscedasticity in the data using the inter-quartile range (IQR) of the fitted values within a cross-validation framework. To find the IQR, we fit generic quantile regressions at the 0.25 and 0.75 quantiles using the training data. The two models then predict the response at the 0.25 and 0.75 quantiles in the test data, which yields the predicted IQR. To reduce the effect of heteroscedastic data on model selection, we propose using a weighted prediction error, with weights estimated by the inverse of the predicted IQR. The proposed method reduces the impact of large prediction errors via weighting and leads to better model and parameter selection. Its benefits are demonstrated in simulations and on two real data sets.
Modal linear regression models with multiplicative distortion measurement errors
Jun Zhang, Gaorong Li, Yiping Yang. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-08-10. DOI: 10.1002/sam.11541

We consider modal linear regression models in which neither the response variable nor the covariates can be observed directly, but are instead measured with multiplicative distortion measurement errors. Four calibration procedures are used to estimate the parameters: conditional mean calibration, conditional absolute mean calibration, conditional variance calibration, and conditional absolute logarithmic calibration. Asymptotic properties of the estimators based on the four calibration procedures are established. Monte Carlo simulation experiments examine the performance of the proposed estimators, which are then applied to a forest fires dataset as an illustration.
Multivariate Gaussian RBF-net for smooth function estimation and variable selection
Arkaprava Roy. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-08-03. DOI: 10.1002/sam.11540

Neural networks are routinely used for nonparametric regression modeling, and interest in them is growing with the ever-increasing complexity of modern datasets. With modern technological advancements, the number of predictors frequently exceeds the sample size in many application areas, so selecting important predictors from the huge pool is an extremely important task for judicious inference. This paper proposes a novel, flexible class of single-layer radial basis function (RBF) networks. The proposed architecture can estimate smooth unknown regression functions while also performing variable selection. We focus primarily on the Gaussian RBF-net due to its attractive properties; extensions to other choices of RBF are fairly straightforward. The architecture is also shown to be effective in identifying relevant predictors in a low-dimensional setting using the posterior samples, without imposing any sparse estimation scheme. We develop an efficient Markov chain Monte Carlo algorithm to generate posterior samples of the parameters. The posterior contraction rate is established with respect to the empirical ℓ2 distance, assuming that the error variance is unknown and that the true function belongs to a Hölder ball. We illustrate the method's empirical efficacy through simulation experiments in both high- and low-dimensional regression problems, and on a Human Connectome Project dataset to predict vocabulary comprehension and identify important edges of the structural connectome.
Negative binomial graphical model with excess zeros
Beomjin Park, Hosik Choi, Changyi Park. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-07-21. DOI: 10.1002/sam.11536

Markov random fields, also known as undirected graphical models (GMs), are popular in various fields because they provide an intuitive and interpretable graph expressing the complex relationships between random variables. The zero-inflated local Poisson graphical model has been proposed for count data with excess zeros. However, since count data are often over-dispersed, the local Poisson graphical model may fit such data poorly. In this paper, we propose a zero-inflated local negative binomial (NB) graphical model. Because of the dependencies among parameters in our models, direct optimization of the objective function is difficult; instead, we devise expectation-minimization algorithms based on two different parametrizations of the NB distribution. Through a simulation study, we illustrate the effectiveness of our method for learning network structure from over-dispersed count data with excess zeros, and we further apply it to real data to estimate a network structure.
Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data
Yaqian Gao, Yifan Huang, Shengwang Meng. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-07-20. DOI: 10.2139/ssrn.3910216

With the development of vehicle telematics and data mining technology, usage-based insurance (UBI) has aroused widespread interest in both academia and industry. Extensive driving behavior features make it possible to better understand the risks of insured vehicles, but they pose challenges for identifying and interpreting important ratemaking factors. Based on telematics data from policyholders in mainland China, this study analyzes the insurance claim frequency of commercial trucks using Poisson regression and several machine learning models, including regression trees, random forests, gradient boosting trees, XGBoost, and neural networks. After selecting the best model, we analyze feature importance, feature effects, and each feature's contribution to the prediction from an actuarial perspective. Our empirical study shows that XGBoost greatly outperforms the traditional models and detects important risk factors such as the average speed, the average mileage traveled per day, the fraction of night driving, the number of sudden brakes, and the fraction of left/right turns at intersections. These features usually affect driving risk nonlinearly, and there are complex interactions between them. To further distinguish high- and low-risk drivers, we run supervised clustering for risk segmentation according to drivers' driving habits. In summary, this study not only provides a more accurate prediction of driving risk but also satisfies the interpretability requirements of insurance regulators and risk managers.
Power grid frequency prediction using spatiotemporal modeling
Amanda Lenzi, J. Bessac, M. Anitescu. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-07-06. DOI: 10.1002/sam.11535

Understanding power system dynamics is essential for interarea oscillation analysis and the detection of grid instabilities. FNET/GridEye is a GPS-synchronized wide-area frequency measurement network that provides an accurate picture of the normal real-time operating condition of the power system, giving rise to new and intricate spatiotemporal patterns of power loads. We propose to model FNET/GridEye grid frequency data from the U.S. Eastern Interconnection with a spatiotemporal statistical model and to predict the frequency data at locations without observations, a critical need during disruption events when measurement data are inaccessible. Spatial information is accounted for either through neighboring measurements used as covariates or through a spatiotemporal correlation model captured by a latent Gaussian field. The proposed method is useful for estimating power system dynamic response from limited phasor measurements and holds promise for predicting instabilities that may lead to undesirable effects such as cascading outages.
Analyzing relevance vector machines using a single penalty approach
A. Dixit, Vivekananda Roy. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-07-05. DOI: 10.1002/sam.11551

The relevance vector machine (RVM) is a popular sparse Bayesian learning model typically used for prediction. Recently, it has been shown that improper priors assumed on the multiple penalty parameters in the RVM may lead to an improper posterior, and the sufficient conditions for posterior propriety currently available in the literature do not allow improper priors over these penalty parameters. In this article, we propose a single penalty relevance vector machine (SPRVM), in which the multiple penalty parameters are replaced by a single penalty, and we consider a semi-Bayesian approach for fitting it. The necessary and sufficient conditions for posterior propriety of the SPRVM are more liberal than those of the RVM and allow several improper priors over the penalty parameter. Additionally, we prove the geometric ergodicity of the Gibbs sampler used to analyze the SPRVM, which makes it possible to estimate the asymptotic standard errors associated with the Monte Carlo estimates of the posterior predictive means. Such Monte Carlo standard errors cannot be computed for the RVM, since the convergence rate of the Gibbs sampler used to analyze it is unknown. The predictive performance of the RVM and the SPRVM is compared on two simulation examples and three real-life datasets.
Coefficient tree regression for generalized linear models
Özge Sürer, D. Apley, E. Malthouse. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-07-02. DOI: 10.1002/sam.11534

Large regression data sets are now commonplace, with so many predictors that they cannot, or should not, all be included individually. In practice, derived predictors are relevant as meaningful features or, at the very least, as a form of regularized approximation of the true coefficients. We consider derived predictors that are sums of groups of individual predictors, which is equivalent to the predictors within a group sharing the same coefficient. However, the groups of predictors are usually not known in advance and must be discovered from the data. In this paper, we develop a coefficient tree regression algorithm for generalized linear models that discovers the group structure from the data. The approach yields simple, highly interpretable models, and we demonstrate with real examples that it can provide a clear and concise interpretation of the data. Via simulation studies under different scenarios, we show that our approach outperforms existing competitors in both computing time and predictive accuracy.
Fourier neural networks as function approximators and differential equation solvers
M. Ngom, O. Marin. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2021-06-22. DOI: 10.1002/sam.11531

We present a Fourier neural network (FNN) that can be mapped directly to the Fourier decomposition. The choice of activation and loss functions yields results that closely replicate a Fourier series expansion while preserving a straightforward architecture with a single hidden layer. The simplicity of this architecture facilitates integration with higher-complexity networks at a data pre- or postprocessing stage. We validate the FNN on naturally periodic smooth functions and on piecewise continuous periodic functions, and we showcase its use for modeling and solving partial differential equations with periodic boundary conditions. The main advantages of the approach are the validity of the solution outside the training region, the interpretability of the trained model, and its simplicity of use.