Hadrien Lorenzo, O. Cloarec, R. Thiébaut, J. Saracco
In supervised high-dimensional settings with a large number of variables and a low number of individuals, variable selection allows a simpler interpretation and more reliable predictions. That subspace selection is often handled with supervised tools when the real question is motivated by variable prediction. We propose a partial least squares (PLS)-based method, called data‐driven sparse PLS (ddsPLS), allowing variable selection in both the covariate and the response parts using a single hyperparameter per component. The subspace estimation is also performed by tuning a number of underlying parameters. The ddsPLS method is compared with existing methods such as classical PLS and two well-established sparse PLS methods through numerical simulations. The observed results are promising both in terms of variable selection and prediction performance. This methodology is based on new prediction quality descriptors associated with the classical R² and Q², and uses bootstrap sampling to tune parameters and select an optimal regression model.
{"title":"Data‐driven sparse partial least squares","authors":"Hadrien Lorenzo, O. Cloarec, R. Thiébaut, J. Saracco","doi":"10.1002/sam.11558","DOIUrl":"https://doi.org/10.1002/sam.11558","url":null,"abstract":"In the supervised high dimensional settings with a large number of variables and a low number of individuals, variable selection allows a simpler interpretation and more reliable predictions. That subspace selection is often managed with supervised tools when the real question is motivated by variable prediction. We propose a partial least square (PLS) based method, called data‐driven sparse PLS (ddsPLS), allowing variable selection both in the covariate and the response parts using a single hyperparameter per component. The subspace estimation is also performed by tuning a number of underlying parameters. The ddsPLS method is compared with existing methods such as classical PLS and two well established sparse PLS methods through numerical simulations. The observed results are promising both in terms of variable selection and prediction performance. This methodology is based on new prediction quality descriptors associated with the classical R2 and Q2 , and uses bootstrap sampling to tune parameters and select an optimal regression model.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116353603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To deal with factor analysis for high‐dimensional stationary time series, this paper suggests a novel method that integrates three ideas. First, based on the eigenvalues of a non‐negative definite matrix, we propose a new approach for consistently determining the number of factors. The proposed method is computationally efficient, requiring a single-step procedure, especially when both weak and strong factors exist in the factor model. Second, a new measure of the difference between the factor loading matrix and its estimate is recommended to overcome the nonidentifiability of the loading matrix due to any geometric rotation. The asymptotic results of our proposed method are also studied under this measure, which enjoys a "blessing of dimensionality." Finally, with the estimated factors, the latent vector autoregressive (VAR) model is analyzed such that the convergence rate of the estimated coefficients is as fast as when the samples of the VAR model are observed directly. In support of our results on consistency and computational efficiency, the finite-sample performance of the proposed method is examined through simulations and the analysis of a real data example.
{"title":"Factor analysis for high‐dimensional time series: Consistent estimation and efficient computation","authors":"Qiang Xia, H. Wong, Shirun Shen, Kejun He","doi":"10.1002/sam.11557","DOIUrl":"https://doi.org/10.1002/sam.11557","url":null,"abstract":"To deal with the factor analysis for high‐dimensional stationary time series, this paper suggests a novel method that integrates three ideas. First, based on the eigenvalues of a non‐negative definite matrix, we propose a new approach for consistently determining the number of factors. The proposed method is computationally efficient with a single step procedure, especially when both weak and strong factors exist in the factor model. Second, a fresh measurement of the difference between the factor loading matrix and its estimate is recommended to overcome the nonidentifiability of the loading matrix due to any geometric rotation. The asymptotic results of our proposed method are also studied under this measurement, which enjoys “blessing of dimensionality.” Finally, with the estimated factors, the latent vector autoregressive (VAR) model is analyzed such that the convergence rate of the estimated coefficients is as fast as when the samples of VAR model are observed. In support of our results on consistency and computational efficiency, the finite sample performance of the proposed method is examined by simulations and the analysis of one real data example.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124714637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose a new method for high‐dimensional classification based on estimation of a high‐dimensional mean vector under unknown and unequal variances. Our method rests on a semi‐parametric model that combines nonparametric and parametric models for the mean and the variance, respectively. It is designed to be robust to the structure of the mean vector, while most existing methods are developed for specific cases such as either the sparse or the non‐sparse case of the mean vector. In addition, we consider estimating the mean and variance separately under a nonparametric empirical Bayes framework, which has an advantage over existing nonparametric empirical Bayes classifiers based on standardization. We present simulation studies showing that our proposed method outperforms a variety of existing methods. Application to real data sets demonstrates the robustness of our method to various types of data, while all other methods produce either sensitive or poor results for different data sets.
{"title":"High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances","authors":"Hoyoung Park, Seungchul Baek, Junyong Park","doi":"10.1002/sam.11554","DOIUrl":"https://doi.org/10.1002/sam.11554","url":null,"abstract":"We propose a new method in high‐dimensional classification based on estimation of high‐dimensional mean vector under unknown and unequal variances. Our proposed method is based on a semi‐parametric model that combines nonparametric and parametric models for mean and variance, respectively. Our proposed method is designed to be robust to the structure of the mean vector, while most existing methods are developed for some specific cases such as either sparse or non‐sparse case of the mean vector. In addition, we also consider estimating mean and variance separately under nonparametric empirical Bayes framework that has advantage over existing nonparametric empirical Bayes classifiers based on standardization. We present simulation studies showing that our proposed method outperforms a variety of existing methods. Application to real data sets demonstrates robustness of our method to various types of data sets, while all other methods produce either sensitive or poor results for different data sets.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127392221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data‐driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.
{"title":"Tracking clusters and anomalies in evolving data streams","authors":"Sreelekha Guggilam, V. Chandola, A. Patra","doi":"10.1002/sam.11552","DOIUrl":"https://doi.org/10.1002/sam.11552","url":null,"abstract":"Data‐driven anomaly detection methods typically build a model for the normal behavior of the target system, and score each data instance with respect to this model. A threshold is invariably needed to identify data instances with high (or low) scores as anomalies. This presents a practical limitation on the applicability of such methods, since most methods are sensitive to the choice of the threshold, and it is challenging to set optimal thresholds. The issue is exacerbated in a streaming scenario, where the optimal thresholds vary with time. We present a probabilistic framework to explicitly model the normal and anomalous behaviors and probabilistically reason about the data. An extreme value theory based formulation is proposed to model the anomalous behavior as the extremes of the normal behavior. As a specific instantiation, a joint nonparametric clustering and anomaly detection algorithm (INCAD) is proposed that models the normal behavior as a Dirichlet process mixture model. Results on a variety of datasets, including streaming data, show that the proposed method provides effective and simultaneous clustering and anomaly detection without requiring strong initialization and threshold parameters.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129250964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this work, we develop a method named Twinning for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model‐independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also be used for generating multiple splits of a given dataset to aid divide‐and‐conquer procedures and k‐fold cross validation.
{"title":"Data Twinning","authors":"Akhil Vakayil, V. R. Joseph","doi":"10.1002/sam.11574","DOIUrl":"https://doi.org/10.1002/sam.11574","url":null,"abstract":"In this work, we develop a method named Twinning for partitioning a dataset into statistically similar twin sets. Twinning is based on SPlit, a recently proposed model‐independent method for optimally splitting a dataset into training and testing sets. Twinning is orders of magnitude faster than the SPlit algorithm, which makes it applicable to Big Data problems such as data compression. Twinning can also be used for generating multiple splits of a given dataset to aid divide‐and‐conquer procedures and k‐fold cross validation.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114243324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we address the dual goals of protecting privacy and making statistical inferences from disseminated data using the regrouped design. It is not difficult to protect the privacy of patients by perturbing data; the problem is to perturb the data in such a way that privacy is protected while the released data remain useful for research. Under the regrouped design, the dataset is released with dummy groups associated with the actual groups via a pre‐specified transition probability matrix. Small stagnation probabilities are recommended for the regrouped design to achieve a small disclosure risk and higher power in hypothesis testing. The power of the test statistic in the released data increases as the stagnation probabilities depart from 0.5. The disclosure risk can be reduced further if more quasi‐identifiers are relocated. An example based on the National Health Insurance Research Database is given to illustrate the use of the regrouped design to protect privacy and support statistical inference.
{"title":"Regrouped design in privacy analysis for multinomial microdata","authors":"Shu-Mei Wan, Danny Wen-Yaw Chung, Monica Mayeni Manurung, Kwang-Hwa Chang, Chien-Hua Wu","doi":"10.1002/sam.11553","DOIUrl":"https://doi.org/10.1002/sam.11553","url":null,"abstract":"In this paper, we are dealing with the dual goals for protecting privacy and making statistical inferences from the disseminated data using the regrouped design. It is not difficult to protect the privacy of patients by perturbing data. The problem is to perturb the data in such a way that privacy is protected, and also, the released data are useful for research. By applying the regrouped design, the dataset is released with the dummy groups associated with the actual groups via a pre‐specified transition probability matrix. Small stagnation probabilities of regrouped design are recommended to reach a small disclosure risk and a higher power of hypothesis testing. The power of test statistic in the released data increases as the stagnation probabilities depart from 0.5. The disclosure risk can be reduced further if more quasi‐identifiers are relocated. An example of National Health Insurance Research Database is given to illustrate the use of the regrouped design to protect the privacy and make the statistical inference.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"18 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125770296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In linear regression of Y on X (∈ ℝ^p) with parameters β (∈ ℝ^(p+1)), statistical inference is unreliable when observations are obtained from the gross‐error model F_(ϵ,G) = (1 − ϵ)F + ϵG instead of the assumed probability F; here G is the gross‐error probability and 0 < ϵ < 1. The residual's influence index (RINFIN) at (x, y) is introduced, with components that also measure the local influence of x on the residual; a large value flags a bad leverage case (from G), thus achieving unmasking. Large-sample properties of RINFIN are presented to confirm the significance of the findings, although often the large difference in RINFIN scores across the data is already indicative. RINFIN is successful with microarray data, simulated high-dimensional data, and classic regression data sets. Its performance improves as p increases, and it can be used in multiple-response linear regression.
{"title":"Residual's influence index (RINFIN), bad leverage and unmasking in high dimensional L2‐regression","authors":"Y. Yatracos","doi":"10.1002/sam.11550","DOIUrl":"https://doi.org/10.1002/sam.11550","url":null,"abstract":"In linear regression of Y on X(∈ Rp) with parameters β(∈ Rp+1), statistical inference is unreliable when observations are obtained from gross‐error model, Fϵ,G = (1 − ϵ)F + ϵG, instead of the assumed probability F;G is gross‐error probability, 0 < ϵ < 1. Residual's influence index (RINFIN) at (x, y) is introduced, with components measuring also the local influence of x in the residual and large value flagging a bad leverage case (from G), thus causing unmasking. Large sample properties of RINFIN are presented to confirm significance of the findings, but often the large difference in the RINFIN scores of the data is indicative. RINFIN is successful with microarray data, simulated, high dimensional data and classic regression data sets. RINFIN's performance improves as p increases and can be used in multiple response linear regression.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128249071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Krepel, Magdalena Kircher, Moritz Kohls, K. Jung
High‐dimensional gene expression data are regularly studied for their ability to separate different groups of samples by means of machine learning (ML) models, and a large number of such data sets are publicly available. Several approaches for meta‐analysis on independent sets of gene expression data have been proposed, mainly focusing on feature selection, a typical step in fitting an ML model. Here, we compare different strategies for merging the information of such independent data sets to train a classifier model. Specifically, we compare the strategy of merging the data sets directly (strategy A) with the strategy of merging the classification results (strategy B). We use simulations with purely artificial data as well as evaluations based on independent gene expression data from lung fibrosis studies to compare the two merging approaches. In the simulations, the number of studies, the strength of batch effects, and the separability are varied. The comparison incorporates five standard ML techniques typically used for high‐dimensional data, namely discriminant analysis, support vector machines, the least absolute shrinkage and selection operator, random forests, and artificial neural networks. Using cross‐study validation, we found that direct data merging yields higher accuracies when training data from three or four studies are available, whereas merging of classification results performs better when only two training studies are available. In the evaluation with the lung fibrosis data, both strategies showed similar performance.
{"title":"Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets","authors":"J. Krepel, Magdalena Kircher, Moritz Kohls, K. Jung","doi":"10.1002/sam.11549","DOIUrl":"https://doi.org/10.1002/sam.11549","url":null,"abstract":"High‐dimensional gene expression data are regularly studied for their ability to separate different groups of samples by means of machine learning (ML) models. Meanwhile, a large number of such data are publicly available. Several approaches for meta‐analysis on independent sets of gene expression data have been proposed, mainly focusing on the step of feature selection, a typical step in fitting a ML model. Here, we compare different strategies of merging the information of such independent data sets to train a classifier model. Specifically, we compare the strategy of merging data sets directly (strategy A), and the strategy of merging the classification results (strategy B). We use simulations with pure artificial data as well as evaluations based on independent gene expression data from lung fibrosis studies to compare the two merging approaches. In the simulations, the number of studies, the strength of batch effects, and the separability are varied. The comparison incorporates five standard ML techniques typically used for high‐dimensional data, namely discriminant analysis, support vector machines, least absolute shrinkage and selection operator, random forest, and artificial neural networks. Using cross‐study validations, we found that direct data merging yields higher accuracies when having training data of three or four studies, and merging of classification results performed better when having only two training studies. In the evaluation with the lung fibrosis data, both strategies showed a similar performance.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122182087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we propose Bayesian skew‐normal regression models in which the location, scale, and shape parameters follow (linear or nonlinear) regression structures and the variable of interest follows the Azzalini skew‐normal distribution. A Bayesian method is developed to fit the proposed models, using working variables to build the kernel transition functions. To illustrate the performance of the proposed Bayesian method and the application of the model to the analysis of statistical data, we present results from simulation studies and from an application to studies of forced displacement in Colombia.
{"title":"Bayesian modeling of location, scale, and shape parameters in skew‐normal regression models","authors":"Martha Lucía Corrales, Edilberto Cepeda Cuervo","doi":"10.1002/sam.11548","DOIUrl":"https://doi.org/10.1002/sam.11548","url":null,"abstract":"In this paper, we propose Bayesian skew‐normal regression models where the location, scale and shape parameters follow (linear or nonlinear) regression structures, and the variable of interest follows the Azzalini skew‐normal distribution. A Bayesian method is developed to fit the proposed models, using working variables to build the kernel transition functions. To illustrate the performance of the proposed Bayesian method and application of the model to analyze statistical data, we present results of simulated studies and of the application to studies of forced displacement in Colombia.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114841431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yifan Zhao, Xian Yang, Carolina Bolnykh, Steve Harenberg, Nodirbek Korchiev, Saavan Raj Yerramsetty, Bhanu Prasad Vellanki, Ramakanth Kodumagulla, N. Samatova
Classical machine learning models are typically optimized with respect to the most discriminatory features of the data; however, they do not usually account for end-user preferences. In certain applications this can be a serious issue, as models unaware of user preferences could become costly, untrustworthy, or privacy‐intrusive to use, and thus irrelevant and/or uninterpretable. Ideally, end users with domain knowledge could propose preferable features that the predictive model would then take into account. In this paper, we propose a generic modeling method that respects end-user preferences via a relative ranking system to express multi‐criteria preferences and a regularization term in the model's objective function to incorporate the ranked preferences. From a more general perspective, this method plugs user preferences into existing predictive models without creating completely new ones. We implement the method in the context of decision trees and achieve comparable classification accuracy while reducing the use of undesirable features.
{"title":"Predictive models with end user preference","authors":"Yifan Zhao, Xian Yang, Carolina Bolnykh, Steve Harenberg, Nodirbek Korchiev, Saavan Raj Yerramsetty, Bhanu Prasad Vellanki, Ramakanth Kodumagulla, N. Samatova","doi":"10.1002/sam.11545","DOIUrl":"https://doi.org/10.1002/sam.11545","url":null,"abstract":"Classical machine learning models typically try to optimize the model based on the most discriminatory features of the data; however, they do not usually account for end user preferences. In certain applications, this can be a serious issue as models not aware of user preferences could become costly, untrustworthy, or privacy‐intrusive to use, thus becoming irrelevant and/or uninterpretable. Ideally, end users with domain knowledge could propose preferable features that the predictive model could then take into account. In this paper, we propose a generic modeling method that respects end user preferences via a relative ranking system to express multi‐criteria preferences and a regularization term in the model's objective function to incorporate the ranked preferences. In a more generic perspective, this method is able to plug user preferences into existing predictive models without creating completely new ones. We implement this method in the context of decision trees and are able to achieve a comparable classification accuracy while reducing the use of undesirable features.","PeriodicalId":342679,"journal":{"name":"Statistical Analysis and Data Mining: The ASA Data Science Journal","volume":"363 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132787446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}