Pub Date : 2022-08-25DOI: 10.1007/s11634-022-00514-6
Serge Vicente, Alejandro Murua-Sazo
Random restart of a given algorithm produces many partitions that can be aggregated to yield a consensus clustering. Ensemble methods have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes or DPPs for the random restart of clustering algorithms based on initial sets of center points, such as k-medoids or k-means. The relation between DPPs and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points in initial sets, so that sets with similar points have less chance of being generated than sets with very distinct points. Most current inital sets are generated with center points sampled uniformly at random. We show through extensive simulations that, contrary to DPPs, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets. The latter are two key properties that make DPPs achieve good performance. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clusterings which are based on uniform random sampling of center points.
{"title":"Determinantal consensus clustering","authors":"Serge Vicente, Alejandro Murua-Sazo","doi":"10.1007/s11634-022-00514-6","DOIUrl":"10.1007/s11634-022-00514-6","url":null,"abstract":"<div><p>Random restart of a given algorithm produces many partitions that can be aggregated to yield a consensus clustering. Ensemble methods have been recognized as more robust approaches for data clustering than single clustering algorithms. We propose the use of determinantal point processes or DPPs for the random restart of clustering algorithms based on initial sets of center points, such as <i>k</i>-medoids or <i>k</i>-means. The relation between DPPs and kernel-based methods makes DPPs suitable to describe and quantify similarity between objects. DPPs favor diversity of the center points in initial sets, so that sets with similar points have less chance of being generated than sets with very distinct points. Most current inital sets are generated with center points sampled uniformly at random. We show through extensive simulations that, contrary to DPPs, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets. The latter are two key properties that make DPPs achieve good performance. Simulations with artificial datasets and applications to real datasets show that determinantal consensus clustering outperforms consensus clusterings which are based on uniform random sampling of center points.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 4","pages":"829 - 858"},"PeriodicalIF":1.6,"publicationDate":"2022-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50046217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-24DOI: 10.1007/s11634-022-00517-3
Licheng Zhao, Yi Zuo, Katsutoshi Yada
During the last decade, an increasing number of supermarkets have begun to use RFID technology to track consumers' in-store movements to collect data on their shopping behavioral. Marketers hope that such new types of RFID data will improve the accuracy of the existing customer segmentation, and provide effective marketing positioning from the customer’s perspective. Therefore, this paper presents an integrated work on combining RFID data with traditional point of sales (POS) data, and proposes a sequential classification-based model to classify and identify consumers’ purchasing behavior. We chose an island area of the supermarket to perform the tracking experiment and collected customer behavioral data for two months. RFID data are used to extract behavior explanatory variables, such as residence time and wandering direction. For these customers, we extracted their purchasing historical data for the past three months from the POS system to define customer background and segmentation. Finally, this paper proposes a novel classification model based on sequence-to-sequence (Seq2seq) learning architecture. The encoder–decoder of Seq2seq uses an attention mechanism to pursue sequential inputs, with gating units in the encoder and decoder adjusting the output weights based on the input variables. The experimental results showed that the proposed model has a higher accuracy and area under curve value for customer classification and recognition compared with other benchmark models. Furthermore, the validity of behavioral description variables among heterogeneous customers was verified by adjusting the attention mechanism.
{"title":"Sequential classification of customer behavior based on sequence-to-sequence learning with gated-attention neural networks","authors":"Licheng Zhao, Yi Zuo, Katsutoshi Yada","doi":"10.1007/s11634-022-00517-3","DOIUrl":"10.1007/s11634-022-00517-3","url":null,"abstract":"<div><p>During the last decade, an increasing number of supermarkets have begun to use RFID technology to track consumers' in-store movements to collect data on their shopping behavioral. Marketers hope that such new types of RFID data will improve the accuracy of the existing customer segmentation, and provide effective marketing positioning from the customer’s perspective. Therefore, this paper presents an integrated work on combining RFID data with traditional point of sales (POS) data, and proposes a sequential classification-based model to classify and identify consumers’ purchasing behavior. We chose an island area of the supermarket to perform the tracking experiment and collected customer behavioral data for two months. RFID data are used to extract behavior explanatory variables, such as residence time and wandering direction. For these customers, we extracted their purchasing historical data for the past three months from the POS system to define customer background and segmentation. Finally, this paper proposes a novel classification model based on sequence-to-sequence (Seq2seq) learning architecture. The encoder–decoder of Seq2seq uses an attention mechanism to pursue sequential inputs, with gating units in the encoder and decoder adjusting the output weights based on the input variables. The experimental results showed that the proposed model has a higher accuracy and area under curve value for customer classification and recognition compared with other benchmark models. Furthermore, the validity of behavioral description variables among heterogeneous customers was verified by adjusting the attention mechanism.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 3","pages":"549 - 581"},"PeriodicalIF":1.6,"publicationDate":"2022-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50044829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-19DOI: 10.1101/2022.05.15.492004
Björn-Hergen Laabs von Holt, A. Westenberger, I. König
In life sciences, random forests are often used to train predictive models. However, gaining any explanatory insight into the mechanics leading to a specific outcome is rather complex, which impedes the implementation of random forests into clinical practice. By simplifying a complex ensemble of decision trees to a single most representative tree, it is assumed to be possible to observe common tree structures, the importance of specific features and variable interactions. Thus, representative trees could also help to understand interactions between genetic variants. Intuitively, representative trees are those with the minimal distance to all other trees, which requires a proper definition of the distance between two trees. Thus, we developed a new tree-based distance measure, which incorporates more of the underlying tree structure than other metrics. We compared our new method with the existing metrics in an extensive simulation study and applied it to predict the age at onset based on a set of genetic risk factors in a clinical data set. In our simulation study we were able to show the advantages of our weighted splitting variable approach. Our real data application revealed that representative trees are not only able to replicate the results from a recent genome-wide association study, but also can give additional explanations of the genetic mechanisms. Finally, we implemented all compared distance measures in R and made them publicly available in the R package timbR ( https://github.com/imbs-hl/timbR ).
{"title":"Identification of representative trees in random forests based on a new tree-based distance measure","authors":"Björn-Hergen Laabs von Holt, A. Westenberger, I. König","doi":"10.1101/2022.05.15.492004","DOIUrl":"https://doi.org/10.1101/2022.05.15.492004","url":null,"abstract":"In life sciences, random forests are often used to train predictive models. However, gaining any explanatory insight into the mechanics leading to a specific outcome is rather complex, which impedes the implementation of random forests into clinical practice. By simplifying a complex ensemble of decision trees to a single most representative tree, it is assumed to be possible to observe common tree structures, the importance of specific features and variable interactions. Thus, representative trees could also help to understand interactions between genetic variants. Intuitively, representative trees are those with the minimal distance to all other trees, which requires a proper definition of the distance between two trees. Thus, we developed a new tree-based distance measure, which incorporates more of the underlying tree structure than other metrics. We compared our new method with the existing metrics in an extensive simulation study and applied it to predict the age at onset based on a set of genetic risk factors in a clinical data set. In our simulation study we were able to show the advantages of our weighted splitting variable approach. Our real data application revealed that representative trees are not only able to replicate the results from a recent genome-wide association study, but also can give additional explanations of the genetic mechanisms. Finally, we implemented all compared distance measures in R and made them publicly available in the R package timbR ( https://github.com/imbs-hl/timbR ).","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"30 5","pages":"1-18"},"PeriodicalIF":1.6,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72628235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-19DOI: 10.1007/s11634-022-00512-8
Antonio Elías, Raúl Jiménez, J. E. Yukich
We propose an alternative to k-nearest neighbors for functional data whereby the approximating neighboring curves are piecewise functions built from a functional sample. Using a locally defined distance function that satisfies stabilization criteria, we establish pointwise and global approximation results in function spaces when the number of data curves is large. We exploit this feature to develop the asymptotic theory when a finite number of curves is observed at time-points given by an i.i.d. sample whose cardinality increases up to infinity. We use these results to investigate the problem of estimating unobserved segments of a partially observed functional data sample as well as to study the problem of functional classification and outlier detection. For such problems our methods are competitive with and sometimes superior to benchmark predictions in the field. The R package localFDA provides routines for computing the localization processes and the estimators proposed in this article.
{"title":"Localization processes for functional data analysis","authors":"Antonio Elías, Raúl Jiménez, J. E. Yukich","doi":"10.1007/s11634-022-00512-8","DOIUrl":"10.1007/s11634-022-00512-8","url":null,"abstract":"<div><p>We propose an alternative to <i>k</i>-nearest neighbors for functional data whereby the approximating neighboring curves are piecewise functions built from a functional sample. Using a locally defined distance function that satisfies stabilization criteria, we establish pointwise and global approximation results in function spaces when the number of data curves is large. We exploit this feature to develop the asymptotic theory when a finite number of curves is observed at time-points given by an i.i.d. sample whose cardinality increases up to infinity. We use these results to investigate the problem of estimating unobserved segments of a partially observed functional data sample as well as to study the problem of functional classification and outlier detection. For such problems our methods are competitive with and sometimes superior to benchmark predictions in the field. The R package <span>localFDA</span> provides routines for computing the localization processes and the estimators proposed in this article.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 2","pages":"485 - 517"},"PeriodicalIF":1.6,"publicationDate":"2022-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50494833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-16DOI: 10.1007/s11634-022-00511-9
Maurizio Vichi, Andrea Ceroli, Hans A. Kestler, Akinori Okada, Claus Weihs
{"title":"Editorial for ADAC issue 3 of volume 16 (2022)","authors":"Maurizio Vichi, Andrea Ceroli, Hans A. Kestler, Akinori Okada, Claus Weihs","doi":"10.1007/s11634-022-00511-9","DOIUrl":"10.1007/s11634-022-00511-9","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"16 3","pages":"487 - 490"},"PeriodicalIF":1.6,"publicationDate":"2022-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50057265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-28DOI: 10.1007/s11634-023-00547-5
Panagiotis Papastamoulis
{"title":"Model based clustering of multinomial count data","authors":"Panagiotis Papastamoulis","doi":"10.1007/s11634-023-00547-5","DOIUrl":"https://doi.org/10.1007/s11634-023-00547-5","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"50 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75623247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-08DOI: 10.1007/s11634-022-00509-3
Anna Gottard, Giulia Vannucci, Leonardo Grilli, Carla Rampichini
Tree-based regression models are a class of statistical models for predicting continuous response variables when the shape of the regression function is unknown. They naturally take into account both non-linearities and interactions. However, they struggle with linear and quasi-linear effects and assume iid data. This article proposes two new algorithms for jointly estimating an interpretable predictive mixed-effect model with two components: a linear part, capturing the main effects, and a non-parametric component consisting of three trees for capturing non-linearities and interactions among individual-level predictors, among cluster-level predictors or cross-level. The first proposed algorithm focuses on prediction. The second one is an extension which implements a post-selection inference strategy to provide valid inference. The performance of the two algorithms is validated via Monte Carlo studies. An application on INVALSI data illustrates the potentiality of the proposed approach.
{"title":"Mixed-effect models with trees","authors":"Anna Gottard, Giulia Vannucci, Leonardo Grilli, Carla Rampichini","doi":"10.1007/s11634-022-00509-3","DOIUrl":"10.1007/s11634-022-00509-3","url":null,"abstract":"<div><p>Tree-based regression models are a class of statistical models for predicting continuous response variables when the shape of the regression function is unknown. They naturally take into account both non-linearities and interactions. However, they struggle with linear and quasi-linear effects and assume <i>iid</i> data. This article proposes two new algorithms for jointly estimating an interpretable predictive mixed-effect model with two components: a linear part, capturing the main effects, and a non-parametric component consisting of three trees for capturing non-linearities and interactions among individual-level predictors, among cluster-level predictors or cross-level. The first proposed algorithm focuses on prediction. The second one is an extension which implements a post-selection inference strategy to provide valid inference. The performance of the two algorithms is validated via Monte Carlo studies. An application on INVALSI data illustrates the potentiality of the proposed approach.</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 2","pages":"431 - 461"},"PeriodicalIF":1.6,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-022-00509-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50462000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in the Simpson paradox. In this paper we propose a methodology to, for a given contingency table and a fixed granularity, find a clustered table with the highest (chi ^2) statistic. Repeating this procedure for different values of the granularity, we can either identify an extreme grouping, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and ensure reasonable sample sizes in the cells of the clustered table from which trustful statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset of a medical study.
{"title":"On mathematical optimization for clustering categories in contingency tables","authors":"Emilio Carrizosa, Vanesa Guerrero, Dolores Romero Morales","doi":"10.1007/s11634-022-00508-4","DOIUrl":"10.1007/s11634-022-00508-4","url":null,"abstract":"<div><p>Many applications in data analysis study whether two categorical variables are independent using a function of the entries of their contingency table. Often, the categories of the variables, associated with the rows and columns of the table, are grouped, yielding a less granular representation of the categorical variables. The purpose of this is to attain reasonable sample sizes in the cells of the table and, more importantly, to incorporate expert knowledge on the allowable groupings. However, it is known that the conclusions on independence depend, in general, on the chosen granularity, as in the Simpson paradox. In this paper we propose a methodology to, for a given contingency table and a fixed granularity, find a clustered table with the highest <span>(chi ^2)</span> statistic. Repeating this procedure for different values of the granularity, we can either identify an <i>extreme grouping</i>, namely the largest granularity for which the statistical dependence is still detected, or conclude that it does not exist and that the two variables are dependent regardless of the size of the clustered table. For this problem, we propose an assignment mathematical formulation and a set partitioning one. Our approach is flexible enough to include constraints on the desirable structure of the clusters, such as must-link or cannot-link constraints on the categories that can, or cannot, be merged together, and ensure reasonable sample sizes in the cells of the clustered table from which trustful statistical conclusions can be derived. We illustrate the usefulness of our methodology using a dataset of a medical study. \u0000</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 2","pages":"407 - 429"},"PeriodicalIF":1.6,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s11634-022-00508-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50520905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-06-25DOI: 10.1007/s11634-022-00504-8
Jan Vávra, Arnošt Komárek
Although many present day studies gather data of a diverse nature (numeric quantities, binary indicators or ordered categories) on the same units repeatedly over time, there only exist limited number of approaches in the literature to analyse so-called mixed-type longitudinal data. We present a statistical model capable of joint modelling several mixed-type outcomes, which also accounts for possible dependencies among the investigated outcomes. A thresholding approach to link binary or ordinal variables to their latent numeric counterparts allows us to jointly model all, including latent, numeric outcomes using a multivariate version of the linear mixed-effects model. We avoid the independence assumption over outcomes by relaxing the variance matrix of random effects to a completely general positive definite matrix. Moreover, we follow model-based clustering methodology to create a mixture of such models to model heterogeneity in the temporal evolution of the considered outcomes. The estimation of such an hierarchical model is approached by Bayesian principles with the use of Markov chain Monte Carlo methods. After a successful simulation study with the aim to examine the ability to consistently estimate the true parameter values and thus discover the different patterns, the EU-SILC dataset consisting of Czech households that were followed for 4 years in a time span from 2005 to 2016 was analysed. The households were classified into groups with a similar evolution of several closely related indicators of monetary poverty based on estimated classification probabilities.
{"title":"Classification based on multivariate mixed type longitudinal data with an application to the EU-SILC database","authors":"Jan Vávra, Arnošt Komárek","doi":"10.1007/s11634-022-00504-8","DOIUrl":"10.1007/s11634-022-00504-8","url":null,"abstract":"<div><p>Although many present day studies gather data of a diverse nature (numeric quantities, binary indicators or ordered categories) on the same units repeatedly over time, there only exist limited number of approaches in the literature to analyse so-called <i>mixed-type</i> longitudinal data. We present a statistical model capable of joint modelling several mixed-type outcomes, which also accounts for possible dependencies among the investigated outcomes. A thresholding approach to link binary or ordinal variables to their latent numeric counterparts allows us to jointly model all, including latent, numeric outcomes using a multivariate version of the linear mixed-effects model. We avoid the independence assumption over outcomes by relaxing the variance matrix of random effects to a completely general positive definite matrix. Moreover, we follow model-based clustering methodology to create a mixture of such models to model heterogeneity in the temporal evolution of the considered outcomes. The estimation of such an hierarchical model is approached by Bayesian principles with the use of Markov chain Monte Carlo methods. After a successful simulation study with the aim to examine the ability to consistently estimate the true parameter values and thus discover the different patterns, the EU-SILC dataset consisting of Czech households that were followed for 4 years in a time span from 2005 to 2016 was analysed. The households were classified into groups with a similar evolution of several closely related indicators of monetary poverty based on estimated classification probabilities.\u0000</p></div>","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"17 2","pages":"369 - 406"},"PeriodicalIF":1.6,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50512826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-06-24DOI: 10.1007/s11634-022-00506-6
Naoto Yamashita
{"title":"Correction to: Principal component analysis constrained by layered simple structures","authors":"Naoto Yamashita","doi":"10.1007/s11634-022-00506-6","DOIUrl":"10.1007/s11634-022-00506-6","url":null,"abstract":"","PeriodicalId":49270,"journal":{"name":"Advances in Data Analysis and Classification","volume":"16 4","pages":"1099 - 1100"},"PeriodicalIF":1.6,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50435310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}