Pub Date: 2023-10-18
DOI: 10.1080/00401706.2023.2271091
Zheng Zhou, Zebin Yang, Aijun Zhang, Yongdao Zhou
Abstract: Subsampling plays a crucial role in tackling problems associated with the storage and statistical learning of massive datasets. However, most existing subsampling methods are model-based, which means their performance can drop significantly when the underlying model is misspecified. This issue calls for model-free subsampling methods that are robust under diverse model specifications. Several model-free subsampling methods have been developed recently. However, their computing time grows explosively with the sample size, making them impractical for handling massive data. In this paper, an efficient model-free subsampling method is proposed, which segments the original data into regular data blocks and obtains subsamples from each data block by the data-driven subsampling method. Compared with existing model-free subsampling methods, the proposed method has a significant speed advantage and performs more robustly for datasets with complex underlying distributions. As demonstrated in simulation experiments, the proposed method is an order of magnitude faster than other commonly used model-free subsampling methods when the sample size of the original dataset reaches the order of 10^7. Moreover, simulation experiments and case studies show that the proposed method is more robust than other model-free subsampling methods under diverse model specifications and subsample sizes.
Keywords: Big data subsampling; Model robustness; Parallel computing; Uniform designs
Disclaimer: As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
Title: Efficient Model-free Subsampling Method for Massive Data
Journal: Technometrics
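The block-wise idea in the abstract above (segment the data into regular blocks, then subsample within each block) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract says a data-driven subsampling method is applied inside each block, whereas this sketch swaps in plain simple random sampling for that step. The function name `blockwise_subsample`, its parameters, and the assumption that the data lie in the unit cube are all hypothetical.

```python
import random
from collections import defaultdict


def blockwise_subsample(points, bins_per_dim, per_block, seed=0):
    """Split points in [0, 1)^d into a regular grid of blocks and draw up to
    `per_block` points from each non-empty block.

    Assumption: simple random sampling stands in for the per-block
    data-driven subsampling step mentioned in the abstract.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for p in points:
        # Grid cell index along each axis; clamp so x == 1.0 stays in range.
        key = tuple(min(int(x * bins_per_dim), bins_per_dim - 1) for x in p)
        blocks[key].append(p)

    subsample = []
    for key in sorted(blocks):  # sorted only to make the output order stable
        members = blocks[key]
        k = min(per_block, len(members))
        subsample.extend(rng.sample(members, k))
    return subsample


# Usage: 10,000 synthetic 2-D points, a 4 x 4 grid, 5 points per block.
data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(10_000)]
sub = blockwise_subsample(data, bins_per_dim=4, per_block=5)
print(len(sub))
```

Because each block is handled independently, the per-block loop is the natural place to parallelize, which is presumably why "Parallel computing" appears among the keywords.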
Pub Date: 2023-10-16
DOI: 10.1080/00401706.2023.2271060
Zihan Zhang, Shancong Mou, Kamran Paynabar, Jianjun Shi
Abstract: In advanced manufacturing processes, high-dimensional (HD) streaming data (e.g., sequential images or videos) are commonly used to provide online measurements of product quality. Although numerous studies address monitoring and anomaly detection using HD streaming data, little research has been conducted on feedback control based on HD streaming data to improve product quality, especially in the presence of incomplete responses. To address this challenge, this paper proposes a novel tensor-based automatic control method for partially observed HD streaming data, which consists of two stages: offline modeling and online control. In the offline modeling stage, we propose a one-step approach integrating parameter estimation of the system model with missing value imputation for the response data. Compared with existing data completion methods, this approach (i) improves the accuracy of parameter estimation, and (ii) maintains stable and superior imputation performance over a wider range of ranks and missing ratios of the data to be completed. In the online control stage, for each incoming sample, missing observations are imputed by balancing its low-rank information and the one-step-ahead prediction based on the control action from the last time step. Then, the optimal control action is computed by minimizing a quadratic loss function on the sum of squared deviations from the target. Furthermore, we conduct two sets of simulations and one case study on semiconductor manufacturing to validate the superiority of the proposed framework.
Keywords: Streaming Data; High Dimension; Tensor; Feedback Control; Partial Observation
Title: Tensor-based Temporal Control for Partially Observed High-dimensional Streaming Data
Journal: Technometrics
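The final control step described in the abstract above (choosing the action that minimizes the sum of squared deviations from the target) can be illustrated in a deliberately simplified setting. The sketch below assumes a scalar control `u` and a linear response `y[i] = gain[i] * u + offset[i]`; the paper's tensor-based system model is not reproduced here, and the names `optimal_scalar_control`, `gain`, `offset`, and `target` are hypothetical. Under that assumption, setting the derivative of the quadratic loss to zero gives the closed form `u* = sum(gain[i] * (target[i] - offset[i])) / sum(gain[i]**2)`.

```python
def squared_loss(u, gain, offset, target):
    """Sum of squared deviations of the predicted response from the target."""
    return sum((g * u + o - t) ** 2 for g, o, t in zip(gain, offset, target))


def optimal_scalar_control(gain, offset, target):
    """Closed-form minimizer of squared_loss over a scalar control u.

    Assumption: the response is linear in u, y[i] = gain[i] * u + offset[i].
    """
    denom = sum(g * g for g in gain)
    if denom == 0:
        raise ValueError("gain is all zeros; the control has no effect")
    return sum(g * (t - o) for g, o, t in zip(gain, offset, target)) / denom


# Usage with made-up numbers for a three-element response.
gain = [1.0, 2.0, 0.5]
offset = [0.2, -0.1, 0.0]
target = [1.0, 2.0, 0.5]
u_star = optimal_scalar_control(gain, offset, target)
print(u_star, squared_loss(u_star, gain, offset, target))
```

With a vector-valued control the same reasoning leads to an ordinary least-squares problem, so a linear solver would replace the scalar division.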
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262896
Abdulkadir Hussein
Title: Post-Shrinkage Strategies in Statistical and Machine Learning for High Dimensional Data, Syed Ejaz Ahmed, Feryaal Ahmed, and Bahadir Yüzbaşı, New York: Chapman and Hall/CRC Press, 2023, 408 pp., ISBN 9780367763442
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262891
Aszani Aszani
Title: Machine Learning for Knowledge Discovery with R: Methodologies for Modeling, Inference, and Prediction, Kao-Tai Tsai, Boca Raton, FL: CRC Press, Taylor & Francis Group, LLC, 2022, xiii + 260 pp., $88.00, ISBN: 978-1-032-06536-6 (H)
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262897
Stan Lipovetsky
Title: Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Student ed., Bradley Efron and Trevor Hastie, UK: Cambridge University Press, 2021, xix + 491 pp., $39.99 (pbk), ISBN 978-1-108-82341-8
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262895
Enrique Garcia-Ceja
Title: A Criminologist’s Guide to R: Crime by the Numbers, Jacob Kaplan, Boca Raton, FL: Chapman and Hall/CRC Press, Taylor & Francis Group, 2022, 432 pp., ISBN 9781032244075
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262893
Irvanal Haq, Nila Lestari
Title: Statistical Genomics, Brooke Fridley and Xuefeng Wang, New York, NY: Humana, 2023, 377 pp., EUR 169.99, ISBN 978-1-0716-2986-4 (eBook)
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262898
Stan Lipovetsky
Title: Mathematics of The Big Four Casino Table Games: Blackjack, Baccarat, Craps, & Roulette, Mark Bollman, Boca Raton, FL: CRC Press/Chapman & Hall, Taylor & Francis Group, 2021, xi + 353 pp., 43 B/W illustrations, $31.16 (pbk), ISBN 9780367740900
Journal: Technometrics
Pub Date: 2023-10-02
DOI: 10.1080/00401706.2023.2262889
Stan Lipovetsky
Title: Luck, Logic, and White Lies: The Mathematics of Games, 2nd ed., Jörg Bewersdorff, translated by David Kramer, Boca Raton, FL: A.K. Peters/CRC Press, Taylor & Francis Group, 2021, xx + 548 pp., $47.96 (pbk), ISBN 9780367548414
Journal: Technometrics