Title: Explanatory Model Analysis: Explore, Explain, and Examine Predictive Models
Author: Bing Si
Journal: Journal of Quality Technology (impact factor 2.6; JCR Q2, Engineering, Industrial)
Publication date: 2021-09-21
Publication type: Journal Article
DOI: https://doi.org/10.1080/00224065.2021.1977101
Citations: 62
Abstract
Predictive models aim to guess, that is, predict, the values of a variable of interest based on other variables. Such models have been used throughout human history, and many statistical models for prediction have been developed during the last century. This book covers methods for exploring predictive models at both the instance level and the dataset level. It is a valuable addition to Chapman & Hall/CRC’s Data Science Series. Together with other books published in the series, it offers a unique perspective on applied data science, guiding practitioners who are interested in exploring, explaining, and examining data in real-world applications with both R and Python. Predictive models constitute an important component of machine learning and data science, and they require standard analytical steps such as model specification, model estimation, and model fit diagnosis. Most published books in this field focus on how to use these statistical methods to make predictions for different types of datasets, but lack tools for model exploration and, in particular, model explanation (obtaining insights from model-based predictions) and model examination (evaluating model performance and understanding its weaknesses). In contrast, this book is a novel effort that provides a deep understanding of all the steps, with extensive validation and justification methods, leading to better, faster, and more interpretable data analysis.
The book is well organized in three parts. It starts with an overview of basic concepts in Chapters 1-4 and then presents instance-level exploration and dataset-level exploration in Chapters 5-13 and Chapters 14-20, respectively. The overview part introduces essential knowledge of the model development process, software installation, and how to fit classic predictive models using software. The instance-level part covers methods that help readers better understand “how a model yields a prediction for a particular single observation” for predictive models with either a small or a large number of explanatory variables. The last part, on dataset-level exploration, discusses “how do the model predictions perform overall, for an entire set of observations?”
Although a basic understanding of programming languages would be beneficial, the coding part of the book is designed to be self-contained and friendly to readers without a programming background. Readers are, however, expected to have some knowledge of different types of data science models, such as logistic regression, support vector machines, and gradient boosting, and to understand which kinds of research questions each model can address. For example, given a research question that aims to predict patient survival (yes/no) after surgery from other variables, e.g., age, symptoms, and medical history, the reader should be able to identify that the dependent variable of interest, survival, is binary, and then consider a logistic regression model as a natural starting point for the predictive modeling. Overall, the book is a suitable reference for data science practitioners learning exploratory data analysis for predictive models and its applications using R or Python.
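As a rough illustration of the survival example and of the distinction between instance-level and dataset-level exploration, the following minimal Python sketch fits a logistic regression to a binary outcome and then asks both kinds of question. It rests on assumptions that are not part of the review or the book: the toy data are invented, all function names are hypothetical, the model is fitted with plain gradient descent rather than any particular library, and the per-variable contributions use a simple additive decomposition that is only valid for linear models.

```python
import math

# Hypothetical toy data, invented for illustration (not from the book):
# each row is (age in decades, symptom severity score); label 1 = survived.
X = [(3.0, 1.0), (4.0, 2.0), (5.0, 1.0), (4.5, 1.5),
     (6.0, 4.0), (7.0, 3.0), (8.0, 5.0), (7.5, 4.5)]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression on mean-centered features by batch gradient descent."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        # Prediction error (predicted probability minus label) per observation.
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                for row, yi in zip(Xc, y)]
        b -= lr * sum(errs) / n
        w = [wj - lr * sum(e * row[j] for e, row in zip(errs, Xc)) / n
             for j, wj in enumerate(w)]
    return b, w, means


def log_odds(x, b, w, means):
    return b + sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, means))


def predict_proba(x, b, w, means):
    return sigmoid(log_odds(x, b, w, means))


def instance_contributions(x, w, means):
    """Instance level: how far each variable moves this observation's
    log-odds away from the log-odds of an average observation."""
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, means)]


def accuracy(X, y, b, w, means):
    """Dataset level: share of observations classified correctly at a 0.5 threshold."""
    hits = sum((predict_proba(x, b, w, means) >= 0.5) == bool(yi)
               for x, yi in zip(X, y))
    return hits / len(X)


b, w, means = fit_logistic(X, y)
patient = (5.0, 1.0)  # one hypothetical observation to explain
contribs = instance_contributions(patient, w, means)
train_acc = accuracy(X, y, b, w, means)

print("P(survive):", round(predict_proba(patient, b, w, means), 3))
print("contributions to log-odds:", [round(c, 3) for c in contribs])
print("training accuracy:", train_acc)
```

The contributions answer the instance-level question for one patient, while the accuracy summarizes behavior over the whole dataset. For models that are not linear, such a direct decomposition is not available, which is where dedicated explanation methods become relevant.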
About the journal:
The objective of the Journal of Quality Technology is to contribute to the technical advancement of the field of quality technology by publishing papers that emphasize the practical applicability of new techniques, instructive examples of the operation of existing techniques, and results of historical research. Expository, review, and tutorial papers are also acceptable if they are written in a style suitable for practicing engineers.