This article studies the impact of machine intelligence (MI) methods, viz., various types of neural models, on the investigation of dynamical systems arising in interdisciplinary areas. Researchers have applied different types of artificial neural network (ANN) methods, viz., recurrent neural networks, functional‐link neural networks, convolutional neural networks, symplectic artificial neural networks, genetic algorithm neural networks, and so on, to investigate these problems. Although researchers have developed various traditional methods to solve these dynamical problems, the existing traditional methods may sometimes be problem dependent, require repeated simulations, and fail to handle nonlinear behavior. In this regard, neural network model based methods are more general: their solutions are continuous over the given domain of integration, and they are self‐adaptive and can be used as a black box. As such, in this article, we review and analyze different MI methods that have been applied to investigate these problems.
{"title":"Machine intelligence in dynamical systems: A state‐of‐art review","authors":"A. Sahoo, S. Chakraverty","doi":"10.1002/widm.1461","DOIUrl":"https://doi.org/10.1002/widm.1461","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"87 4 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77884153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image steganalysis involves the discovery of secret information embedded in an image. The common method is blind image steganalysis, which is a two‐class classification problem. Blind steganalysis extracts all possible feature variations in an image due to embedding, selects the most appropriate feature data, and then classifies the image. The dimensionality of the extracted image features is high and demands data reduction to identify the most relevant features and to aid accurate classification of an image. The image is classified into one of two classes, namely a clean (cover) image or a stego image (an image with embedded secret data). Since the classification accuracy depends on selecting the most appropriate features, choosing the best data reduction or data optimization algorithm becomes a prime requisite. Research shows that most statistical optimization techniques converge to local minima and yield lower classification accuracy than bio‐inspired methods. Bio‐inspired optimization methods obtain improved classification accuracy by reducing the high‐dimensional image features. These methods start with an initial population and then optimize it in steps until a global optimum is reached. Examples of such methods include Ant Lion Optimization (ALO) and the Fire Fly Algorithm (FFA); the literature reports around 54 such algorithms. Bio‐inspired optimization has been applied in various fields of design optimization and is novel to image steganalysis. This article analyzes the various bio‐inspired optimization techniques and their accuracy in image steganalysis, covering the discovery of embedded information in both JPEG and spatial domain steganalysis.
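The population-based search described above can be sketched in a few lines. This is a generic illustration, not ALO, FFA, or any algorithm from the reviewed article: the `fitness` function is a hypothetical stand-in for classifier accuracy, the set `relevant` is invented for the example, and the update rule (copy bits from the current best mask, then flip one random bit) is an assumption chosen for brevity. Nothing here guarantees that a global optimum is reached; elitism only ensures the best score never gets worse.

```python
import random


def fitness(mask, relevant):
    """Hypothetical stand-in for classifier accuracy: reward selected
    features that are relevant, and slightly penalize every selected feature."""
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return hits - 0.1 * sum(mask)


def population_search(n_features, relevant, pop_size=10, steps=30, seed=0):
    """Search over binary feature masks with a simple population-based loop."""
    rng = random.Random(seed)
    # Initial population: random binary masks, one bit per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, relevant))
    history = [fitness(best, relevant)]
    for _ in range(steps):
        new_population = [best[:]]  # elitism: the best mask always survives
        while len(new_population) < pop_size:
            parent = rng.choice(population)
            # Move toward the best mask: take each bit from it with probability 0.5.
            child = [b if rng.random() < 0.5 else p
                     for p, b in zip(parent, best)]
            # Exploration: flip one randomly chosen bit.
            j = rng.randrange(n_features)
            child[j] = 1 - child[j]
            new_population.append(child)
        population = new_population
        best = max(population, key=lambda m: fitness(m, relevant))
        history.append(fitness(best, relevant))
    return best, history


best_mask, history = population_search(n_features=8, relevant={0, 3, 5})
print(best_mask, history[-1])
```

In a real steganalysis setting the fitness would presumably be the validation accuracy of a cover/stego classifier trained on the selected features, which is far more expensive to evaluate than this toy score.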
{"title":"Critical review of bio‐inspired data optimization techniques: An image steganalysis perspective","authors":"Anita Christaline Johnvictor, Austin Joe Amalanathan, Ramya Meghana Pariti Venkata, Nishtha Jethi","doi":"10.1002/widm.1460","DOIUrl":"https://doi.org/10.1002/widm.1460","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"20 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85287828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although artificial intelligence (AI; inclusive of machine learning) is gaining traction in supporting climate change projections and impacts, limited work has used AI to address climate change adaptation. We identify this gap and highlight the value of AI, especially in supporting complex adaptation choices and implementation. We illustrate how AI can effectively leverage precise, real‐time information in data‐scarce settings. We focus on supervised learning, transfer learning, reinforcement learning, and multimodal learning to illustrate how innovative AI methods can enable better‐informed choices, tailor adaptation measures to heterogeneous groups, and generate effective synergies and trade‐offs.
{"title":"Artificial intelligence for climate change adaptation","authors":"S. Cheong, K. Sankaran, Hamsa Bastani","doi":"10.1002/widm.1459","DOIUrl":"https://doi.org/10.1002/widm.1459","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"50 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79937568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
New educational models such as smart learning environments use digital and context‐aware devices to facilitate the learning process. In this new educational scenario, a huge quantity of multimodal student data from a variety of sources can be captured, fused, and analyzed. This offers researchers and educators a unique opportunity to discover new knowledge, to better understand the learning process, and to intervene if necessary. However, data fusion approaches and techniques must be applied correctly in order to combine the various sources of multimodal learning analytics (MLA). These sources or modalities in MLA include audio, video, electrodermal activity data, eye‐tracking, user logs, and click‐stream data, as well as learning artifacts and more natural human signals such as gestures, gaze, speech, or writing. This survey introduces data fusion in learning analytics (LA) and educational data mining (EDM) and describes how these data fusion techniques have been applied in smart learning. It shows the current state of the art by reviewing the main publications, the main types of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends, and challenges in this specific research area.
{"title":"A review on data fusion in multimodal learning analytics and educational data mining","authors":"Wilson Chango, J. Lara, Rebeca Cerezo, C. Romero","doi":"10.1002/widm.1458","DOIUrl":"https://doi.org/10.1002/widm.1458","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"13 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84559881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Buses are an important part of the public transport system. Providing accurate information about bus arrival and departure times at bus stops is one of the main parameters of good‐quality public transport. Accurate arrival and departure time information is important for a public transport mode because it increases ridership as well as traveler satisfaction. With accurate arrival and departure time information, travelers can make informed decisions about their journey. This article reviews in detail the application of artificial intelligence (AI) based methods and algorithms to predict bus arrival time (BAT). It systematically surveys existing research that applies the different branches of AI. Prediction models are grouped under their respective branches of AI. A thorough discussion elaborates on the branches of AI that have been applied to several aspects of BAT prediction. Research gaps and possible directions for future research are summarized.
{"title":"A review of bus arrival time prediction using artificial intelligence","authors":"Nisha Singh, K. Kumar","doi":"10.1002/widm.1457","DOIUrl":"https://doi.org/10.1002/widm.1457","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"55 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91260375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This article emphasizes understanding the “Garbage In, Garbage Out” (GIGO) rationale and ensuring dataset quality in machine learning (ML) applications to achieve high and generalizable performance. An initial step should be added to the ML workflow in which researchers evaluate the insights gained from quantitative analysis of the datasets' sample and feature spaces. This study contributes toward that goal by suggesting a technique to quantify datasets in terms of feature frequency distribution characteristics, which provides a unique insight into how frequent the features are in the available dataset samples. The technique was demonstrated on 11 benign and malign (malware) Android application datasets belonging to six academic Android mobile malware classification studies. The permissions requested by applications, such as CALL_PHONE, compose a relatively high‐dimensional binary feature space. The results showed that the distributions fit well into two of the four long right‐tail statistical distributions considered: log‐normal, exponential, power law, and Poisson. Specifically, log‐normal was the most frequently exhibited distribution; the exceptions were two malign datasets, which followed an exponential distribution. This study also explores statistical distribution fit/unfit feature analysis that enhances the insights into the feature space. Finally, the study compiles examples of phenomena in the literature that exhibit these statistical distributions, which should be considered when interpreting the fitted distributions. In conclusion, applying well‐formed statistical methods provides a clear understanding of the datasets and of intra‐class and inter‐class differences before proceeding to select features and build a classifier model. Feature distribution characteristics should be one of the properties analyzed beforehand.
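To make the idea of feature frequency distribution fitting concrete, here is a minimal sketch under stated assumptions. It is not the article's technique: the sample data are invented (only CALL_PHONE comes from the abstract; the other names are ordinary Android permission names used for illustration), only two of the four candidate distributions are fitted, and comparing raw maximum log-likelihoods of continuous densities on integer counts is a rough heuristic rather than a proper goodness-of-fit test.

```python
import math


def feature_frequencies(samples):
    """Count in how many samples each binary feature is present."""
    counts = {}
    for features in samples:
        for name in features:
            counts[name] = counts.get(name, 0) + 1
    return counts


def lognormal_loglik(xs):
    """Log-likelihood of xs under a log-normal fitted by maximum likelihood."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    var = sum((v - mu) ** 2 for v in logs) / len(logs)  # must be > 0
    return sum(-v - 0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
               for v in logs)


def exponential_loglik(xs):
    """Log-likelihood of xs under an exponential fitted by maximum likelihood."""
    rate = len(xs) / sum(xs)
    return sum(math.log(rate) - rate * x for x in xs)


# Illustrative data: each sample is the set of permissions an app requests.
samples = [
    {"INTERNET", "CALL_PHONE"},
    {"INTERNET"},
    {"INTERNET", "SEND_SMS"},
    {"INTERNET", "CALL_PHONE", "CAMERA"},
]
counts = feature_frequencies(samples)
frequencies = sorted(counts.values(), reverse=True)

ll_lognormal = lognormal_loglik(frequencies)
ll_exponential = exponential_loglik(frequencies)
better_fit = "log-normal" if ll_lognormal > ll_exponential else "exponential"
print(counts, better_fit)
```

With real datasets the frequency list would contain one entry per permission across thousands of applications, and a dedicated statistics library with formal fit tests would be a more reliable choice than this hand-rolled comparison.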
{"title":"Gaining insights in datasets in the shade of “garbage in, garbage out” rationale: Feature space distribution fitting","authors":"Gürol Canbek","doi":"10.1002/widm.1456","DOIUrl":"https://doi.org/10.1002/widm.1456","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"18 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79092308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modal verbs express modality. Modality is concerned with the status of the proposition that describes an event; it also expresses the opinion and attitude of a speaker toward the proposition of an utterance. Since modalities are directly related to the objective world, the subjective world, and language use, they have been a hot topic for philosophers, logicians, and linguists. Philosophers are concerned with the relations between the objective world and the true/false values of the modality; logicians are interested in the relations among possibility, necessity, and the objective world; and linguists pay attention to the modality category, sense category, function, recognition, and application of modal verbs. In recent years, linguistic studies of modal verbs have extended from general linguistics to computational linguistics. Because modal verbs form a complex semantic system and are often indeterminate in sense, they have been a difficult issue in linguistic studies and have attracted great attention. Clarifying the status of previous linguistic studies of modal verbs and revealing the characteristics of those studies will be of great significance for further study. Therefore, this article reviews previous linguistic studies of English modal verbs, applies data mining to the characteristics of those studies, and, based on a summary of them, gives suggestions for the further study of English modal verbs.
{"title":"Review and data mining of linguistic studies of English modal verbs","authors":"Jianping Yu, Jilin Fu, Tana Bai, Xueping Xu","doi":"10.1002/widm.1455","DOIUrl":"https://doi.org/10.1002/widm.1455","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"11 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74774200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lam B. Q. Nguyen, I. Zelinka, V. Snášel, Loan T. T. Nguyen, Bay Vo
Large graphs are often used to simulate and model complex systems in various research and application fields. Frequent subgraph mining (FSM) in a single large graph is therefore a vital problem; it has recently attracted numerous researchers and plays an important role in various tasks for both research and application purposes. FSM aims to find all subgraphs whose number of appearances in a large graph is greater than or equal to a given frequency threshold. In most recent applications, the underlying graphs, such as social networks, are very large, and algorithms for FSM from a single large graph have therefore been developed rapidly. However, the problem is NP‐hard (NP: nondeterministic polynomial time) with a huge search space, so all of these algorithms still need a lot of time and memory for storage and processing. In this article, we present an overview of the problems of FSM, the important phases in FSM, and the main groups of FSM, and we survey many modern applied algorithms. The overview covers many practical applications and is a fundamental premise for many future studies.
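The frequency-threshold definition above can be illustrated on the smallest possible case. The sketch below is a deliberate simplification and not an FSM algorithm from the review: it considers only single-edge patterns in an undirected, node-labeled graph and counts every occurrence as one appearance, whereas practical single-graph miners grow larger patterns and use more careful support measures. The function name and the toy graph are hypothetical.

```python
from collections import Counter


def frequent_edge_patterns(node_labels, edges, min_support):
    """Return single-edge label patterns appearing at least min_support times.

    node_labels: dict mapping node id -> label
    edges: iterable of undirected edges (u, v)
    """
    counts = Counter()
    for u, v in edges:
        # An undirected edge pattern is the sorted pair of endpoint labels.
        pattern = tuple(sorted((node_labels[u], node_labels[v])))
        counts[pattern] += 1
    # Keep only patterns whose count meets the frequency threshold.
    return {p: c for p, c in counts.items() if c >= min_support}


node_labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "C"}
edges = [(1, 2), (3, 4), (2, 3), (4, 5)]
frequent = frequent_edge_patterns(node_labels, edges, min_support=2)
print(frequent)  # {('A', 'B'): 3}
```

Even this toy version hints at the cost problem: larger patterns require subgraph isomorphism checks, and the number of candidate patterns grows very quickly with pattern size.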
{"title":"Subgraph mining in a large graph: A review","authors":"Lam B. Q. Nguyen, I. Zelinka, V. Snášel, Loan T. T. Nguyen, Bay Vo","doi":"10.1002/widm.1454","DOIUrl":"https://doi.org/10.1002/widm.1454","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"104 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76111040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology. Massive data generated in the biological sciences by rapid and deep gene sequencing and by the determination of protein or other molecular structures, on the one hand, require data analysis capabilities using machine learning that are distinctly different from classical statistical methods; on the other hand, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for solving biological problems that until recently relied on computationally expensive mechanistic model-based approaches. This review provides a bird's-eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
{"title":"Machine learning in postgenomic biology and personalized medicine.","authors":"Animesh Ray","doi":"10.1002/widm.1451","DOIUrl":"https://doi.org/10.1002/widm.1451","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"12 2","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9371441/pdf/nihms-1770264.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9375926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefan Meisenbacher, Marian Turowski, Kaleb Phipps, Martin Ratz, D. Muller, V. Hagenmeyer, R. Mikut
Time series forecasting is fundamental for various use cases in different domains such as energy systems and economics. Creating a forecasting model for a specific use case requires an iterative and complex design process. The typical design process includes five sections, which are commonly organized in a pipeline structure: (1) data preprocessing, (2) feature engineering, (3) hyperparameter optimization, (4) forecasting method selection, and (5) forecast ensembling. One promising approach to handling the ever‐growing demand for time series forecasts is to automate this design process. This article therefore reviews the existing literature on automated time series forecasting pipelines and analyzes how the design process of forecasting models is currently automated. In doing so, we consider both automated machine learning (AutoML) and automated statistical forecasting methods in a single forecasting pipeline. For this purpose, we first present and compare the identified automation methods for each pipeline section. Second, we analyze these automation methods regarding their interaction, combination, and coverage of the five pipeline sections. For both, we discuss the reviewed literature that contributes toward automating the design process, identify problems, give recommendations, and suggest future research. This review reveals that the majority of the reviewed literature covers only two or three of the five pipeline sections. We conclude that future research has to consider the automation of the forecasting pipeline holistically to enable the large‐scale application of time series forecasting.
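The five pipeline sections can be sketched end to end with toy components. Everything below is an assumption made for illustration and is not taken from the reviewed literature: the two candidate methods (`naive`, `moving_average`), the forward-fill preprocessing, the grid over window sizes, the one-step-ahead validation error, and the equal-weight ensemble are all placeholders for the far richer choices an automated pipeline would explore.

```python
from statistics import mean


def preprocess(series):
    """(1) Data preprocessing: forward-fill missing values (None).
    Assumes the first observation is not missing."""
    filled, last = [], None
    for x in series:
        last = x if x is not None else last
        filled.append(last)
    return filled


def lag_features(history, n_lags):
    """(2) Feature engineering: the most recent n_lags observations."""
    return history[-n_lags:]


def naive(history, _window):
    """Forecast the last observed value (ignores the window)."""
    return history[-1]


def moving_average(history, window):
    """Forecast the mean of the last `window` observations."""
    return mean(lag_features(history, window))


def validation_error(method, series, window, n_val):
    """Mean absolute one-step-ahead error on the last n_val points."""
    return mean(abs(method(series[:t], window) - series[t])
                for t in range(len(series) - n_val, len(series)))


def auto_forecast(raw, windows=(2, 3), n_val=3):
    series = preprocess(raw)
    candidates = []
    for method in (naive, moving_average):
        # (3) Hyperparameter optimization: grid search over window sizes.
        best_w = min(windows,
                     key=lambda w: validation_error(method, series, w, n_val))
        err = validation_error(method, series, best_w, n_val)
        candidates.append((err, method, best_w))
    # (4) Forecasting method selection: lowest validation error wins.
    candidates.sort(key=lambda c: c[0])
    _, best_method, best_w = candidates[0]
    selected_forecast = best_method(series, best_w)
    # (5) Forecast ensembling: equal-weight average of all tuned methods.
    ensemble_forecast = mean(m(series, w) for _, m, w in candidates)
    return best_method.__name__, selected_forecast, ensemble_forecast


name, selected, ensemble = auto_forecast([10, 12, None, 14, 16, 18, 20])
print(name, selected, ensemble)  # naive 20 19.5
```

On this upward-trending toy series the naive method wins the selection step because averaging lags behind the trend; with different data the ranking, and therefore the selected method, would change.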
{"title":"Review of automated time series forecasting pipelines","authors":"Stefan Meisenbacher, Marian Turowski, Kaleb Phipps, Martin Ratz, D. Muller, V. Hagenmeyer, R. Mikut","doi":"10.1002/widm.1475","DOIUrl":"https://doi.org/10.1002/widm.1475","url":null,"PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"23 3 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2022-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89386499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}