The proliferating popularity of Internet of Things (IoT) devices has led to wide‐scale networked system implementations across multiple disciplines, including transportation, medicine, smart homes, and many others. This unprecedented level of interconnectivity has introduced new security vulnerabilities and threats. Ensuring security in these IoT settings is crucial for protecting against malicious activities and safeguarding data. Real‐time identification and response to potential intrusions and attacks are essential, and intrusion detection systems (IDS) are pivotal in this process. However, the dynamic and diverse nature of the IoT environment presents significant challenges to existing IDS solutions, which are often based on rule‐based or statistical approaches. Deep learning, a subset of artificial intelligence, has shown great potential to enhance IDS in IoT. Deep learning models can identify complex patterns and characteristics by utilizing artificial neural networks, automatically building hierarchical representations from data. This capability results in more precise and efficient intrusion detection in IoT‐based systems. The primary aim of this survey is to present an extensive overview of the current research on deep learning and IDS in the IoT domain. By examining existing literature, discussing mainstream datasets, and highlighting current challenges and potential prospects, this survey provides valuable insights into the prevailing scenario and future directions for using deep learning in IDS for IoT. The findings from this research aim to enhance intrusion detection techniques in IoT environments and promote the development of more effective antimalware solutions against cyber threats targeting IoT device systems.
{"title":"A comprehensive survey on deep learning‐based intrusion detection systems in Internet of Things (IoT)","authors":"Qasem Abu Al‐Haija, Ayat Droos","doi":"10.1111/exsy.13726","DOIUrl":"https://doi.org/10.1111/exsy.13726","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142254408","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
Image copy‐move forgery, where an image region is copied and pasted within the same image, is a simple yet widely employed manipulation. In this paper, we rethink copy‐move forgery detection from the perspective of multi‐task learning and summarize two characteristics of this problem: (1) homology and (2) manipulated traces. Consequently, we propose a multi‐task forgery detection network (MTFDN) for image copy‐move forgery localization and source/target distinction. The network consists of a hard‐parameter‐sharing feature extractor, a global forged homology detection (GFHD) module, and a local manipulated trace detection (LMTD) module. Sharing parameters significantly reduces the difference in feature distribution between the GFHD and LMTD modules. Experimental results on several benchmark copy‐move forgery datasets demonstrate the effectiveness of the proposed MTFDN.
{"title":"MTFDN: An image copy‐move forgery detection method based on multi‐task learning","authors":"Peng Liang, Hang Tu, Amir Hussain, Ziyuan Li","doi":"10.1111/exsy.13729","DOIUrl":"https://doi.org/10.1111/exsy.13729","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142254409","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
Transfer learning has shown promising results in many applications. However, most deep transfer learning methods, such as parameter sharing and fine‐tuning, still lack a strategy for deciding which parameters to transfer. In this paper, we propose STP‐CNN, a new optimization model for parameter‐based transfer learning in convolutional neural networks. Specifically, we propose a Lasso transfer model with a regularization term that controls transferability, and we solve it with the proximal gradient descent method. Under certain conditions, the technique makes it possible to control exactly which parameters in each convolutional layer of the source network are used directly in the target network and which are adjusted. Several experiments demonstrate the model's ability to locate the transferable parameters and to improve data classification.
{"title":"STP‐CNN: Selection of transfer parameters in convolutional neural networks","authors":"Otmane Mallouk, Nour‐Eddine Joudar, Mohamed Ettaouil","doi":"10.1111/exsy.13728","DOIUrl":"https://doi.org/10.1111/exsy.13728","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142206574","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
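The abstract above names a Lasso penalty solved by proximal gradient descent but does not give the objective. As a rough illustration only, the sketch below assumes the L1 penalty acts on the deviation of each target weight from its source value, so that a deviation shrunk to exactly zero means "use the source parameter directly" and a nonzero deviation means "adjust it". The function names, the gradient values, and the form of the penalty are all assumptions, not the STP‐CNN formulation.

```python
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def prox_transfer_step(w: List[float], w_src: List[float], grad: List[float],
                       lr: float, lam: float) -> List[float]:
    """One proximal gradient step on loss(w) + lam * ||w - w_src||_1.

    ASSUMPTION: the L1 penalty is placed on the deviation from the source
    weights; the regularizer used in the paper may differ.
    """
    new_w = []
    for wi, si, gi in zip(w, w_src, grad):
        v = wi - lr * gi  # gradient step on the smooth loss only
        # Shrink the deviation (v - si); a zero deviation keeps the source value.
        new_w.append(si + soft_threshold(v - si, lr * lam))
    return new_w


w_src = [1.0, -2.0, 0.5]            # hypothetical source-layer weights
grad = [0.05, 2.0, -0.01]           # hypothetical target-task gradient
new_w = prox_transfer_step(w_src, w_src, grad, lr=0.1, lam=0.1)
# True where the source parameter is reused unchanged, False where it is adjusted.
transferred = [nw == s for nw, s in zip(new_w, w_src)]
print(new_w, transferred)
```

In this toy run only the second weight, whose gradient is large, moves away from its source value; a larger `lam` would freeze more parameters at their source values.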
Human emotional states encompass both basic and compound facial expressions. However, current works primarily focus on basic expressions, consequently neglecting the broad spectrum of human emotions encountered in practical scenarios. Compound facial expressions involve the simultaneous manifestation of multiple emotions on an individual's face. This phenomenon reflects the complexity and richness of human states, where facial features dynamically convey a combination of feelings. This study embarks on a pioneering exploration of Compound Facial Expression Recognition (CFER), with a distinctive emphasis on leveraging the Label Distribution Learning (LDL) paradigm. This strategic application of LDL aims to address the ambiguity and complexity inherent in compound expressions, marking a significant departure from the dominant Single Label Learning (SLL) and Multi‐Label Learning (MLL) paradigms. Within this framework, we rigorously investigate the potential of LDL for a critical challenge in Facial Expression Recognition (FER): recognizing compound facial expressions in uncontrolled environments. We utilize the recently introduced RAF‐CE dataset, meticulously designed for compound expression assessment. By conducting a comprehensive comparative analysis pitting LDL against conventional SLL and MLL approaches on RAF‐CE, we aim to definitively establish LDL's superiority in handling this complex task. Furthermore, we assess the generalizability of LDL models trained on RAF‐CE by evaluating their performance on the EmotioNet and RAF‐DB Compound datasets. This demonstrates their effectiveness without domain adaptation. To solidify these findings, we conduct a comprehensive comparative analysis of 12 cutting‐edge LDL algorithms on RAF‐CE, S‐BU3DFE, and S‐JAFFE datasets, providing valuable insights into the most effective LDL techniques for FER in‐the‐wild.
{"title":"Label distribution learning for compound facial expression recognition in‐the‐wild: A comparative study","authors":"Afifa Khelifa, Haythem Ghazouani, Walid Barhoumi","doi":"10.1111/exsy.13724","DOIUrl":"https://doi.org/10.1111/exsy.13724","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142206575","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
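To make the difference between the three labelling paradigms in the abstract above concrete, the sketch below encodes one compound expression as a label distribution (LDL), a single label (SLL), and a binary label set (MLL), and scores a prediction with KL divergence, a loss commonly used in LDL. The emotion list, the annotation values, and the 0.2 threshold are hypothetical and are not taken from RAF‐CE or from the paper.

```python
import math
from typing import List

EMOTIONS = ["happy", "surprised", "sad", "angry", "fearful", "disgusted"]


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative scores so they sum to 1 (a label distribution)."""
    total = sum(scores)
    return [s / total for s in scores]


def to_single_label(dist: List[float]) -> int:
    """SLL view: keep only the index of the dominant emotion."""
    return max(range(len(dist)), key=dist.__getitem__)


def to_multi_label(dist: List[float], threshold: float = 0.2) -> List[int]:
    """MLL view: mark emotions as present or absent, discarding intensities."""
    return [1 if d >= threshold else 0 for d in dist]


def kl_divergence(target: List[float], predicted: List[float],
                  eps: float = 1e-12) -> float:
    """KL(target || predicted); terms with zero target mass contribute 0."""
    return sum(t * math.log(t / max(p, eps))
               for t, p in zip(target, predicted) if t > 0.0)


# Hypothetical annotation of a "happily surprised" face.
ldl_target = normalize([0.55, 0.40, 0.0, 0.0, 0.05, 0.0])
uniform = [1.0 / len(EMOTIONS)] * len(EMOTIONS)

print(EMOTIONS[to_single_label(ldl_target)])   # SLL keeps one emotion
print(to_multi_label(ldl_target))              # MLL keeps a set, no intensities
print(kl_divergence(ldl_target, uniform))      # LDL loss of an uninformative guess
```

The single-label and multi-label views both discard the relative intensities that the distribution retains, which is the ambiguity the LDL paradigm is meant to capture.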
In the Internet of Medical Things (IoMT), the vulnerability of federated learning (FL) to single points of failure, low‐quality nodes, and poisoning attacks necessitates innovative solutions. This article introduces an FL‐driven dual‐blockchain approach to address these challenges and improve data sharing and reputation management. Our approach comprises two blockchains: the Model Quality Blockchain (MQchain) and the Reputation Incentive Blockchain (RIchain). MQchain uses an enhanced Proof of Quality (PoQ) consensus algorithm to exclude low‐quality nodes from aggregation, mitigating single points of failure and poisoning attacks by leveraging node reputation and quality thresholds. In parallel, RIchain incorporates reputation evaluation, incentive, and index query mechanisms, allowing for rapid and comprehensive node evaluation and thus identifying high‐reputation nodes for MQchain. Security analysis confirms the theoretical soundness of the proposed method. Experimental evaluation on real medical datasets, specifically MedMNIST, demonstrates the resilience of our approach against attacks in comparison with three alternative methods.
{"title":"Federated learning‐driven dual blockchain for data sharing and reputation management in Internet of medical things","authors":"Chenquan Gan, Xinghai Xiao, Qingyi Zhu, Deepak Kumar Jain, Akanksha Saini, Amir Hussain","doi":"10.1111/exsy.13714","DOIUrl":"https://doi.org/10.1111/exsy.13714","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142206576","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
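The idea of excluding low‐quality or low‐reputation nodes before aggregation can be sketched in a few lines. The example below is not the PoQ consensus algorithm or the MQchain/RIchain logic; it only assumes that each node reports a model update together with a reputation score and a quality score, and that plain averaging is applied to the nodes that pass both thresholds. All names, scores, and thresholds are hypothetical.

```python
from typing import List


def filtered_average(updates: List[List[float]], reputations: List[float],
                     qualities: List[float], min_reputation: float,
                     min_quality: float) -> List[float]:
    """Average only the updates from nodes that pass both thresholds."""
    kept = [u for u, r, q in zip(updates, reputations, qualities)
            if r >= min_reputation and q >= min_quality]
    if not kept:
        raise ValueError("no node passed the reputation and quality thresholds")
    dim = len(kept[0])
    return [sum(u[i] for u in kept) / len(kept) for i in range(dim)]


# Two honest nodes and one hypothetical poisoned update with low scores.
updates = [[1.0, 2.0], [3.0, 4.0], [100.0, -100.0]]
reputations = [0.9, 0.8, 0.2]
qualities = [0.95, 0.9, 0.1]

global_update = filtered_average(updates, reputations, qualities,
                                 min_reputation=0.5, min_quality=0.5)
print(global_update)
```

Because the third node fails both thresholds, its outlying update never reaches the average, which is the effect the abstract attributes to the quality and reputation checks.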
A. Palomo‐Alonso, V. G. Costa, L. M. Moreno‐Saavedra, E. Lorente‐Ramos, J. Pérez‐Aracil, C. E. Pedreira, S. Salcedo‐Sanz
This paper presents a novel implementation of the Coral Reef Optimization with Substrate Layers (CRO‐SL) algorithm. Our approach, which we call TensorCRO, uses the TensorFlow framework to represent CRO‐SL as a series of tensor operations, allowing it to run on GPU and search for solutions faster and more efficiently. We evaluate the proposed implementation on a wide range of benchmark functions commonly used in optimization research (such as the Rastrigin, Rosenbrock, Ackley, and Griewank functions) and show that GPU execution leads to considerable speedups over its CPU counterpart. When TensorCRO is compared with other state‐of‐the‐art optimization algorithms (such as the Genetic Algorithm, Simulated Annealing, and Particle Swarm Optimization), the results show that it can achieve better convergence rates and solutions within a fixed execution time, provided that the fitness functions are also implemented in TensorFlow. We also evaluate the approach on a real‐world problem, optimizing power production in wind farms by selecting turbine locations; in every evaluated scenario, TensorCRO outperformed the other meta‐heuristics and achieved solutions close to the best known in the literature. Overall, our TensorFlow GPU implementation of the CRO‐SL algorithm provides a new, fast, and efficient approach to solving optimization problems. We believe it has significant potential in domains such as engineering, finance, and machine learning, where optimization is often used to solve complex problems. Finally, we propose that this implementation can be used to optimize models that cannot propagate an error gradient, a setting well suited to non‐gradient‐based optimizers.
{"title":"TensorCRO: A TensorFlow‐based implementation of a multi‐method ensemble for optimization","authors":"A. Palomo‐Alonso, V. G. Costa, L. M. Moreno‐Saavedra, E. Lorente‐Ramos, J. Pérez‐Aracil, C. E. Pedreira, S. Salcedo‐Sanz","doi":"10.1111/exsy.13713","DOIUrl":"https://doi.org/10.1111/exsy.13713","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142206593","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
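For readers unfamiliar with the benchmark functions mentioned above, the sketch below defines the Rastrigin function and evaluates it over a small candidate population, which is the kind of fitness evaluation a population‐based method repeats many times. It is plain Python for illustration and does not use TensorCRO or TensorFlow; the population values are arbitrary.

```python
import math
from typing import List, Sequence


def rastrigin(x: Sequence[float]) -> float:
    """Rastrigin function: 10*n + sum(x_i^2 - 10*cos(2*pi*x_i)).

    Highly multimodal, with its global minimum of 0 at the origin.
    """
    return 10.0 * len(x) + sum(
        xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def evaluate_population(population: List[Sequence[float]]) -> List[float]:
    """Fitness of every candidate; lower is better."""
    return [rastrigin(candidate) for candidate in population]


population = [[0.0, 0.0], [1.0, 1.0], [0.5, -0.5]]   # arbitrary candidates
fitness = evaluate_population(population)
best_index = min(range(len(population)), key=fitness.__getitem__)
print(fitness, best_index)
```

Each candidate is scored independently of the others, so the loop in `evaluate_population` maps naturally onto a single batched tensor operation, which is the property the GPU implementation described above relies on.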
Multiple kernel k‐means clustering (MKKC) can efficiently combine multiple base kernels to generate an optimal kernel. Most existing MKKC methods require a two‐step operation: learning a clustering indicator matrix and then performing clustering on it. However, the optimal clustering result of the two steps is not equivalent to that of the original problem. To address this issue, we propose a novel method named one‐step multiple kernel k‐means clustering based on block diagonal representation (OS‐MKKC‐BD). By imposing a block diagonal constraint on the product of the indicator matrix and its transpose, the method encourages the indicator matrix to be block diagonal. The indicator matrix then yields explicit cluster assignments, enabling one‐step clustering and avoiding the drawback of the two‐step operation. Furthermore, a simple kernel weighting strategy is used to improve the quality of the optimal kernel. In addition, a three‐step iterative algorithm is designed to solve the corresponding optimization problem, in which the Riemannian conjugate gradient method is used to solve the subproblem for the indicator matrix. Finally, extensive experiments on 11 real data sets and a comparison of clustering results with 10 MKC methods show that OS‐MKKC‐BD is effective.
{"title":"One‐step multiple kernel k‐means clustering based on block diagonal representation","authors":"Cuiling Chen, Zhi Li","doi":"10.1111/exsy.13720","DOIUrl":"https://doi.org/10.1111/exsy.13720","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142206596","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
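The role of the block diagonal structure can be seen on a toy example. The sketch below takes an ideal (one‐hot) indicator matrix H for five samples sorted by cluster, forms the product of H with its transpose, and reads the cluster labels directly from H without a follow‐up k‐means step. It illustrates the structure only; it is not the OS‐MKKC‐BD optimization, and the matrix is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gram(H: Matrix) -> Matrix:
    """Return H @ H^T; entry (i, j) is 1 when samples i and j share a cluster."""
    n = len(H)
    return [[sum(a * b for a, b in zip(H[i], H[j])) for j in range(n)]
            for i in range(n)]


def labels_from_indicator(H: Matrix) -> List[int]:
    """Read each sample's cluster as the column holding its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


# Hypothetical one-hot indicator: samples 0-1 in cluster 0, samples 2-4 in cluster 1.
H = [[1.0, 0.0],
     [1.0, 0.0],
     [0.0, 1.0],
     [0.0, 1.0],
     [0.0, 1.0]]

for row in gram(H):
    print(row)                       # a 2x2 block of ones, then a 3x3 block of ones
print(labels_from_indicator(H))
```

When the product is block diagonal in this sense, each block corresponds to one cluster, so the indicator matrix already encodes the final assignment.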
Multivariate time series have complex, high‐dimensional characteristics, which make it difficult to analyze and predict the data accurately. In this paper, a new multivariate time series prediction method is proposed: a generative adversarial network (GAN) based on the Fourier transform and a bi‐directional gated recurrent unit (Bi‐GRU). First, the Fourier transform is used to extend the data features, which helps the GAN better learn the distributional features of the original data. Second, to guide the model to fully learn the distribution of the original time series data, Bi‐GRU is introduced as the generator of the GAN. To address the mode collapse and vanishing gradient problems of GANs, the Wasserstein distance is used as the loss function. Finally, the proposed method is applied to the prediction of air quality, stock prices, and the RMB exchange rate. The experimental results show that the model predicts the trend of the time series effectively in comparison with nine baseline models. It significantly improves the accuracy and flexibility of multivariate time series forecasting and provides new ideas and methods for accurate time series forecasting in industrial, financial, and environmental fields.
{"title":"A new method based on generative adversarial networks for multivariate time series prediction","authors":"Xiwen Qin, Hongyu Shi, Xiaogang Dong, Siqi Zhang","doi":"10.1111/exsy.13700","DOIUrl":"https://doi.org/10.1111/exsy.13700","PeriodicalId":51053,"journal":{"name":"Expert Systems"},"PeriodicalIF":3.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142226348","RegionNum":4,"RegionCategory":"Computer Science","Total":0}
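The abstract above states that the Wasserstein distance serves as the GAN loss but gives no formula. The sketch below uses the standard Wasserstein GAN objective as an assumption: the critic minimizes the mean score of generated samples minus the mean score of real samples, and the generator minimizes the negative mean score of its samples. The scores are hypothetical critic outputs, and the Lipschitz constraint that a Wasserstein critic also needs (weight clipping or a gradient penalty) is not shown.

```python
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def critic_loss(real_scores: List[float], fake_scores: List[float]) -> float:
    """Standard WGAN critic loss: E[D(fake)] - E[D(real)], to be minimized."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores: List[float]) -> float:
    """Standard WGAN generator loss: -E[D(fake)], to be minimized."""
    return -mean(fake_scores)


# Hypothetical critic outputs for a batch of real and generated sequences.
real_scores = [2.0, 4.0]
fake_scores = [1.0, 1.0]

print(critic_loss(real_scores, fake_scores))
print(generator_loss(fake_scores))
```

Unlike the log-based loss of the original GAN, these losses stay informative even when the critic separates real and generated samples well, which is the usual argument for using them against vanishing gradients.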
Traditionally, designing an expert system involves acquiring knowledge, in the form of symbolic rules, directly from the expert(s), which is a complex and time‐consuming task. Although the expert systems approach is quite old, it is still in use, especially where explicit knowledge representation and reasoning, which assure interpretability and explainability, are necessary. To facilitate knowledge acquisition, machine learning methods have been devised to extract rules from data. However, those methods are quite inflexible in adapting to the application domain and provide no help in designing the expert system. In this work, we present a framework and corresponding tool, ACRES, for semi‐automatically generating expert systems from datasets. ACRES allows for data preprocessing, which helps in structuring knowledge in the form of a tree, called a rule hierarchy, that represents (possible) dependencies among data variables and is used for rule formation. This improves the interpretability and explainability of the produced systems. We have also designed and evaluated alternative methods for rule extraction from data and for calculating and using certainty factors (CFs) to represent uncertainty; CFs can be dynamically updated. Experimental results on seven well‐known datasets show that the proposed rule extraction methods are comparable to other popular machine learning approaches, such as decision trees, CART, JRip, PART, and Random Forest, for the classification task. Finally, we give insights into two applications of ACRES.
{"title":"ACRES: A framework for (semi)automatic generation of rule‐based expert systems with uncertainty from datasets","authors":"Konstantinos Kovas, Ioannis Hatzilygeroudis","doi":"10.1111/exsy.13723","DOIUrl":"https://doi.org/10.1111/exsy.13723","url":null,"abstract":"Traditionally, the design of an expert system involves acquiring knowledge, in the form of symbolic rules, directly from the expert(s), which is a complex and time‐consuming task. Although expert systems approach is quite old, it is still present, especially where explicit knowledge representation and reasoning, which assure interpretability and explainability, are necessary. Therefore, machine learning methods have been devised to extract rules from data, to facilitate that task. However, those methods are quite inflexible in adapting to the application domain and provide no help in designing the expert system. In this work, we present a framework and corresponding tool, namely ACRES, for semi‐automatically generating expert systems from datasets. ACRES allows for data preprocessing, which helps in structuring knowledge in the form of a tree, called rule hierarchy, which represents (possible) dependencies among data variables and is used for rule formation. This improves interpretability and explainability of the produced systems. We have also designed and evaluated alternative methods for rule extraction from data and for calculation and use of certainty factors, to represent uncertainty; CFs can be dynamically updated. Experimental results on seven well‐known datasets show that the proposed rule extraction methods are comparable to other popular machine learning approaches like decision trees, CART, JRip, PART, Random Forest, and so on, for the classification task. 
Finally, we give insights on two applications of ACRES.","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
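The ACRES abstract mentions certainty factors (CFs) without saying how they are computed or combined. As a point of reference only, the sketch below shows the classic MYCIN-style certainty factor calculus, which is an assumption here and not necessarily one of the methods ACRES implements. The function names `combine_cf` and `rule_cf` are hypothetical.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors for the same conclusion.

    Uses the classic MYCIN combination rule (an assumption; the ACRES
    abstract does not state which calculus the tool uses).
    """
    for cf in (cf1, cf2):
        if not -1.0 <= cf <= 1.0:
            raise ValueError("certainty factors must lie in [-1, 1]")
    if cf1 >= 0 and cf2 >= 0:
        # Two supporting pieces of evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Two opposing pieces of evidence reinforce the disbelief.
        return cf1 + cf2 * (1 + cf1)
    # Conflicting evidence: the weaker factor discounts the stronger one.
    denom = 1 - min(abs(cf1), abs(cf2))
    if denom == 0:
        raise ValueError("cannot combine total belief with total disbelief")
    return (cf1 + cf2) / denom


def rule_cf(premise_cf: float, rule_strength: float) -> float:
    """CF of a rule's conclusion: the premise CF (floored at 0) scaled by
    the rule's own strength, as in MYCIN."""
    return max(0.0, premise_cf) * rule_strength


if __name__ == "__main__":
    # Two rules support the same class with CFs 0.6 and 0.5.
    print(combine_cf(0.6, 0.5))
    # A rule of strength 0.5 fires on a premise believed with CF 0.8.
    print(rule_cf(0.8, 0.5))
```

Under this rule, agreeing evidence pushes the combined CF toward ±1 without ever exceeding it, which is one reason the scheme is convenient when the same conclusion is reached by several rules.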
This research involved designing and building an interactive generative AI application to conduct a comparative analysis of two advanced Large Language Models (LLMs), GPT‐4 and Claude 2, using Langsmith evaluation tools. The project was developed to explore the potential of LLMs in facilitating postgraduate course recommendations within a simulated environment at Munster Technological University (MTU). Designed for comparative analysis, the application enables testing of GPT‐4 and Claude 2 and can be hosted flexibly on either Amazon Web Services (AWS) or Azure. It uses advanced natural language processing and retrieval‐augmented generation (RAG) techniques to process proprietary data tailored to postgraduate needs. A key component of this research was the rigorous assessment of the LLMs using the Langsmith evaluation tool against both customized and standard benchmarks. The evaluation focused on metrics such as bias, safety, accuracy, cost, robustness, and latency. Additionally, adaptability, covering critical features like language translation and internet access, was researched independently, since the Langsmith tool does not evaluate this metric. This ensures a holistic assessment of the LLMs' capabilities.
{"title":"Comparative evaluation of Large Language Models using key metrics and emerging tools","authors":"Sarah McAvinue, Kapal Dev","doi":"10.1111/exsy.13719","DOIUrl":"https://doi.org/10.1111/exsy.13719","url":null,"abstract":"This research involved designing and building an interactive generative AI application to conduct a comparative analysis of two advanced Large Language Models (LLMs), GPT‐4, and Claude 2, using Langsmith evaluation tools. The project was developed to explore the potential of LLMs in facilitating postgraduate course recommendations within a simulated environment at Munster Technological University (MTU). Designed for comparative analysis, the application enables testing of GPT‐4 and Claude 2 and can be hosted flexibly on either Amazon Web Services (AWS) or Azure. It utilizes advanced natural language processing and retrieval‐augmented generation (RAG) techniques to process proprietary data tailored to postgraduate needs. A key component of this research was the rigorous assessment of the LLMs using the Langsmith evaluation tool against both customized and standard benchmarks. The evaluation focused on metrics such as bias, safety, accuracy, cost, robustness, and latency. Additionally, adaptability covering critical features like language translation and internet access, was independently researched since the Langsmith tool does not evaluate this metric. 
This ensures a holistic assessment of the LLM's capabilities.","PeriodicalId":51053,"journal":{"name":"Expert Systems","volume":null,"pages":null},"PeriodicalIF":3.3,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142206595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}