An aviation accidents prediction method based on MTCNN and Bayesian optimization
Pub Date: 2024-06-26 | DOI: 10.1007/s10115-024-02168-6
Minglan Xiong, Zhaoguo Hou, Huawei Wang, Changchang Che, Rui Luo
The safety of the civil aviation system has drawn increasing concern following several accidents in recent years. A precise accident prediction model is urgently needed, one that can systematically analyze safety from the perspective of accident mechanisms and thereby enhance prediction accuracy. Such a predictive model is also critical for stakeholders to identify risks and implement a proactive safety paradigm. To mitigate the casualties and economic losses arising from aviation accidents and to improve system safety, this work focuses on predicting aircraft damage severity, injury/death severity, and flight phase as part of identifying event risk sources. It establishes a multi-task deep convolutional neural network (MTCNN) learning framework to accomplish this goal. An innovative prediction rule is developed to refine prediction results through two approaches: handling imbalanced classes and Bayesian optimization. By comparing the proposed multi-task model against single-task machine learning models using ten-fold cross-validation and statistical testing, the effectiveness of the developed model in predicting aviation accident severity and flight phase is demonstrated.
{"title":"An aviation accidents prediction method based on MTCNN and Bayesian optimization","authors":"Minglan Xiong, Zhaoguo Hou, Huawei Wang, Changchang Che, Rui Luo","doi":"10.1007/s10115-024-02168-6","DOIUrl":"https://doi.org/10.1007/s10115-024-02168-6","url":null,"abstract":"<p>The safety of the civil aviation system has been of increasing concern with several accidents in recent years. It is urgent to put forward a precise accident prediction model, which can systematically analyze safety from the perspective of accident mechanism to enhance training accuracy. Furthermore, the predictive model is critical for stakeholders to identify risk and implement the proactive safety paradigm. In this work, to mitigate casualties and economic losses arising from aviation accidents and improve system safety, the focus is on predicting the aircraft damage severity, the injury/death severity, and the flight phases in the sequence of identifying event risk sources. This work establishes a multi-task deep convolutional neural network (MTCNN) learning framework to accomplish this goal. An innovative prediction rule will be developed to refine prediction results from two approaches: handling imbalanced classes and Bayesian optimization. By comparing the performance of the proposed multi-task model with other single-task machine learning models with ten-fold cross-validation and statistical testing, the effectiveness of the developed model in predicting aviation accident severity and flight phase is demonstrated.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"10 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep reinforcement learning-based scheduling in distributed systems: a critical review
Pub Date: 2024-06-26 | DOI: 10.1007/s10115-024-02167-7
Zahra Jalali Khalil Abadi, Najme Mansouri, Mohammad Masoud Javidi
Many fields of research, including astronomy, earth science, and bioinformatics, use parallelized and distributed computing environments. Owing to growing client demand, service providers face challenges such as task scheduling, security, resource management, and virtual machine migration. Because of their large solution spaces, NP-hard scheduling problems take a long time to solve optimally or near-optimally. With recent advances in artificial intelligence, deep reinforcement learning (DRL) can be applied to scheduling problems; it combines the representational strength of deep neural networks with reinforcement learning's feedback-based learning. This paper provides a comprehensive overview of DRL-based scheduling algorithms in distributed systems, categorizing both algorithms and applications. Several articles are assessed based on their main objectives, quality-of-service and scheduling parameters, and evaluation environments (i.e., simulation tools or real-world deployments). The literature review indicates that RL-based algorithms, such as Q-learning, are effective for learning scaling and scheduling policies in cloud environments. Challenges and directions for further research on DRL-based scheduling are also summarized (e.g., edge intelligence, an ideal dynamic task scheduling framework, human-machine interaction, resource-hungry artificial intelligence (AI), and sustainability).
{"title":"Deep reinforcement learning-based scheduling in distributed systems: a critical review","authors":"Zahra Jalali Khalil Abadi, Najme Mansouri, Mohammad Masoud Javidi","doi":"10.1007/s10115-024-02167-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02167-7","url":null,"abstract":"<p>Many fields of research use parallelized and distributed computing environments, including astronomy, earth science, and bioinformatics. Due to an increase in client requests, service providers face various challenges, such as task scheduling, security, resource management, and virtual machine migration. NP-hard scheduling problems require a long time to implement an optimal or suboptimal solution due to their large solution space. With recent advances in artificial intelligence, deep reinforcement learning (DRL) can be used to solve scheduling problems. The DRL approach combines the strength of deep learning and neural networks with reinforcement learning’s feedback-based learning. This paper provides a comprehensive overview of DRL-based scheduling algorithms in distributed systems by categorizing algorithms and applications. As a result, several articles are assessed based on their main objectives, quality of service and scheduling parameters, as well as evaluation environments (i.e., simulation tools, real-world environment). The literature review indicates that algorithms based on RL, such as Q-learning, are effective for learning scaling and scheduling policies in a cloud environment. Additionally, the challenges and directions for further research on deep reinforcement learning to address scheduling problems were summarized (e.g., edge intelligence, ideal dynamic task scheduling framework, human–machine interaction, resource-hungry artificial intelligence (AI) and sustainability).</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"39 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UCAD: commUnity disCovery method in Attribute-based multicoloreD networks
Pub Date: 2024-06-19 | DOI: 10.1007/s10115-024-02163-x
Félicité Gamgne Domgue, Norbert Tsopze, René Ndoundam
Many hierarchical methods for community detection in multicolored networks can find clusters when there is inter-slice correlation between layers. In general, however, they aggregate all the links in the different layers and treat them as equivalent, so the aggregation may discard information about how relevant each dimension is to a given node. In this paper, we fill this gap by proposing a hierarchical classification-based Louvain method for interslice-multicolored networks. In particular, we define a new node centrality measure, named Attractivity, that describes inter-slice correlation by incorporating within- and across-dimension topological features in order to identify the relevant dimension. Then, after merging dimensions through a frequential aggregation, we group nodes by their relational and attribute similarity, where attributes correspond to their relevant dimensions. We conduct extensive experiments on seven real-world multicolored networks, including comparisons with state-of-the-art methods. The results show the significance of the proposed method in discovering relevant communities over multiple dimensions and highlight its ability to produce optimal covers with higher values of the multidimensional version of the modularity function.
{"title":"UCAD: commUnity disCovery method in Attribute-based multicoloreD networks","authors":"Félicité Gamgne Domgue, Norbert Tsopze, René Ndoundam","doi":"10.1007/s10115-024-02163-x","DOIUrl":"https://doi.org/10.1007/s10115-024-02163-x","url":null,"abstract":"<p>Many hierarchical methods for community detection in multicolored networks are capable of finding clusters when there are interslice correlation between layers. However, in general, they aggregate all the links in different layer treating them as being equivalent. Therefore, such aggregation might ignore the information about the relevance of a dimension in which the node is involved. In this paper, we fill this gap by proposing a hierarchical classification-based Louvain method for interslice-multicolored networks. In particular, we define a new node centrality measure named <i>Attractivity</i> to describe the inter-slice correlation that incorporates within and across-dimension topological features in order to identify the relevant dimension. Then, after merging dimensions through a frequential aggregation, we group nodes by their relational and attribute similarity, where attributes correspond to their relevant dimensions. We conduct an extensive experimentation using seven real-world multicolored networks, which also includes comparison with state-of-the-art methods. Results show the significance of our proposed method in discovering relevant communities over multiple dimensions and highlight its ability in producing optimal covers with higher values of the multidimensional version of the modularity function.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"48 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Situational Data Integration in Question Answering systems: a survey over two decades
Pub Date: 2024-06-18 | DOI: 10.1007/s10115-024-02136-0
Maria Helena Franciscatto, Luis Carlos Erpen de Bona, Celio Trois, Marcos Didonet Del Fabro, João Carlos Damasceno Lima
Question Answering (QA) systems provide accurate answers to questions; however, they lack the ability to consolidate data from multiple sources, making it difficult to handle complex questions that could be answered with additional data retrieved and integrated on the fly. Such integration is inherent to Situational Data Integration (SDI) approaches, which deal with the dynamic requirements of ad hoc queries that neither traditional database management systems nor search engines answer effectively. Thus, if QA systems incorporated SDI characteristics, they could return validated, immediate information to support users' decisions. For this reason, we surveyed QA-based systems, assessing their capabilities to support SDI features, i.e., Ad hoc Data Retrieval, Data Management, and Timely Decision Support. We also identified patterns concerning these features in the surveyed studies, highlighting them in a timeline that shows the evolution of SDI in the QA domain. To the best of our knowledge, this study is a precursor in the joint analysis of SDI and QA, showing a combination that can improve the way systems support users. Our analyses show that most SDI features are rarely addressed in QA systems, and on that basis we discuss directions for further research.
{"title":"Situational Data Integration in Question Answering systems: a survey over two decades","authors":"Maria Helena Franciscatto, Luis Carlos Erpen de Bona, Celio Trois, Marcos Didonet Del FabroFabro, João Carlos Damasceno Lima","doi":"10.1007/s10115-024-02136-0","DOIUrl":"https://doi.org/10.1007/s10115-024-02136-0","url":null,"abstract":"<p>Question Answering (QA) systems provide accurate answers to questions; however, they lack the ability to consolidate data from multiple sources, making it difficult to manage complex questions that could be answered with additional data retrieved and integrated on the fly. This integration is inherent to Situational Data Integration (SDI) approaches that deal with dynamic requirements of ad hoc queries that neither traditional database management systems, nor search engines are effective in providing an answer. Thus, if QA systems include SDI characteristics, they could be able to return validated and immediate information for supporting users decisions. For this reason, we surveyed QA-based systems, assessing their capabilities to support SDI features, i.e., <i>Ad hoc Data Retrieval, Data Management,</i> and <i>Timely Decision Support</i>. We also identified patterns concerning these features in the surveyed studies, highlighting them in a timeline that shows the SDI evolution in the QA domain. To the best of your knowledge, this study is precursor in the joint analysis of SDI and QA, showing a combination that can favor the way systems support users. Our analyses show that most of SDI features are rarely addressed in QA systems, and based on that, we discuss directions for further research.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"175 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid storage blockchain-based query efficiency enhancement method for business environment evaluation
Pub Date: 2024-06-17 | DOI: 10.1007/s10115-024-02144-0
Su Li, Junlu Wang, Wanting Ji, Ze Chen, Baoyan Song
A favorable business environment plays a crucial role in facilitating the high-quality development of a modern economy. To enhance the credibility and efficiency of business environment evaluation, this paper proposes a hybrid storage blockchain-based query efficiency enhancement method. Most current blockchain systems store block data in key-value databases or file systems with simple semantic descriptions; such systems offer a single query interface, support limited query types, and incur high storage overhead, which leads to low performance. To tackle these challenges, data are first stored in a hybrid architecture combining on-chain and off-chain storage. Relational semantics are then added to block data, and three index mechanisms are designed to expedite data access. Corresponding query efficiency enhancement algorithms are designed for the query types applicable to these three index mechanisms, further refining query processing. Finally, comprehensive authenticated queries are implemented on the blockchain for light clients, so users can verify the soundness and integrity of query results. Experimental results on three open datasets show that the proposed method significantly reduces storage overhead, achieves shorter query latency for three different query types, and improves retrieval performance and verification efficiency.
{"title":"A hybrid storage blockchain-based query efficiency enhancement method for business environment evaluation","authors":"Su Li, Junlu Wang, Wanting Ji, Ze Chen, Baoyan Song","doi":"10.1007/s10115-024-02144-0","DOIUrl":"https://doi.org/10.1007/s10115-024-02144-0","url":null,"abstract":"<p>A favorable business environment plays a crucial role in facilitating the high-quality development of a modern economy. In order to enhance the credibility and efficiency of business environment evaluation, this paper proposes a hybrid storage blockchain-based query efficiency enhancement method for business environment evaluation. Currently, most blockchain systems store block data in key-value databases or file systems with simple semantic descriptions. However, such systems have a single query interface, limited supported query types, and high storage overhead, which leads to low performance. To tackle these challenges, this paper proposes a query efficiency enhancement method based on hybrid storage blockchain. Firstly, data are stored in a hybrid data storage architecture combining on-chain and off-chain. Additionally, relational semantics are added to block data, and three index mechanisms are designed to expedite data access. Subsequently, corresponding query efficiency enhancement algorithms are designed based on the query types that are applicable to the aforementioned three index mechanisms, further refining the query processing. Finally, a comprehensive authentication query is implemented on the blockchain for the light client, and the user can verify the soundness and integrity of the query results. Experimental results on three open datasets show that the method proposed in this paper significantly reduces storage overhead, has shorter query latency for three different query types, and improves retrieval performance and verification efficiency.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"26 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic similarity-aware feature selection and redundancy removal for text classification using joint mutual information
Pub Date: 2024-06-13 | DOI: 10.1007/s10115-024-02143-1
Farek Lazhar, Benaidja Amira
The high dimensionality of text data is a challenging issue that requires efficient methods to reduce the vector space and improve classification accuracy. Existing filter-based methods fail to address redundancy, resulting in the selection of irrelevant and redundant features. Information-theoretic methods solve this problem effectively but are impractical for large amounts of data due to their high time complexity. The proposed method, termed semantic similarity-aware feature selection and redundancy removal (SS-FSRR), employs the joint mutual information between pairs of semantically related terms and the class label to capture redundant features. It is predicated on the assumption that semantically related terms are potentially redundant ones, which can significantly reduce execution time by avoiding sequential search strategies. In this work, we use Word2Vec's CBOW model to obtain semantic similarity between terms. The efficiency of SS-FSRR is compared with six state-of-the-art selection methods for categorical data using two traditional classifiers (SVM and NB) and a robust deep learning model (LSTM) on seven datasets with 10-fold cross-validation. Experimental results show that SS-FSRR outperforms the other methods on most tested datasets with high stability, as measured by the Jaccard index.
{"title":"Semantic similarity-aware feature selection and redundancy removal for text classification using joint mutual information","authors":"Farek Lazhar, Benaidja Amira","doi":"10.1007/s10115-024-02143-1","DOIUrl":"https://doi.org/10.1007/s10115-024-02143-1","url":null,"abstract":"<p>The high dimensionality of text data is a challenging issue that requires efficient methods to reduce vector space and improve classification accuracy. Existing filter-based methods fail to address the redundancy issue, resulting in the selection of irrelevant and redundant features. Information theory-based methods effectively solve this problem but are not practical for large amounts of data due to their high time complexity. The proposed method, termed semantic similarity-aware feature selection and redundancy removal (SS-FSRR), employs joint mutual information between the pairs of semantically related terms and the class label to capture redundant features. It is predicated on the assumption that semantically related terms imply potentially redundant ones, which can significantly reduce execution time by avoiding sequential search strategies. In this work, we use Word2Vec’s CBOW model to obtain semantic similarity between terms. The efficiency of the SS-FSRR is compared to six state-of-the-art competitive selection methods for categorical data using two traditional classifiers (SVM and NB) and a robust deep learning model (LSTM) on seven datasets with 10-fold cross-validation, where experimental results show that the SS-FSRR outperforms the other methods on most tested datasets with high stability as measured by the Jaccard’s Index.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"48 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141510379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Machine learning and deep learning models for human activity recognition in security and surveillance: a review
Pub Date: 2024-06-04 | DOI: 10.1007/s10115-024-02122-6
Sheetal Waghchaware, Radhika Joshi
Human activity recognition (HAR) has received significant attention in the field of security and surveillance due to its high potential for real-time monitoring, abnormal activity identification, and situational awareness. HAR can identify abnormal activity or behaviour patterns that may indicate potential security risks. A HAR system attempts to automatically provide information about, and classification of, activities performed in an environment by learning from data captured through sensors or video streams. This paper presents an overview of existing research in the security and surveillance area, covering traditional, machine learning (ML), and deep learning (DL) algorithms applicable to the field. A comparative analysis of different HAR techniques by features, input sources, and public data sets is presented for quick understanding, with a focus on recent trends in the HAR field. The review provides guidelines for selecting appropriate algorithms, data sets, and performance metrics when evaluating HAR systems in the context of security and surveillance. Overall, it aims to provide a comprehensive understanding of HAR in this field and to serve as a basis for further research and development.
Automating localized learning for cardinality estimation based on XGBoost
Pub Date: 2024-06-01 | DOI: 10.1007/s10115-024-02142-2
Jieming Feng, Zhanhuai Li, Qun Chen, Hailong Liu
For cardinality estimation in a DBMS, building multiple local models instead of one global model can usually improve estimation accuracy and reduce the effort of labeling large amounts of training data. Unfortunately, existing localized learning approaches require users to explicitly specify which query patterns each local model handles. Making these decisions is arduous and error-prone for users and, worse, limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost that automatically builds an optimal combination of local models for a given query workload. It consists of two phases: (1) model initialization and (2) model evolution. In the first phase, it clusters training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters to identify an optimal combination by reconstructing local models. We formulate the identification of the optimal combination of local models as a combinatorial optimization problem and, because its exponential complexity makes exact search impractical, present an efficient heuristic algorithm named MMS (Models Merging and Splitting). Finally, we validate its performance superiority over existing learning alternatives through extensive experiments on real datasets.
{"title":"Automating localized learning for cardinality estimation based on XGBoost","authors":"Jieming Feng, Zhanhuai Li, Qun Chen, Hailong Liu","doi":"10.1007/s10115-024-02142-2","DOIUrl":"https://doi.org/10.1007/s10115-024-02142-2","url":null,"abstract":"<p>For cardinality estimation in DBMS, building multiple local models instead of one global model can usually improve estimation accuracy as well as reducing the effort to label large amounts of training data. Unfortunately, the existing approach of localized learning requires users to explicitly specify which query patterns a local model can handle. Making these decisions is very arduous and error-prone for users; to make things worse, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which can automatically build an optimal combination of local models given a query workload. It consists of two phases: 1) model initialization; 2) model evolution. In the first phase, it clusters training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters to identify an optimal combination by reconstructing local models. We formulate the problem of identifying the optimal combination of local models as a combinatorial optimization problem and present an efficient heuristic algorithm, named <b>MMS</b> (<b>M</b>odels <b>M</b>erging and <b>S</b>plitting), for its solution due to its exponential complexity. Finally, we validate its performance superiority over the existing learning alternatives by extensive experiments on real datasets.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis of diversification properties of stablecoins through the Shannon entropy measure
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02133-3
Mohavia Ben Amid Sinon, Jules Clement Mba
The common goal of investors is to minimise risk and maximise the returns on their investments, often achieved through diversification, where investments are spread across various assets. This study uses the MAD-entropy model to minimise the absolute deviation, maximise the mean return, and maximise the Shannon entropy of the portfolio. The MAD model is used because it is a linear programming model, allowing it to handle large-scale problems and non-normally distributed data. Entropy is added to the MAD model because it better diversifies the asset weights in the portfolios. The analysed portfolios consist of cryptocurrencies, stablecoins, and selected world indices such as the SP500 and FTSE, obtained from Yahoo Finance. The models found that stablecoins pegged to the US dollar, followed by stablecoins pegged to gold, are better diversifiers for traditional cryptocurrencies and stocks, probably owing to their low volatility relative to the other assets. These findings may assist investors, since the MAD-entropy model outperforms the MAD model by providing larger portfolio mean returns with minimal risk. Crypto investors can therefore design a well-diversified portfolio using MAD entropy to reduce unsystematic risk. Further research integrating MAD entropy with machine learning techniques may improve accuracy and risk management.
{"title":"The analysis of diversification properties of stablecoins through the Shannon entropy measure","authors":"Mohavia Ben Amid Sinon, Jules Clement Mba","doi":"10.1007/s10115-024-02133-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02133-3","url":null,"abstract":"<p>The common goal for investors is to minimise the risk and maximise the returns on their investments. This is often achieved through diversification, where investors spread their investments across various assets. This study aims to use the MAD-entropy model to minimise the absolute deviation, maximise the mean return, and maximise the Shannon entropy of the portfolio. The MAD model is used because it is a linear programming model, allowing it to resolve large-scale problems and nonnormally distributed data. Entropy is added to the MAD model because it can better diversify the weight of assets in the portfolios. The analysed portfolios consist of cryptocurrencies, stablecoins, and selected world indices such as the SP500 and FTSE obtained from Yahoo Finance. The models found that stablecoins pegged to the US dollar, followed by stablecoins pegged to gold, are better diversifiers for traditional cryptocurrencies and stocks. These results are probably due to their low volatility compared to the other assets. Findings from this study may assist investors since the MAD-Entropy model outperforms the MAD model by providing more significant portfolio mean returns with minimal risk. Therefore, crypto investors can design a well-diversified portfolio using MAD entropy to reduce unsystematic risk. Further research integrating mad entropy with machine learning techniques may improve accuracy and risk management.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"13 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Methods for concept analysis and multi-relational data mining: a systematic literature review
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02139-x
Nicolás Leutwyler, Mario Lezoche, Chiara Franciosi, Hervé Panetto, Laurent Teste, Diego Torres
The massive adoption of the Internet of Things in many industrial areas, together with the requirements of modern services, poses huge challenges to the field of data mining. Moreover, the semantic interoperability of systems and enterprises requires operating across many different formats, such as ontologies, knowledge graphs, or relational databases, and across different contexts, such as static, dynamic, or real time. Supporting this semantic interoperability therefore requires a wide range of knowledge discovery methods with capabilities suited to the context of distributed architectures (DA). However, to the best of our knowledge, there is no recent general review of the state of the art of Concept Analysis (CA) and multi-relational data mining (MRDM) methods for knowledge discovery in DA that considers semantic interoperability. In this work, a systematic literature review on CA and MRDM is conducted, discussing the characteristics of the reviewed papers with the support of a clusterization technique based on association rules. The review also identifies three research gaps on the way toward a more scalable set of methods in the context of DA and heterogeneous sources.