Pub Date: 2024-07-22 | DOI: 10.1007/s10115-024-02166-8
Bin Feng, Shulan Ruan, Likang Wu, Huijie Liu, Kai Zhang, Kun Zhang, Qi Liu, Enhong Chen
Knowledge-based visual question answering (KB-VQA) requires answering questions about a given image with the assistance of external knowledge. Recently, researchers have generally designed different multimodal networks to extract visual and textual semantic features for KB-VQA. Despite significant progress, 'caption' information, a textual form of image semantics that can also provide visually non-obvious cues for the reasoning process, is often ignored. In this paper, we introduce a novel framework, the Knowledge Based Caption Enhanced Net (KBCEN), designed to integrate caption information into the KB-VQA process. Specifically, for better knowledge reasoning, we make use of caption information comprehensively from both explicit and implicit perspectives. For the former, we explicitly link caption entities to a knowledge graph together with object tags and question entities. For the latter, a pre-trained multimodal BERT with natural implicit knowledge is leveraged to co-represent caption tokens, object regions, and question tokens. Moreover, we develop a mutual correlation module to discern intricate correlations between explicit and implicit representations, thereby facilitating knowledge integration and final prediction. We conduct extensive experiments on three publicly available datasets (i.e., OK-VQA v1.0, OK-VQA v1.1, and A-OKVQA). Both quantitative and qualitative results demonstrate the superiority and rationality of our proposed KBCEN.
Title: Caption matters: a new perspective for knowledge-based visual question answering (Knowledge and Information Systems)
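The mutual correlation module described in the abstract is not public, but the general idea of cross-correlating an explicit (graph-based) view with an implicit (BERT-based) view can be sketched as bidirectional soft attention over a correlation matrix. This is an illustrative reconstruction under assumed shapes, not the authors' implementation; the function name and pooling choice are hypothetical.

```python
import numpy as np

def mutual_correlation_fusion(explicit, implicit):
    """Fuse explicit and implicit token representations (both with the
    same feature width) via a soft cross-correlation -- a rough sketch
    of a 'mutual correlation' step, not KBCEN's actual module."""
    # Correlation matrix between the two views: (n_explicit, n_implicit).
    corr = explicit @ implicit.T

    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Each explicit token attends over implicit tokens, and vice versa.
    explicit_aug = explicit + softmax(corr, axis=1) @ implicit
    implicit_aug = implicit + softmax(corr, axis=0).T @ explicit
    # Concatenate mean-pooled views as input to a final answer classifier.
    return np.concatenate([explicit_aug.mean(axis=0), implicit_aug.mean(axis=0)])

rng = np.random.default_rng(0)
fused = mutual_correlation_fusion(rng.normal(size=(5, 8)), rng.normal(size=(7, 8)))
```

With 5 explicit tokens and 7 implicit tokens of width 8, the fused vector has width 16 (two pooled views concatenated).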
Pub Date: 2024-07-22 | DOI: 10.1007/s10115-024-02150-2
Doaa Mohey El-Din, Aboul Ella Hassanein, Ehab E. Hassanien
There is growing interest in multidisciplinary research on multimodal synthesis technology to stimulate diversity of modal interpretation in different application contexts. The need for modality diversity across multiple contextual representation fields stems from the conflicting nature of data in multitarget sensors, which introduces further obstacles, including ambiguity, uncertainty, imbalance, and redundancy, in multiobject classification. This paper proposes a new adaptive, late multimodal fusion framework using evidence-enhanced deep learning guided by Dempster–Shafer theory and a concatenation strategy to interpret multiple modalities and contextual representations, yielding a larger number of features for interpreting unstructured modality types through late fusion. Furthermore, it is designed as a multifusion learning solution to modality- and context-based fusion, leading to improved decisions. It creates a fully automated selective deep neural network and constructs an adaptive fusion model for all modalities based on the input type. The proposed framework is implemented with five layers: a software-defined fusion layer, a preprocessing layer, a dynamic classification layer, an adaptive fusion layer, and an evaluation layer. The framework formalizes the modality/context-based problem as an adaptive multifusion framework operating at the late fusion level. Particle swarm optimization was used in multiple smart context systems to improve the final classification layer with optimal parameters, tracking 30 changes in the hyperparameters of the deep learning training models. This paper presents multiple experiments with multimodal inputs in multiple contexts to show the behavior of the proposed multifusion framework.
Experimental results on four challenging datasets, covering military, agricultural, COVID-19, and food health data, are impressive compared to other state-of-the-art fusion models. The main strength of the proposed adaptive fusion framework is that it classifies multiple objects with automatically reduced features and resolves ambiguity and inconsistency in the fused data. In addition, it increases certainty and reduces redundancy while mitigating data imbalance. The multimodal, multicontext experiments with the proposed fusion framework achieve 98.45% accuracy.
Title: An adaptive and late multifusion framework in contextual representation based on evidential deep learning and Dempster–Shafer theory
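Dempster's rule of combination, the standard evidential fusion that Dempster–Shafer theory provides and that guides the framework above, can be sketched in a few lines. This is a generic textbook sketch with made-up masses, not the paper's implementation:

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the same frame of discernment
    using Dempster's rule. Focal elements are encoded as frozensets;
    conflicting mass (empty intersections) is normalized away."""
    combined = {}
    conflict = 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2
    k = 1.0 - conflict  # total non-conflicting mass
    return {s: w / k for s, w in combined.items()}

# Two 'sensors' with hypothetical masses over two target classes.
A, B = frozenset({"tank"}), frozenset({"truck"})
m1 = {A: 0.8, B: 0.2}
m2 = {A: 0.6, B: 0.4}
fused = dempster_combine(m1, m2)
```

Here the conflicting mass is 0.8·0.4 + 0.2·0.6 = 0.44, so the combined belief in "tank" is 0.48/0.56 ≈ 0.857.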
Currently, online learning systems in the education sector are widely used and have become a new trend, generating large amounts of educational data from students' activities. To improve online learning experiences, sophisticated data analysis techniques are required. Big Data techniques can add value to E-learning platforms through the efficient processing of large-scale learning data. Over time, the E-learning management system's repository expands and becomes a rich source of learning materials. Subject matter experts may benefit from reusing previously created E-learning resources when authoring online content. It can also benefit students by giving them access to the pertinent documents for achieving their learning objectives effectively. An optimal intelligent information retrieval and reliable storage (OIIRS) scheme is proposed for E-learning using hybrid deep learning techniques. We assume that relevant E-learning documents are stored in the cloud and dynamically updated according to users' status. First, we present a highly robust and lightweight cipher, i.e., optimized CLEFIA, for securely storing data in local repositories, which improves the reliability of data loading. We develop an improved butterfly optimization algorithm that selects private keys to provide an optimal solution for CLEFIA. In addition, a hybrid deep learning method, i.e., a backward diagonal search-based deep recurrent neural network (BD-DRNN), is introduced for optimal intelligent information retrieval based on keywords rather than semantics. Here, feature extraction and key feature matching are performed by the modified Hungarian optimization (MHO) algorithm, which improves search accuracy. Finally, we evaluate our proposed OIIRS scheme on different benchmark datasets and use simulation results to assess its performance.
Title: Optimal intelligent information retrieval and reliable storage scheme for cloud environment and E-learning big data analytics
Authors: Chandrasekar Venkatachalam, Shanmugavalli Venkatachalam
Pub Date: 2024-07-22 | DOI: 10.1007/s10115-024-02152-0
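The modified Hungarian optimization (MHO) step above is not public, but the underlying idea of Hungarian-style matching, a minimum-cost one-to-one assignment of query keywords to document keywords, can be illustrated with a tiny brute-force solver. The cost values and function name are hypothetical; a real system would use a polynomial-time Hungarian algorithm rather than enumeration:

```python
from itertools import permutations

def best_keyword_assignment(cost):
    """Exhaustively find the one-to-one assignment of query keywords
    (rows) to document keywords (columns) minimizing total cost.
    A toy stand-in for Hungarian-style matching; only viable for
    small square cost matrices."""
    n = len(cost)
    best = None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best[0]:
            best = (total, perm)
    return best

# Hypothetical dissimilarity costs between 3 query and 3 document keywords.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, assignment = best_keyword_assignment(cost)
```

For this matrix the optimal matching pairs row 0 with column 1, row 1 with column 0, and row 2 with column 2, for a total cost of 5.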
Pub Date: 2024-07-18 | DOI: 10.1007/s10115-024-02182-8
Aditya Kumar, Jainath Yadav
This paper introduces a novel approach for feature set partitioning in multi-view ensemble learning (MVEL) utilizing the minimum spanning tree clustering (MSTC) algorithm. The proposed method aims to generate informative and diverse feature subsets to enhance classification performance in the MVEL framework. The MSTC algorithm constructs a minimum spanning tree based on correlation measures and divides features into non-overlapping clusters, representing distinct views used to improve ensemble learning. We evaluate the effectiveness of the MSTC-based MVEL framework on ten high-dimensional datasets using support vector machines. Results indicate significant improvements in classification performance compared to single-view learning and other cutting-edge feature partitioning approaches. Statistical analysis confirms the enhanced classification accuracy achieved by the proposed MVEL framework, reaching a level of accuracy that is both reliable and competitive.
Title: Minimum spanning tree clustering approach for effective feature partitioning in multi-view ensemble learning
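The MSTC partitioning described above can be sketched concretely: build a minimum spanning tree over pairwise feature distances (e.g. 1 − |correlation|) and cut its k−1 heaviest edges, which is equivalent to stopping Kruskal's algorithm once k components remain. This is a minimal sketch of the general technique under an assumed distance matrix, not the authors' exact algorithm:

```python
def mst_feature_clusters(dist, k):
    """Partition features into k non-overlapping clusters (views) by
    running Kruskal's MST construction over pairwise distances and
    stopping when exactly k connected components remain."""
    n = len(dist)
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    comps = n
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            comps -= 1
            if comps == k:
                break
    # Group feature indices by their cluster root.
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Hypothetical 1 - |correlation| distances: features {0,1} and {2,3}
# are strongly correlated within pairs, weakly across pairs.
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.2],
        [0.8, 0.9, 0.2, 0.0]]
views = mst_feature_clusters(dist, 2)
```

Each returned cluster of feature indices would then serve as one view for the ensemble's per-view classifier.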
Pub Date: 2024-07-17 | DOI: 10.1007/s10115-024-02172-w
Angelica Liguori, Ettore Ritacco, Francesco Sergio Pisani, Giuseppe Manco
The capability to devise robust outlier and anomaly detection tools is an important research topic in machine learning and data mining. Recent techniques have focused on reinforcing detection with sophisticated data generation tools that refine the learning process by generating variants of the data that expand the recognition capabilities of the outlier detector. In this paper, we propose ARN, a semi-supervised anomaly detection and generation method based on adversarial counterfactual reconstruction. ARN exploits a regularized autoencoder to optimize the reconstruction of variants of normal examples with minimal differences that are recognized as outliers. The combination of regularization and counterfactual reconstruction helps stabilize the learning process, resulting in both realistic outlier generation and substantially extended detection capability. In fact, counterfactual generation enables a smart exploration of the search space by relating small changes in actual samples from the true distribution to high anomaly scores.
Experiments on several benchmark datasets show that our model improves the current state of the art by valuable margins because of its ability to model the true boundaries of the data manifold.
Title: Robust anomaly detection via adversarial counterfactual generation
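ARN's regularized autoencoder is not reproduced here, but the core reconstruction-based scoring it builds on can be illustrated with a linear stand-in: fit a low-rank (truncated PCA) "autoencoder" on normal data and score test points by reconstruction error, so points off the normal manifold get high anomaly scores. The data and function name are invented for illustration:

```python
import numpy as np

def reconstruction_scores(train, test, n_components=1):
    """Score test points by reconstruction error under a rank-limited
    linear reconstruction fit on normal training data -- a minimal
    stand-in for autoencoder-based anomaly scoring (high error
    suggests an anomaly)."""
    mu = train.mean(axis=0)
    # Principal directions of the normal data via SVD.
    _, _, vt = np.linalg.svd(train - mu, full_matrices=False)
    w = vt[:n_components]  # shared encoder/decoder weights
    recon = (test - mu) @ w.T @ w + mu
    return np.linalg.norm(test - recon, axis=1)

rng = np.random.default_rng(1)
# Normal data lies close to the line y = 2x; the second test point is far off it.
t = rng.normal(size=(200, 1))
train = np.hstack([t, 2 * t]) + 0.05 * rng.normal(size=(200, 2))
test = np.array([[1.0, 2.0], [3.0, -4.0]])
scores = reconstruction_scores(train, test)
```

The on-manifold point [1, 2] scores near zero, while [3, −4] incurs a large reconstruction error and would be flagged as an outlier.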
Pub Date: 2024-07-16 | DOI: 10.1007/s10115-024-02175-7
Jiezhong He, Yixin Chen, Zhouyang Liu, Dongsheng Li
Title: Optimizing subgraph retrieval and matching with an efficient indexing scheme
Pub Date: 2024-07-16 | DOI: 10.1007/s10115-024-02157-9
G. Bharathi Mohan, R. Prasanna Kumar, P. Vishal Krishh, A. Keerthinathan, G. Lavanya, Meka Kavya Uma Meghana, Sheba Sulthana, Srinath Doss
Title: Correction: An analysis of large language models: their impact and potential applications
Pub Date: 2024-07-16 | DOI: 10.1007/s10115-024-02158-8
Boshen Xia, Jiwei Qin, Lu Han, Aohua Gao, Chao Ma
Title: Knowledge filter contrastive learning for recommendation