
Latest publications in Information Fusion

Confidence ensembles: Tabular data classifiers on steroids
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103126
Tommaso Zoppi, Peter Popov
The astounding amount of research conducted over the last decades has provided plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks on tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on confidence ensembles: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of “confidence” in predictions. They apply to any classification problem, binary or multi-class, supervised or unsupervised, without requiring additional data with respect to that already required by the base estimator. Our experimental evaluation using a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even with a limited number of base learners, ii) are relatively easy to tune as they rely on a limited number of hyper-parameters, and iii) are significantly more robust when dealing with unknown, unexpected input data compared to other tabular data classifiers. Among others, confidence ensembles showed potential to go beyond the performance of de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as a PyPI package, compliant with widely used Python frameworks such as scikit-learn and pyod, and require little to no tuning to be applied to tabular classification tasks.
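The abstract does not spell out the training procedure, but a minimal scikit-learn-compatible sketch of the general idea of building base learners from a base estimator's prediction confidence might look as follows. It assumes that "confidence" means the maximum predicted class probability and that each learner is refit on the most confident part of its bootstrap sample; the class `ConfBagSketch` and its parameters are hypothetical, and this is not the authors' PyPI implementation.

```python
# Illustrative sketch only, NOT the authors' ConfBag/ConfBoost implementation.
# Assumption: "confidence" = the base estimator's maximum predicted class probability,
# and each base learner is refit on the most confident part of its bootstrap sample.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier

class ConfBagSketch(BaseEstimator, ClassifierMixin):
    """Hypothetical confidence-bagging wrapper around any scikit-learn classifier."""

    def __init__(self, base_estimator=None, n_learners=10, keep_ratio=0.8, random_state=0):
        self.base_estimator = base_estimator
        self.n_learners = n_learners
        self.keep_ratio = keep_ratio        # fraction of most-confident samples kept per learner
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        base = (self.base_estimator if self.base_estimator is not None
                else DecisionTreeClassifier(max_depth=3, random_state=self.random_state))
        self.classes_ = np.unique(y)
        self.learners_ = []
        for _ in range(self.n_learners):
            idx = rng.choice(len(X), size=len(X), replace=True)          # bootstrap sample
            probe = clone(base).fit(X[idx], y[idx])
            conf = probe.predict_proba(X[idx]).max(axis=1)               # per-sample confidence
            keep = np.argsort(conf)[-int(self.keep_ratio * len(idx)):]   # most confident subset
            self.learners_.append(clone(base).fit(X[idx][keep], y[idx][keep]))
        return self

    def predict_proba(self, X):
        agg = np.zeros((len(X), len(self.classes_)))
        for m in self.learners_:
            cols = np.searchsorted(self.classes_, m.classes_)            # align class columns
            agg[:, cols] += m.predict_proba(X)
        return agg / len(self.learners_)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=400, n_features=12, random_state=0)
    clf = ConfBagSketch(n_learners=5).fit(X[:300], y[:300])
    print("holdout accuracy:", clf.score(X[300:], y[300:]))
```

Used like any other scikit-learn estimator, the wrapper can be dropped into pipelines and cross-validation without extra plumbing.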
{"title":"Confidence ensembles: Tabular data classifiers on steroids","authors":"Tommaso Zoppi ,&nbsp;Peter Popov","doi":"10.1016/j.inffus.2025.103126","DOIUrl":"10.1016/j.inffus.2025.103126","url":null,"abstract":"<div><div>The astounding amount of research conducted in the last decades provided plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks for tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on <em>confidence ensembles</em>: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of “confidence” in predictions. They apply to any classification problem: binary and multi-class, supervised or unsupervised, without requiring additional data with respect to those already required by the base estimator. Our experimental evaluation using a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even using a limited amount of base learners, ii) are relatively easy to tune as they rely on a limited number of hyper-parameters, and iii) are significantly more robust when dealing with unknown, unexpected input data compared to other tabular data classifiers. Amongst others, confidence ensembles showed potential in going beyond the performance of de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as PyPI package, compliant with widely used Python frameworks such as <em>scikit-learn</em> and <em>pyod</em>, and require little to no tuning to be exercised on tabular datasets for classification tasks.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103126"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Efficient genome sequence compression via the fusion of MDL-based heuristics
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103083
M. Zohaib Nawaz, M. Saqib Nawaz, Philippe Fournier-Viger, Shoaib Nawaz, Jerry Chun-Wei Lin, Vincent S. Tseng
Developing novel methods for the efficient and lossless compression of genome sequences has become a pressing issue in bioinformatics due to the rapidly increasing volume of genomic data. Although recent reference-free genome compressors have shown potential, they often require substantial computational resources, lack interpretability, and fail to fully exploit the inherent sequential characteristics of genome sequences. To overcome these limitations, this paper presents HMG (Heuristic-driven MDL-based Genome sequence compressor), a novel compressor based on the Minimum Description Length (MDL) principle. HMG is designed to identify the optimal set of k-mers (patterns) for the maximal compression of a dataset. By fusing heuristic algorithms—specifically the Genetic Algorithm and Simulated Annealing—with the MDL framework, HMG effectively navigates the extensive search space of k-mer patterns. An experimental comparison with state-of-the-art genome compressors shows that HMG is fast and achieves a low bits-per-base rate. Furthermore, the optimal k-mers derived by HMG for compression are employed for genome classification, thereby offering multifunctional advantages over previous genome compressors. HMG is available at https://github.com/MuhammadzohaibNawaz/HMG.
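As a toy illustration of the MDL-plus-heuristic-search idea described above, the sketch below scores a candidate k-mer dictionary with a crude two-bits-per-base cost model and searches over dictionary subsets with simulated annealing. The cost model, candidate set, and annealing schedule are illustrative assumptions and do not reproduce HMG's actual encoder or its Genetic Algorithm component.

```python
# Toy sketch of MDL-guided k-mer selection with simulated annealing.
# Simplified cost model for illustration only, not HMG's encoder.
import math
import random

def description_length(seq, kmers):
    """Approximate MDL cost: bits for the k-mer dictionary plus bits for the
    sequence rewritten with dictionary references (2 bits per raw base,
    ceil(log2(len(kmers)+1)) bits per dictionary reference)."""
    ref_bits = max(1, math.ceil(math.log2(len(kmers) + 1))) if kmers else 0
    dict_bits = sum(2 * len(k) for k in kmers)
    i, seq_bits = 0, 0
    while i < len(seq):
        hit = next((k for k in kmers if seq.startswith(k, i)), None)
        if hit:
            seq_bits += ref_bits
            i += len(hit)
        else:
            seq_bits += 2          # plain A/C/G/T base
            i += 1
    return dict_bits + seq_bits

def anneal(seq, candidates, steps=2000, t0=5.0, seed=0):
    """Simulated annealing over subsets of candidate k-mers."""
    rng = random.Random(seed)
    current = set()
    cost = description_length(seq, current)
    best, best_cost = set(current), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        trial = set(current)
        trial.symmetric_difference_update({rng.choice(candidates)})   # flip one k-mer in/out
        trial_cost = description_length(seq, trial)
        if trial_cost < cost or rng.random() < math.exp((cost - trial_cost) / temp):
            current, cost = trial, trial_cost
            if cost < best_cost:
                best, best_cost = set(current), cost
    return best, best_cost

seq = "ACGTACGTACGTTTACGTACGT"
candidates = ["ACGT", "CGTA", "TACG", "GTTT"]
print(anneal(seq, candidates))
```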
{"title":"Efficient genome sequence compression via the fusion of MDL-based heuristics","authors":"M. Zohaib Nawaz ,&nbsp;M. Saqib Nawaz ,&nbsp;Philippe Fournier-Viger ,&nbsp;Shoaib Nawaz ,&nbsp;Jerry Chun-Wei Lin ,&nbsp;Vincent S. Tseng","doi":"10.1016/j.inffus.2025.103083","DOIUrl":"10.1016/j.inffus.2025.103083","url":null,"abstract":"<div><div>Developing novel methods for the efficient and lossless compression of genome sequences has become a pressing issue in bioinformatics due to the rapidly increasing volume of genomic data. Although recent reference-free genome compressors have shown potential, they often require substantial computational resources, lack interpretability, and fail to fully utilize the inherent sequential characteristics of genome sequences. To overcome these limitations, this paper presents HMG (Heuristic-driven MDL-based Genome sequence compressor), a novel compressor based on the Minimum Description Length (MDL) principle. HMG is designed to identify the optimal set of k-mers (patterns) for the maximal compression of a dataset. By fusing heuristic algorithms—specifically the Genetic Algorithm and Simulated Annealing—with the MDL framework, HMG effectively navigates the extensive search space of k-mer patterns. An experimental comparison with state-of-the-art genome compressors shows that HMG is fast, and achieves a low bit-per-base. Furthermore, the optimal k-mers derived by HMG for compression are employed for genome classification, thereby offering multifunctional advantages over previous genome compressors. HMG is available at <span><span>https://github.com/MuhammadzohaibNawaz/HMG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103083"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-view evidential K-NN classification
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103113
Chaoyu Gong, Zhi-gang Su, Thierry Denoeux
Multi-view classification, which aims to classify samples represented by multiple feature vectors, has become a hot topic in pattern recognition. Although many methods with promising performance have been proposed, their practicality is still limited by the lack of interpretability in some situations. In addition, an appropriate description of the soft labels of multi-view samples is missing, which may degrade classification performance, especially for samples located in highly overlapping areas of multiple vector spaces. To address these issues, we extend the K-nearest neighbor (K-NN) classification algorithm to multi-view learning under the theoretical framework of evidence theory. The learning process is first formalized as an optimization problem, where the weights of different views, an adaptive K value for every sample, and the distance matrix are determined jointly based on training error. The final classification result is then derived according to the philosophy of the evidential K-NN classification algorithm. Detailed ablation studies demonstrate the benefits of jointly learning adaptive neighborhoods and view weights in a supervised way. Comparative experiments on real-world datasets show that our algorithm performs better than other state-of-the-art methods. A real-world industrial application for condition monitoring shown in Appendix F illustrates in detail the need for evidence theory and the benefits of the unique interpretability of K-NN.
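For readers unfamiliar with the evidential K-NN rule that the paper extends, a minimal single-view version can be sketched as follows: each neighbor contributes a simple mass function (mass on its own class plus residual mass on the whole frame Ω), and the masses are pooled with Dempster's rule. This is the classical formulation, not the multi-view, jointly optimized method proposed here; the parameters `alpha` and `gamma` are illustrative.

```python
# Minimal single-view evidential K-NN (in the spirit of the classical rule);
# a simplified illustration, not the multi-view method proposed in the paper.
import numpy as np

def dempster_combine(m1, m2, classes):
    # Dempster's rule restricted to focal sets that are singletons or Omega.
    out = {c: m1[c] * m2[c] + m1[c] * m2["Omega"] + m1["Omega"] * m2[c] for c in classes}
    out["Omega"] = m1["Omega"] * m2["Omega"]
    conflict = 1.0 - sum(out.values())           # mass assigned to the empty set
    z = 1.0 - conflict
    return {k: v / z for k, v in out.items()}

def evidential_knn_predict(X_train, y_train, x, classes, K=5, alpha=0.95, gamma=1.0):
    # Each of the K nearest neighbors contributes a mass function:
    #   m({class of neighbor}) = alpha * exp(-gamma * d^2),  m(Omega) = the rest.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    nn = np.argsort(d2)[:K]
    combined = {c: 0.0 for c in classes}
    combined["Omega"] = 1.0                      # vacuous mass = total ignorance
    for i in nn:
        s = alpha * np.exp(-gamma * d2[i])
        m = {c: 0.0 for c in classes}
        m[y_train[i]] = s
        m["Omega"] = 1.0 - s
        combined = dempster_combine(combined, m, classes)
    return max(classes, key=lambda c: combined[c]), combined

# Tiny usage example with two small clusters.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
label, masses = evidential_knn_predict(X, y, np.array([0.2, 0.1]), classes=[0, 1], K=3)
print(label, masses)
```

The interpretability mentioned in the abstract comes from exactly these mass values: the residual mass on Ω quantifies how much the neighbors leave undecided.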
{"title":"Multi-view evidential K-NN classification","authors":"Chaoyu Gong ,&nbsp;Zhi-gang Su ,&nbsp;Thierry Denoeux","doi":"10.1016/j.inffus.2025.103113","DOIUrl":"10.1016/j.inffus.2025.103113","url":null,"abstract":"<div><div>Multi-view classification, aiming to classify samples represented by multiple feature vectors, has become a hot topic in pattern recognition. Although many methods with promising performances have been proposed, their practicality is still limited by the lack of interpretability in some situations. Besides, an appropriate description for the soft labels of multi-view samples is missing, which may degrade the classification performance, especially for those samples located in highly-overlapping areas of multiple vector spaces. To address these issues, we extend the <em>K</em>-nearest neighbor (K-NN) classification algorithm to multi-view learning, under the theoretical framework of evidence theory. The learning process is formalized, firstly, as an optimization problem, where the weights of different views, an adaptive <em>K</em> value of every sample and the distance matrix are determined jointly based on training error. Then, the final classification result is derived according to the philosophy of the evidential K-NN classification algorithm. Detailed ablation studies demonstrate the benefits of the joint learning for adaptive neighborhoods and view weights in a supervised way. Comparative experiments on real-world datasets show that our algorithm performs better than other state-of-the-art methods. A real-world industrial application for condition monitoring shown in Appendix F exemplifies the need to use the evidence theory and the benefits from the unique interpretability of K-NN in detail.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103113"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FFS-MCC: Fusing approximation and fuzzy uncertainty measures for feature selection with multi-correlation collaboration
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-16 | DOI: 10.1016/j.inffus.2025.103101
Jihong Wan, Xiaoping Li, Pengfei Zhang, Hongmei Chen, Xiaocao Ouyang, Tianrui Li, Kay Chen Tan
In many practical applications, data collected from multiple sources are high-dimensional, uncertain, and complexly correlated, which poses great challenges for feature selection. It is difficult to select suitable features quickly and accurately for high-dimensional and uncertain data (typically exhibiting randomness, fuzziness, and inconsistency). Correlations between features are complex, and their collaborations change dynamically in an uncertain data environment. Since the classification accuracy and the reduction rate of feature selection are conflicting objectives, it is hard to obtain a trade-off between them. To address these issues, this work proposes a feature selection method with multiple correlations and their collaborations by fusing approximation and fuzzy uncertainty measures. Specifically, the feature-class correlation is defined by the approximation uncertainty measure, while feature-feature correlations are mined by the fuzzy uncertainty measure. Further, a collaborative intensity is calculated from the positive or negative effects of multiple correlations. A novel feature evaluation strategy of max-dependent relevance and min-conditional redundancy based on the multi-correlation collaborative intensity is proposed. Experimental results show that the proposed algorithm with joint evaluation outperforms the compared methods thanks to the multi-correlation collaboration it considers between features.
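The max-relevance/min-redundancy evaluation mentioned above is easiest to picture with a generic greedy scheme. The sketch below uses plain mutual information as a stand-in for the paper's approximation and fuzzy uncertainty measures and its collaborative intensity, so it illustrates only the selection loop, not FFS-MCC itself.

```python
# Generic greedy max-relevance / min-redundancy feature selection, shown with plain
# mutual information; FFS-MCC instead fuses approximation and fuzzy uncertainty
# measures with a multi-correlation collaborative intensity (not reproduced here).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def _discretize(col, bins=8):
    # crude equal-width binning so mutual_info_score sees categorical values
    edges = np.linspace(col.min(), col.max(), bins + 1)[1:-1]
    return np.digitize(col, edges)

def greedy_relevance_redundancy(X, y, n_select=5):
    relevance = mutual_info_classif(X, y, random_state=0)      # feature-class correlation
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for j in remaining:
            # average feature-feature redundancy against already selected features
            red = (np.mean([mutual_info_score(_discretize(X[:, j]), _discretize(X[:, s]))
                            for s in selected]) if selected else 0.0)
            scores.append(relevance[j] - red)
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=200, n_features=12, n_informative=4, random_state=0)
    print(greedy_relevance_redundancy(X, y, n_select=4))
```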
{"title":"FFS-MCC: Fusing approximation and fuzzy uncertainty measures for feature selection with multi-correlation collaboration","authors":"Jihong Wan ,&nbsp;Xiaoping Li ,&nbsp;Pengfei Zhang ,&nbsp;Hongmei Chen ,&nbsp;Xiaocao Ouyang ,&nbsp;Tianrui Li ,&nbsp;Kay Chen Tan","doi":"10.1016/j.inffus.2025.103101","DOIUrl":"10.1016/j.inffus.2025.103101","url":null,"abstract":"<div><div>In many practical applications, the characteristics of data collected from multiple sources are high-dimensional, uncertain, and complexly correlated, which brings great challenges to feature selection. It is much difficult to select suitable features quickly and accurately for high-dimensional and uncertain (usually randomness, fuzziness, and inconsistency) data. Correlations between features are complex and their collaborations are dynamically changing in an uncertain data environment. Since the classification accuracy and the reduction rate of feature selection are conflicting, it is hard to obtain a trade-off between them. For the issues mentioned above, in this work, a feature selection method with multiple correlations and their collaborations is proposed by fusing the approximation and fuzzy uncertainty measures. Specifically, the feature-class correlation is defined by the approximation uncertainty measure while feature-feature correlations are mined by the fuzzy uncertainty measure. Further, a collaborative intensity is calculated by positive or negative effect of multiple correlations. A novel feature evaluation strategy of max-dependent relevance and min-conditional redundancy based on the multi-correlation collaborative intensity is proposed. Experimental results show that the proposed algorithm on joint evaluation outperforms the compared ones because of the considered multi-correlation collaboration between features.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103101"},"PeriodicalIF":14.7,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Listening and seeing again: Generative error correction for audio-visual speech recognition
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-15 | DOI: 10.1016/j.inffus.2025.103077
Rui Liu, Hongyu Yuan, Guanglai Gao, Haizhou Li
Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech Recognition (AVSR) takes audio and visual signals simultaneously to infer the transcription. Recent studies have shown that Large Language Models (LLMs) can be effectively used for Generative Error Correction (GER) in ASR by predicting the best transcription from ASR-generated N-best hypotheses. However, these LLMs lack the ability to understand audio and visual signals simultaneously, making the GER approach challenging to apply in AVSR. In this work, we propose a novel GER paradigm for AVSR, termed AVGER, that follows the concept of “listening and seeing again”. Specifically, we first use a powerful AVSR system to read the audio and visual signals and obtain the N-best hypotheses, and then use a Q-former-based Multimodal Synchronous Encoder to read the audio and visual information again and convert it into audio and video compression representations, respectively, that can be understood by the LLM. Afterward, the audio-visual compression representations and the N-best hypotheses together constitute a cross-modal prompt that guides the LLM in producing the best transcription. In addition, we propose a Multi-Level Consistency Constraint training criterion, covering the logits, utterance, and representation levels, to improve correction accuracy while enhancing the interpretability of the audio and visual compression representations. Experimental results on the LRS3 dataset show that our method outperforms current mainstream AVSR systems, reducing the Word Error Rate (WER) by 27.59% compared to them. Code and models can be found at: https://github.com/AI-S2-Lab/AVGER.
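To make the GER idea concrete, the following sketch assembles only the textual part of a cross-modal prompt from N-best hypotheses; the wording of the template is a hypothetical stand-in, and the injection of the audio-visual compression representation that AVGER performs is omitted entirely.

```python
# Sketch of the textual side of a GER-style prompt built from N-best hypotheses.
# The exact prompt template and the audio-visual embedding injection used by AVGER
# are not specified here; this wording is hypothetical.
def build_ger_prompt(nbest, task="audio-visual speech recognition"):
    lines = [f"The {task} system produced these candidate transcriptions, best first:"]
    lines += [f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest)]
    lines.append("Using the candidates (and the attached audio-visual features), "
                 "write the single most likely correct transcription.")
    return "\n".join(lines)

nbest = ["please turn on the lights", "please turn of the light", "pleas turn on the light"]
print(build_ger_prompt(nbest))
```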
{"title":"Listening and seeing again: Generative error correction for audio-visual speech recognition","authors":"Rui Liu ,&nbsp;Hongyu Yuan ,&nbsp;Guanglai Gao ,&nbsp;Haizhou Li","doi":"10.1016/j.inffus.2025.103077","DOIUrl":"10.1016/j.inffus.2025.103077","url":null,"abstract":"<div><div>Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech Recognition (AVSR) takes audio and visual signals simultaneously to infer the transcription. Recent studies have shown that Large Language Models (LLMs) can be effectively used for Generative Error Correction (GER) in ASR by predicting the best transcription from ASR-generated N-best hypotheses. However, these LLMs lack the ability to simultaneously understand audio and visual, making the GER approach challenging to apply in AVSR. In this work, we propose a novel GER paradigm for AVSR, termed <strong>AVGER</strong>, that follows the concept of “listening and seeing again”. Specifically, we first use the powerful AVSR system to read the audio and visual signals to get the N-Best hypotheses, and then use the Q-former-based Multimodal Synchronous Encoder to read the audio and visual information again and convert them into an audio and video compression representation respectively that can be understood by LLM. Afterward, the audio-visual compression representation and the N-Best hypothesis together constitute a Cross-modal Prompt to guide the LLM in producing the best transcription. In addition, we also proposed a Multi-Level Consistency Constraint training criterion, including logits-level, utterance-level and representations-level, to improve the correction accuracy while enhancing the interpretability of audio and visual compression representations. The experimental results on the LRS3 dataset show that our method outperforms current mainstream AVSR systems. The proposed AVGER can reduce the Word Error Rate (WER) by 27.59% compared to them. Code and models can be found at: <span><span>https://github.com/AI-S2-Lab/AVGER</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103077"},"PeriodicalIF":14.7,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143635811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AOGN-CZSL: An Attribute- and Object-Guided Network for Compositional Zero-Shot Learning
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-15 | DOI: 10.1016/j.inffus.2025.103096
Jing Yang, Xingjiang Ma, Yuankai Wu, Chengjiang Li, Zhidong Su, Ji Xu, Yixiong Feng
Humans can readily acquire knowledge about unfamiliar and unknown objects, yet this skill remains extremely challenging for artificial intelligence. With the rapid development of artificial intelligence, compositional zero-shot learning (CZSL) can generalize to unseen compositions by learning prior knowledge of seen attributes and object compositions during training. Although existing composition-based and relationship-based methods show great potential for addressing this challenge, they still exhibit some limitations. Composition-based methods often ignore the intrinsic correlations between attributes, objects, and images, which may cause the model to perform poorly when generalized to unseen compositions. Some relationship-based methods capture the relationships between attributes, objects, and images better but may overlook the interdependencies between distinct attributes and objects. Therefore, we combine the advantages of composition-based and relationship-based methods and propose a new method for learning attribute and object dependencies (AOGN-CZSL). AOGN-CZSL learns the dependencies between different attributes or objects, and it learns all attributes and objects simultaneously, unlike traditional composition-based methods, which typically address each attribute-object composition separately. Moreover, unlike general relationship-based approaches, it adopts learned textual and visual modality features of attributes and objects for attribute scoring and object scoring, respectively. The code and dataset are available at: https://github.com/ybyangjing/AOGN-CZSL.
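As a minimal picture of what separate attribute and object scoring means, the sketch below ranks attribute-object compositions by summing cosine similarities between a visual embedding and attribute/object embeddings. The random embeddings and the additive score are illustrative stand-ins and do not include AOGN-CZSL's guided networks or its dependency modelling.

```python
# Toy illustration of separate attribute and object scoring for CZSL.
# Embeddings here are random stand-ins; AOGN-CZSL's actual networks and the
# dependency modelling between attributes/objects are not reproduced.
import numpy as np

rng = np.random.default_rng(0)
attributes = ["wet", "dry", "old"]
objects = ["dog", "car", "apple"]
dim = 16

attr_emb = {a: rng.normal(size=dim) for a in attributes}   # learned text features in the paper
obj_emb = {o: rng.normal(size=dim) for o in objects}
image_emb = rng.normal(size=dim)                            # learned visual feature in the paper

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Score every attribute-object composition as the sum of its attribute and object scores,
# so unseen pairs can still be ranked from their seen parts.
scores = {(a, o): cos(image_emb, attr_emb[a]) + cos(image_emb, obj_emb[o])
          for a in attributes for o in objects}
print(max(scores, key=scores.get))
```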
{"title":"AOGN-CZSL: An Attribute- and Object-Guided Network for Compositional Zero-Shot Learning","authors":"Jing Yang ,&nbsp;Xingjiang Ma ,&nbsp;Yuankai Wu ,&nbsp;Chengjiang Li ,&nbsp;Zhidong Su ,&nbsp;Ji Xu ,&nbsp;Yixiong Feng","doi":"10.1016/j.inffus.2025.103096","DOIUrl":"10.1016/j.inffus.2025.103096","url":null,"abstract":"<div><div>Humans are able to readily acquire knowledge about unfamiliar and unknown objects. However, it is extremely challenging for artificial intelligence to achieve this skill. With the rapid development of artificial intelligence, compositional zero-shot learning (CZSL) can generalize unseen compositions by learning prior knowledge of seen attributes and object compositions during training. Although existing composition-based and relationship-based methods show great potential for addressing this challenge, they still exhibit some limitations. Composition-based methods often ignore the intrinsic correlations between attributes, objects, and images, which may lead the model to perform poorly when it is generalized to unseen compositions. Some relationship-based methods can better capture the relationships between attributes, objects, and images but may overlook the interdependencies between distinct attributes and objects. Therefore, the advantages of the composition-based and relationship-based methods are combined, and a new method is proposed for learning attribute and object dependencies (AOGN-CZSL). AOGN-CZSL learns the dependencies between different attributes or objects. It also learns all attributes and objects simultaneously. Different from traditional composition-based methods, it typically address each attribute-object compositions separately. Moreover, unlike general relation-based approaches, this paper adopts learned textual and visual modality features of attributes and objects for attribute scoring and object scoring, respectively. The code and dataset are available at: <span><span>https://github.com/ybyangjing/AOGN-CZSL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103096"},"PeriodicalIF":14.7,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103097
Rui Liu, Jinhua Zhang, Haizhou Li
Audio Deepfake Detection (ADD) targets identifying forgery cues in audio generated by text-to-speech (TTS), voice conversion (VC), voice editing, etc. With the advancement of generative artificial intelligence (AI), ADD has gained increasing attention. In recent years, mono-to-binaural (M2B) conversion has been explored in ADD to uncover forgery cues from a novel perspective. However, M2B-based methods may weaken or overlook forgery cues specific to the mono signal, limiting detection performance. To this end, this paper proposes a Hierarchical Multi-Source Cues Fusion network for more accurate ADD (HMSCF-ADD). The approach leverages the mono signal alongside the binaural left and right channels as three distinct sources for hierarchical information fusion, distinguishing common and binaural-specific features while removing redundant information for more effective detection. Specifically, binaural-specific and common features are first extracted and fused as binaural information, followed by a dynamic fusion of mono and binaural information to achieve hierarchical fusion. Experiments on the ASVspoof2019-LA and ASVspoof2021-PA datasets demonstrate that HMSCF-ADD outperforms all mono-input and M2B-based baselines. Detailed comparisons of fusion strategies and M2B conversion further validate the framework's effectiveness. The code is available at: https://github.com/AI-S2-Lab/HMSCF-ADD.
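A minimal PyTorch sketch of the two-stage fusion described above (binaural-specific plus common features fused first, then dynamically fused with the mono feature through a learned gate) might look like the following. The layer sizes, the sigmoid gate, and the channel average used as a "common" feature are assumptions for illustration, not the HMSCF-ADD architecture.

```python
# Minimal sketch of a two-stage (hierarchical) fusion module in PyTorch.
# Layer sizes and the gating choice are illustrative assumptions, not HMSCF-ADD.
import torch
import torch.nn as nn

class HierarchicalFusionSketch(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.binaural_fuse = nn.Linear(3 * dim, dim)  # left + right + common -> binaural feature
        self.gate = nn.Linear(2 * dim, dim)           # decides how much mono vs binaural to keep
        self.classifier = nn.Linear(dim, 2)           # bona fide vs spoof

    def forward(self, mono, left, right):
        common = 0.5 * (left + right)                             # crude stand-in for "common" cues
        binaural = torch.relu(self.binaural_fuse(torch.cat([left, right, common], dim=-1)))
        g = torch.sigmoid(self.gate(torch.cat([mono, binaural], dim=-1)))
        fused = g * mono + (1 - g) * binaural                      # dynamic mono/binaural fusion
        return self.classifier(fused)

model = HierarchicalFusionSketch(dim=128)
mono, left, right = (torch.randn(4, 128) for _ in range(3))
print(model(mono, left, right).shape)   # torch.Size([4, 2])
```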
{"title":"Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection","authors":"Rui Liu ,&nbsp;Jinhua Zhang ,&nbsp;Haizhou Li","doi":"10.1016/j.inffus.2025.103097","DOIUrl":"10.1016/j.inffus.2025.103097","url":null,"abstract":"<div><div>Audio Deepfake Detection (ADD) targets identifying forgery cues in audio generated by text-to-speech (TTS), voice conversion (VC), voice editing, etc. With the advancement of generative artificial intelligence(AI), ADD has gained increasing attention. In recent years, mono-to-binaural (M2B) conversion has been explored in ADD to uncover forgery cues from a novel perspective. However, M2B-based methods may weaken or overlook unique forgery cues specific to mono, limiting detection performance. To this end, this paper proposes a <strong>H</strong>ierarchical <strong>M</strong>ulti-<strong>S</strong>ource <strong>C</strong>ues <strong>F</strong>usion network for more accurate <strong>ADD (HMSCF-ADD)</strong>. This approach leverages mono alongside binaural left and right channels as three distinct sources for hierarchical information fusion, it distinguishes common and binaural-specific features while removing redundant information for more effective detection. Specifically, binaural-specific and common features are first extracted and fused as binaural information, followed by dynamic fusion of mono and binaural information to achieve hierarchical fusion. Experiments on ASVspoof2019-LA and ASVspoof2021-PA datasets demonstrate that HMSCF-ADD outperforms all mono-input and M2B-based baselines. Detailed comparisons on fusion strategies and M2B conversion further validate the framework’s effectiveness. The codes are available at: <span><span>https://github.com/AI-S2-Lab/HMSCF-ADD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103097"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143642750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103099
Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Luis F. Gomez, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, David Menotti
Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-onGoing challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark (i) the proposal of novel Generative AI methods and synthetic data, and (ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.
{"title":"Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data","authors":"Ivan DeAndres-Tame ,&nbsp;Ruben Tolosana ,&nbsp;Pietro Melzi ,&nbsp;Ruben Vera-Rodriguez ,&nbsp;Minchul Kim ,&nbsp;Christian Rathgeb ,&nbsp;Xiaoming Liu ,&nbsp;Luis F. Gomez ,&nbsp;Aythami Morales ,&nbsp;Julian Fierrez ,&nbsp;Javier Ortega-Garcia ,&nbsp;Zhizhou Zhong ,&nbsp;Yuge Huang ,&nbsp;Yuxi Mi ,&nbsp;Shouhong Ding ,&nbsp;Shuigeng Zhou ,&nbsp;Shuai He ,&nbsp;Lingzhi Fu ,&nbsp;Heng Cong ,&nbsp;Rongyu Zhang ,&nbsp;David Menotti","doi":"10.1016/j.inffus.2025.103099","DOIUrl":"10.1016/j.inffus.2025.103099","url":null,"abstract":"<div><div>Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2<span><math><msup><mrow></mrow><mrow><mtext>nd</mtext></mrow></msup></math></span> FRCSyn-onGoing challenge, based on the 2<span><math><msup><mrow></mrow><mrow><mtext>nd</mtext></mrow></msup></math></span> Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark (i) the proposal of novel Generative AI methods and synthetic data, and (ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in the pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103099"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Span-based syntactic feature fusion for aspect sentiment triplet extraction
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103078
Guangtao Xu, Zhihao Yang, Bo Xu, Ling Luo, Hongfei Lin
Aspect sentiment triplet extraction (ASTE) is a particularly challenging subtask in aspect-based sentiment analysis. The span-based method is currently one of the mainstream solutions in this area. However, existing span-based methods focus only on semantic information, neglecting syntactic information, which has been proven effective in aspect-based sentiment classification. In this work, we combine syntactic information with the span-based method according to task characteristics and propose a span-based syntactic feature fusion (SSFF) model for ASTE. Firstly, we introduce part-of-speech information to assist span category prediction. Secondly, we introduce dependency distance information to assist sentiment polarity category prediction. By introducing the aforementioned syntactic information, the learning objectives of the first and second stages of the span-based method are clearly distinguished, which effectively improves the performance of the span-based method. We conduct experiments on the widely used public dataset ASTE-V2. The experimental results demonstrate that SSFF significantly improves the performance of the span-based method and outperforms all baseline models, achieving new state-of-the-art performance.
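The dependency distance used as a syntactic signal can be computed directly from a parser's head indices. The short sketch below does exactly that for a hand-parsed toy sentence; how SSFF then fuses this distance into sentiment polarity prediction is not reproduced here, and the example parse is hypothetical.

```python
# Sketch of computing the dependency-tree distance between two tokens, given the
# head index of every token (as produced by any dependency parser).
def dependency_distance(heads, i, j):
    """heads[t] is the index of token t's head; the root points to itself."""
    def path_to_root(t):
        path = [t]
        while heads[t] != t:
            t = heads[t]
            path.append(t)
        return path

    pi, pj = path_to_root(i), path_to_root(j)
    depth_in_j = {tok: depth for depth, tok in enumerate(pj)}
    for depth_i, tok in enumerate(pi):
        if tok in depth_in_j:                   # lowest common ancestor
            return depth_i + depth_in_j[tok]
    return len(pi) + len(pj)                    # disconnected (should not happen)

# "The battery life is great": heads chosen by hand for illustration.
# tokens:  0:The  1:battery  2:life  3:is  4:great
heads = [2, 2, 4, 4, 4]                         # root "great" points to itself
print(dependency_distance(heads, 1, 4))         # battery -> life -> great : 2
```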
{"title":"Span-based syntactic feature fusion for aspect sentiment triplet extraction","authors":"Guangtao Xu,&nbsp;Zhihao Yang,&nbsp;Bo Xu,&nbsp;Ling Luo,&nbsp;Hongfei Lin","doi":"10.1016/j.inffus.2025.103078","DOIUrl":"10.1016/j.inffus.2025.103078","url":null,"abstract":"<div><div>Aspect sentiment triplet extraction (ASTE) is a particularly challenging subtask in aspect-based sentiment analysis. The span-based method is currently one of the mainstream solutions in this area. However, existing span-based methods focus only on semantic information, neglecting syntactic information, which has been proven effective in aspect-based sentiment classification. In this work, we combine syntactic information with the span-based method according to task characteristics and propose a span-based syntactic feature fusion (SSFF) model for ASTE. Firstly, we introduce part-of-speech information to assist span category prediction. Secondly, we introduce dependency distance information to assist sentiment polarity category prediction. By introducing the aforementioned syntactic information, the learning objectives of the first and second stages of the span-based method are clearly distinguished, which effectively improves the performance of the span-based method. We conduct experiments on the widely used public dataset ASTE-V2. The experimental results demonstrate that SSFF significantly improves the performance of the span-based method and outperforms all baseline models, achieving new state-of-the-art performance.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103078"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Behavior-Pred: A semantic-enhanced trajectory pre-training framework for motion forecasting
IF 14.7 | CAS Tier 1 (Computer Science) | Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-03-13 | DOI: 10.1016/j.inffus.2025.103086
Jianxin Shi, Jinhao Chen, Yuandong Wang, Tao Feng, Zhen Yang, Tianyu Wo
Predicting the future movements of dynamic traffic agents is crucial for autonomous systems. Effectively understanding the behavioral patterns of traffic agents is key to accurately predicting their future movements.
Inspired by the success of the pre-training and fine-tuning paradigm in artificial intelligence, we develop a semantic-enhanced trajectory pre-training framework for motion forecasting in the autonomous driving domain, named Behavior-Pred. In detail, we design two kinds of tasks for the pre-training phase, a fine-grained reconstruction task and a coarse-grained contrastive task, to learn a better representation of both historical and future behaviors as well as their pattern consistency. In fine-grained reconstruction learning, we utilize a time-dimensional masking strategy at the timestep level, which preserves historical and future patterns better than agent-based masking. In coarse-grained contrastive learning, we design a similarity-based loss function to capture the relationship/consistency between historical patterns and future ones. Overall, Behavior-Pred learns more comprehensive behavioral semantics via multi-granularity pre-training tasks. Experimental results demonstrate that our framework outperforms various baselines.
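A compact PyTorch sketch of the two pre-training ingredients described above, timestep-level masking and a similarity-based consistency term, is given below. The mask ratio, the mean-pooled encodings, and the omission of contrastive negatives are simplifying assumptions rather than Behavior-Pred's exact recipe.

```python
# Sketch of (i) timestep-level masking of a trajectory tensor and (ii) a cosine
# consistency term between history and future representations. Illustrative only;
# Behavior-Pred's encoders, mask ratio, and full contrastive loss are not reproduced.
import torch
import torch.nn.functional as F

def mask_timesteps(traj, mask_ratio=0.3, generator=None):
    """traj: (batch, timesteps, features). Zero out whole timesteps at random."""
    b, t, _ = traj.shape
    keep = (torch.rand(b, t, 1, generator=generator) > mask_ratio).float()
    return traj * keep, keep                    # masked trajectory + mask for the reconstruction loss

def consistency_loss(hist_repr, fut_repr):
    """Pull the pooled history and future representations of the same agent together."""
    return 1.0 - F.cosine_similarity(hist_repr, fut_repr, dim=-1).mean()

traj = torch.randn(8, 20, 4)                    # 8 agents, 20 timesteps, (x, y, vx, vy)
masked, keep = mask_timesteps(traj)
hist_repr = traj[:, :10].mean(dim=1)            # crude pooled encodings for illustration
fut_repr = traj[:, 10:].mean(dim=1)
print(masked.shape, consistency_loss(hist_repr, fut_repr).item())
```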
{"title":"Behavior-Pred: A semantic-enhanced trajectory pre-training framework for motion forecasting","authors":"Jianxin Shi ,&nbsp;Jinhao Chen ,&nbsp;Yuandong Wang ,&nbsp;Tao Feng ,&nbsp;Zhen Yang ,&nbsp;Tianyu Wo","doi":"10.1016/j.inffus.2025.103086","DOIUrl":"10.1016/j.inffus.2025.103086","url":null,"abstract":"<div><div>Predicting the future movements of dynamic traffic agents is crucial for autonomous systems. Effectively understanding the behavioral patterns of traffic agents is key to accurately predicting their future movements.</div><div>Inspired by the success of the pre-training and fine-tuning paradigm in artificial intelligence, we develop a semantic-enhanced trajectory pre-training framework for motion forecasting in the autonomous driving domain, named <strong>Behavior-Pred</strong>. In detail, we design two kinds of tasks during the pre-training phase: fine-grained reconstruction and coarse-grained contrastive tasks, to learn a better representation of both historical and future behaviors, as well as their pattern consistency. In fine-grained reconstruction learning, we utilize a time-dimensional masking strategy based on the timestep level, which reserves historical and future patterns compared to agent-based masking. In coarse-grained contrastive learning, we design a similarity-based loss function to grasp the relationship/consistency between history patterns and the future. Overall, Behavior-Pred learns more comprehensive behavioral semantics via multi-granularity pre-training tasks. Experimental results demonstrate that our framework outperforms various baselines.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103086"},"PeriodicalIF":14.7,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0