Confidence ensembles: Tabular data classifiers on steroids
Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103126
Tommaso Zoppi , Peter Popov
The astounding amount of research conducted in recent decades has provided plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks on tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on confidence ensembles: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of “confidence” in predictions. They apply to any classification problem, binary or multi-class, supervised or unsupervised, without requiring any data beyond what the base estimator already requires. Our experimental evaluation on a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even with a limited number of base learners, ii) are relatively easy to tune as they rely on few hyper-parameters, and iii) are significantly more robust than other tabular data classifiers when dealing with unknown, unexpected input data. Among others, confidence ensembles showed potential to go beyond the performance of de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as a PyPI package, compliant with widely used Python frameworks such as scikit-learn and pyod, and require little to no tuning to be exercised on tabular datasets for classification tasks.
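As a rough illustration of the idea, the sketch below wraps a scikit-learn base estimator in a confidence-weighted bagging ensemble: each base learner is fit on a bootstrap sample and its vote is weighted by its own prediction confidence. This is an assumption-laden toy, not the authors' ConfBag/ConfBoost implementation, whose exact sampling, weighting, and unsupervised variants are defined in the paper and the accompanying PyPI package.

```python
# Toy confidence-weighted bagging wrapper in scikit-learn style (illustration only,
# not the authors' ConfBag/ConfBoost). Assumes every class appears in each bootstrap sample.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier


class ConfidenceBaggingSketch(BaseEstimator, ClassifierMixin):
    def __init__(self, base_estimator=None, n_learners=5, random_state=0):
        self.base_estimator = base_estimator
        self.n_learners = n_learners
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        base = self.base_estimator if self.base_estimator is not None else DecisionTreeClassifier()
        rng = np.random.RandomState(self.random_state)
        self.classes_ = np.unique(y)
        self.learners_ = []
        for _ in range(self.n_learners):
            idx = rng.choice(len(X), size=len(X), replace=True)  # bootstrap sample
            self.learners_.append(clone(base).fit(X[idx], y[idx]))
        return self

    def predict_proba(self, X):
        # Each learner votes with its class probabilities, weighted by its own
        # confidence, i.e. the maximum class probability it assigns to the sample.
        probas = np.stack([l.predict_proba(X) for l in self.learners_])  # (L, N, C)
        conf = probas.max(axis=2, keepdims=True)                          # (L, N, 1)
        return (probas * conf).sum(axis=0) / conf.sum(axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Once fitted, such a wrapper is used like any other scikit-learn classifier, e.g. `ConfidenceBaggingSketch(n_learners=10).fit(X_train, y_train).predict(X_test)`.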
{"title":"Confidence ensembles: Tabular data classifiers on steroids","authors":"Tommaso Zoppi , Peter Popov","doi":"10.1016/j.inffus.2025.103126","DOIUrl":"10.1016/j.inffus.2025.103126","url":null,"abstract":"<div><div>The astounding amount of research conducted in the last decades provided plenty of Machine Learning (ML) algorithms and models for solving a wide variety of tasks for tabular data. However, classifiers are not always fast, accurate, and robust to unknown inputs, calling for further research in the domain. This paper proposes two classifiers based on <em>confidence ensembles</em>: Confidence Bagging (ConfBag) and Confidence Boosting (ConfBoost). Confidence ensembles build upon a base estimator and create base learners relying on the concept of “confidence” in predictions. They apply to any classification problem: binary and multi-class, supervised or unsupervised, without requiring additional data with respect to those already required by the base estimator. Our experimental evaluation using a range of tabular datasets shows that confidence ensembles, and especially ConfBoost, i) build more accurate classifiers than base estimators alone, even using a limited amount of base learners, ii) are relatively easy to tune as they rely on a limited number of hyper-parameters, and iii) are significantly more robust when dealing with unknown, unexpected input data compared to other tabular data classifiers. Amongst others, confidence ensembles showed potential in going beyond the performance of de-facto standard classifiers for tabular data such as Random Forest and eXtreme Gradient Boosting. ConfBag and ConfBoost are publicly available as PyPI package, compliant with widely used Python frameworks such as <em>scikit-learn</em> and <em>pyod</em>, and require little to no tuning to be exercised on tabular datasets for classification tasks.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103126"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient genome sequence compression via the fusion of MDL-based heuristics
Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103083
M. Zohaib Nawaz , M. Saqib Nawaz , Philippe Fournier-Viger , Shoaib Nawaz , Jerry Chun-Wei Lin , Vincent S. Tseng
Developing novel methods for the efficient and lossless compression of genome sequences has become a pressing issue in bioinformatics due to the rapidly increasing volume of genomic data. Although recent reference-free genome compressors have shown potential, they often require substantial computational resources, lack interpretability, and fail to fully utilize the inherent sequential characteristics of genome sequences. To overcome these limitations, this paper presents HMG (Heuristic-driven MDL-based Genome sequence compressor), a novel compressor based on the Minimum Description Length (MDL) principle. HMG is designed to identify the optimal set of k-mers (patterns) for the maximal compression of a dataset. By fusing heuristic algorithms—specifically the Genetic Algorithm and Simulated Annealing—with the MDL framework, HMG effectively navigates the extensive search space of k-mer patterns. An experimental comparison with state-of-the-art genome compressors shows that HMG is fast and achieves a low number of bits per base. Furthermore, the optimal k-mers derived by HMG for compression are employed for genome classification, thereby offering multifunctional advantages over previous genome compressors. HMG is available at https://github.com/MuhammadzohaibNawaz/HMG.
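For intuition, a simplified two-part MDL score of the kind that drives this search can be written as model bits (spelling out the chosen k-mers) plus data bits (encoding the sequence once each k-mer is a single symbol). The coding scheme below is a stand-in for illustration; HMG's actual MDL formulation and its GA/SA search are defined in the paper.

```python
# Simplified two-part MDL score for a candidate k-mer set (illustrative stand-in).
import math

def mdl_cost(sequence, kmers):
    alphabet = 4                              # A, C, G, T
    bits_per_symbol = math.log2(len(kmers) + alphabet)

    # Part 1: description length of the model (the k-mer dictionary itself).
    model_bits = sum(len(k) * math.log2(alphabet) for k in kmers)

    # Part 2: description length of the data, parsed greedily with the longest match.
    ordered = sorted(kmers, key=len, reverse=True)
    data_bits, i = 0.0, 0
    while i < len(sequence):
        match = next((k for k in ordered if sequence.startswith(k, i)), None)
        data_bits += bits_per_symbol
        i += len(match) if match else 1
    return model_bits + data_bits

# A heuristic search (e.g. simulated annealing over candidate k-mer sets) would
# look for the set minimizing this cost.
print(mdl_cost("ACGTACGTACGT", {"ACGT"}))   # compare against mdl_cost("ACGTACGTACGT", set())
```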
{"title":"Efficient genome sequence compression via the fusion of MDL-based heuristics","authors":"M. Zohaib Nawaz , M. Saqib Nawaz , Philippe Fournier-Viger , Shoaib Nawaz , Jerry Chun-Wei Lin , Vincent S. Tseng","doi":"10.1016/j.inffus.2025.103083","DOIUrl":"10.1016/j.inffus.2025.103083","url":null,"abstract":"<div><div>Developing novel methods for the efficient and lossless compression of genome sequences has become a pressing issue in bioinformatics due to the rapidly increasing volume of genomic data. Although recent reference-free genome compressors have shown potential, they often require substantial computational resources, lack interpretability, and fail to fully utilize the inherent sequential characteristics of genome sequences. To overcome these limitations, this paper presents HMG (Heuristic-driven MDL-based Genome sequence compressor), a novel compressor based on the Minimum Description Length (MDL) principle. HMG is designed to identify the optimal set of k-mers (patterns) for the maximal compression of a dataset. By fusing heuristic algorithms—specifically the Genetic Algorithm and Simulated Annealing—with the MDL framework, HMG effectively navigates the extensive search space of k-mer patterns. An experimental comparison with state-of-the-art genome compressors shows that HMG is fast, and achieves a low bit-per-base. Furthermore, the optimal k-mers derived by HMG for compression are employed for genome classification, thereby offering multifunctional advantages over previous genome compressors. HMG is available at <span><span>https://github.com/MuhammadzohaibNawaz/HMG</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103083"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143696106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-view evidential K-NN classification
Pub Date: 2025-03-17 | DOI: 10.1016/j.inffus.2025.103113
Chaoyu Gong , Zhi-gang Su , Thierry Denoeux
Multi-view classification, which aims to classify samples represented by multiple feature vectors, has become a hot topic in pattern recognition. Although many methods with promising performance have been proposed, their practicality is still limited by the lack of interpretability in some situations. Moreover, an appropriate description of the soft labels of multi-view samples is missing, which may degrade classification performance, especially for samples located in highly overlapping areas of the multiple vector spaces. To address these issues, we extend the K-nearest neighbor (K-NN) classification algorithm to multi-view learning under the theoretical framework of evidence theory. The learning process is first formalized as an optimization problem in which the weights of the different views, an adaptive K value for every sample, and the distance matrix are determined jointly based on the training error. Then, the final classification result is derived following the philosophy of the evidential K-NN classification algorithm. Detailed ablation studies demonstrate the benefits of jointly learning adaptive neighborhoods and view weights in a supervised way. Comparative experiments on real-world datasets show that our algorithm performs better than other state-of-the-art methods. A real-world industrial application to condition monitoring, presented in Appendix F, illustrates in detail the need for evidence theory and the benefits of the unique interpretability of K-NN.
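For context, the single-view evidential K-NN rule that this work extends assigns each of the K nearest neighbors a mass function and fuses them with Dempster's rule; a standard formulation is recalled below, where d_i is the distance to neighbor x_i of class ω_q, Ω is the frame of discernment, and α, γ_q are tunable parameters. The multi-view weights, adaptive K, and learned distance matrix described in the abstract operate on top of this rule and are not shown.

```latex
m_i(\{\omega_q\}) = \alpha \, \exp\!\left(-\gamma_q \, d_i^{2}\right), \qquad
m_i(\Omega) = 1 - \alpha \, \exp\!\left(-\gamma_q \, d_i^{2}\right), \qquad
m = m_1 \oplus m_2 \oplus \cdots \oplus m_K
```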
{"title":"Multi-view evidential K-NN classification","authors":"Chaoyu Gong , Zhi-gang Su , Thierry Denoeux","doi":"10.1016/j.inffus.2025.103113","DOIUrl":"10.1016/j.inffus.2025.103113","url":null,"abstract":"<div><div>Multi-view classification, aiming to classify samples represented by multiple feature vectors, has become a hot topic in pattern recognition. Although many methods with promising performances have been proposed, their practicality is still limited by the lack of interpretability in some situations. Besides, an appropriate description for the soft labels of multi-view samples is missing, which may degrade the classification performance, especially for those samples located in highly-overlapping areas of multiple vector spaces. To address these issues, we extend the <em>K</em>-nearest neighbor (K-NN) classification algorithm to multi-view learning, under the theoretical framework of evidence theory. The learning process is formalized, firstly, as an optimization problem, where the weights of different views, an adaptive <em>K</em> value of every sample and the distance matrix are determined jointly based on training error. Then, the final classification result is derived according to the philosophy of the evidential K-NN classification algorithm. Detailed ablation studies demonstrate the benefits of the joint learning for adaptive neighborhoods and view weights in a supervised way. Comparative experiments on real-world datasets show that our algorithm performs better than other state-of-the-art methods. A real-world industrial application for condition monitoring shown in Appendix F exemplifies the need to use the evidence theory and the benefits from the unique interpretability of K-NN in detail.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103113"},"PeriodicalIF":14.7,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FFS-MCC: Fusing approximation and fuzzy uncertainty measures for feature selection with multi-correlation collaboration
Pub Date: 2025-03-16 | DOI: 10.1016/j.inffus.2025.103101
Jihong Wan , Xiaoping Li , Pengfei Zhang , Hongmei Chen , Xiaocao Ouyang , Tianrui Li , Kay Chen Tan
In many practical applications, the data collected from multiple sources are high-dimensional, uncertain, and complexly correlated, which poses great challenges for feature selection. It is particularly difficult to select suitable features quickly and accurately from high-dimensional and uncertain data (typically affected by randomness, fuzziness, and inconsistency). Correlations between features are complex, and their collaborations change dynamically in an uncertain data environment. Since classification accuracy and the reduction rate of feature selection are conflicting objectives, it is hard to obtain a trade-off between them. To address these issues, this work proposes a feature selection method that exploits multiple correlations and their collaborations by fusing approximation and fuzzy uncertainty measures. Specifically, the feature-class correlation is defined by the approximation uncertainty measure, while feature-feature correlations are mined by the fuzzy uncertainty measure. Further, a collaborative intensity is calculated from the positive or negative effects of the multiple correlations. A novel feature evaluation strategy of max-dependent relevance and min-conditional redundancy based on the multi-correlation collaborative intensity is proposed. Experimental results show that the proposed joint-evaluation algorithm outperforms the compared methods thanks to the multi-correlation collaboration it considers between features.
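The evaluation loop behind a max-relevance / min-redundancy criterion can be sketched as greedy forward selection. The snippet below uses absolute Pearson correlation only as a placeholder measure; FFS-MCC instead fuses approximation and fuzzy uncertainty measures and weights them with the collaborative intensity described above.

```python
# Greedy forward selection with a max-relevance / min-redundancy criterion
# (placeholder correlation measure, for illustration only).
import numpy as np

def greedy_select(X, y, n_select):
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    selected = []
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = (np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected])
                          if selected else 0.0)
            score = relevance[j] - redundancy   # maximize relevance, minimize redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)); y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
print(greedy_select(X, y, 2))   # the two informative features are likely selected first
```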
{"title":"FFS-MCC: Fusing approximation and fuzzy uncertainty measures for feature selection with multi-correlation collaboration","authors":"Jihong Wan , Xiaoping Li , Pengfei Zhang , Hongmei Chen , Xiaocao Ouyang , Tianrui Li , Kay Chen Tan","doi":"10.1016/j.inffus.2025.103101","DOIUrl":"10.1016/j.inffus.2025.103101","url":null,"abstract":"<div><div>In many practical applications, the characteristics of data collected from multiple sources are high-dimensional, uncertain, and complexly correlated, which brings great challenges to feature selection. It is much difficult to select suitable features quickly and accurately for high-dimensional and uncertain (usually randomness, fuzziness, and inconsistency) data. Correlations between features are complex and their collaborations are dynamically changing in an uncertain data environment. Since the classification accuracy and the reduction rate of feature selection are conflicting, it is hard to obtain a trade-off between them. For the issues mentioned above, in this work, a feature selection method with multiple correlations and their collaborations is proposed by fusing the approximation and fuzzy uncertainty measures. Specifically, the feature-class correlation is defined by the approximation uncertainty measure while feature-feature correlations are mined by the fuzzy uncertainty measure. Further, a collaborative intensity is calculated by positive or negative effect of multiple correlations. A novel feature evaluation strategy of max-dependent relevance and min-conditional redundancy based on the multi-correlation collaborative intensity is proposed. Experimental results show that the proposed algorithm on joint evaluation outperforms the compared ones because of the considered multi-correlation collaboration between features.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103101"},"PeriodicalIF":14.7,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Listening and seeing again: Generative error correction for audio-visual speech recognition
Pub Date: 2025-03-15 | DOI: 10.1016/j.inffus.2025.103077
Rui Liu , Hongyu Yuan , Guanglai Gao , Haizhou Li
Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech Recognition (AVSR) takes audio and visual signals simultaneously to infer the transcription. Recent studies have shown that Large Language Models (LLMs) can be effectively used for Generative Error Correction (GER) in ASR by predicting the best transcription from ASR-generated N-best hypotheses. However, these LLMs lack the ability to understand audio and visual signals simultaneously, making the GER approach challenging to apply in AVSR. In this work, we propose a novel GER paradigm for AVSR, termed AVGER, that follows the concept of “listening and seeing again”. Specifically, we first use a powerful AVSR system to read the audio and visual signals and obtain the N-best hypotheses, and then use a Q-former-based Multimodal Synchronous Encoder to read the audio and visual information again and convert them into audio and video compression representations, respectively, that the LLM can understand. Afterward, the audio-visual compression representations and the N-best hypotheses together constitute a Cross-modal Prompt that guides the LLM in producing the best transcription. In addition, we propose a Multi-Level Consistency Constraint training criterion, covering the logits, utterance, and representation levels, to improve correction accuracy while enhancing the interpretability of the audio and visual compression representations. Experimental results on the LRS3 dataset show that our method outperforms current mainstream AVSR systems, reducing the Word Error Rate (WER) by 27.59% compared to them. Code and models can be found at: https://github.com/AI-S2-Lab/AVGER.
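The hypothesis side of such a cross-modal prompt can be sketched as plain text, as below; AVGER additionally prepends the audio and video compression representations produced by its multimodal encoder, and its exact prompt template is the authors' own, not this one.

```python
# Plain-text sketch of the hypothesis side of a generative error-correction prompt.
def build_ger_prompt(nbest):
    lines = [f"hypothesis {i + 1}: {h}" for i, h in enumerate(nbest)]
    return ("The following are N-best transcriptions of the same utterance.\n"
            + "\n".join(lines)
            + "\nReport the single most likely correct transcription.")

print(build_ger_prompt(["i sea the ship", "i see the ship", "eye see the sheep"]))
```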
{"title":"Listening and seeing again: Generative error correction for audio-visual speech recognition","authors":"Rui Liu , Hongyu Yuan , Guanglai Gao , Haizhou Li","doi":"10.1016/j.inffus.2025.103077","DOIUrl":"10.1016/j.inffus.2025.103077","url":null,"abstract":"<div><div>Unlike traditional Automatic Speech Recognition (ASR), Audio-Visual Speech Recognition (AVSR) takes audio and visual signals simultaneously to infer the transcription. Recent studies have shown that Large Language Models (LLMs) can be effectively used for Generative Error Correction (GER) in ASR by predicting the best transcription from ASR-generated N-best hypotheses. However, these LLMs lack the ability to simultaneously understand audio and visual, making the GER approach challenging to apply in AVSR. In this work, we propose a novel GER paradigm for AVSR, termed <strong>AVGER</strong>, that follows the concept of “listening and seeing again”. Specifically, we first use the powerful AVSR system to read the audio and visual signals to get the N-Best hypotheses, and then use the Q-former-based Multimodal Synchronous Encoder to read the audio and visual information again and convert them into an audio and video compression representation respectively that can be understood by LLM. Afterward, the audio-visual compression representation and the N-Best hypothesis together constitute a Cross-modal Prompt to guide the LLM in producing the best transcription. In addition, we also proposed a Multi-Level Consistency Constraint training criterion, including logits-level, utterance-level and representations-level, to improve the correction accuracy while enhancing the interpretability of audio and visual compression representations. The experimental results on the LRS3 dataset show that our method outperforms current mainstream AVSR systems. The proposed AVGER can reduce the Word Error Rate (WER) by 27.59% compared to them. Code and models can be found at: <span><span>https://github.com/AI-S2-Lab/AVGER</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103077"},"PeriodicalIF":14.7,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143635811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AOGN-CZSL: An Attribute- and Object-Guided Network for Compositional Zero-Shot Learning
Pub Date: 2025-03-15 | DOI: 10.1016/j.inffus.2025.103096
Jing Yang , Xingjiang Ma , Yuankai Wu , Chengjiang Li , Zhidong Su , Ji Xu , Yixiong Feng
Humans can readily acquire knowledge about unfamiliar and unknown objects, yet achieving this skill remains extremely challenging for artificial intelligence. With the rapid development of artificial intelligence, compositional zero-shot learning (CZSL) can generalize to unseen compositions by learning prior knowledge of seen attributes and object compositions during training. Although existing composition-based and relationship-based methods show great potential for addressing this challenge, they still exhibit some limitations. Composition-based methods often ignore the intrinsic correlations between attributes, objects, and images, which may lead the model to perform poorly when generalized to unseen compositions. Some relationship-based methods can better capture the relationships between attributes, objects, and images but may overlook the interdependencies between distinct attributes and objects. Therefore, we combine the advantages of composition-based and relationship-based methods and propose a new method for learning attribute and object dependencies (AOGN-CZSL). AOGN-CZSL learns the dependencies between different attributes or objects, and it learns all attributes and objects simultaneously, unlike traditional composition-based methods, which typically address each attribute-object composition separately. Moreover, unlike general relation-based approaches, this paper adopts learned textual and visual modality features of attributes and objects for attribute scoring and object scoring, respectively. The code and dataset are available at: https://github.com/ybyangjing/AOGN-CZSL.
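A minimal sketch of the generic CZSL scoring step the abstract alludes to is given below, where separate attribute and object scores are combined to rank (possibly unseen) compositions; the attribute/object names and the additive fusion are placeholders, and AOGN-CZSL's dependency-aware scoring is defined in the paper.

```python
# Generic CZSL scoring step: rank attribute-object compositions from separate scores.
attr_scores = {"wet": 0.9, "dry": 0.1}    # per-image attribute scores (illustrative)
obj_scores = {"dog": 0.8, "car": 0.3}     # per-image object scores (illustrative)

scores = {(a, o): attr_scores[a] + obj_scores[o] for a in attr_scores for o in obj_scores}
print(max(scores, key=scores.get))        # -> ('wet', 'dog')
```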
{"title":"AOGN-CZSL: An Attribute- and Object-Guided Network for Compositional Zero-Shot Learning","authors":"Jing Yang , Xingjiang Ma , Yuankai Wu , Chengjiang Li , Zhidong Su , Ji Xu , Yixiong Feng","doi":"10.1016/j.inffus.2025.103096","DOIUrl":"10.1016/j.inffus.2025.103096","url":null,"abstract":"<div><div>Humans are able to readily acquire knowledge about unfamiliar and unknown objects. However, it is extremely challenging for artificial intelligence to achieve this skill. With the rapid development of artificial intelligence, compositional zero-shot learning (CZSL) can generalize unseen compositions by learning prior knowledge of seen attributes and object compositions during training. Although existing composition-based and relationship-based methods show great potential for addressing this challenge, they still exhibit some limitations. Composition-based methods often ignore the intrinsic correlations between attributes, objects, and images, which may lead the model to perform poorly when it is generalized to unseen compositions. Some relationship-based methods can better capture the relationships between attributes, objects, and images but may overlook the interdependencies between distinct attributes and objects. Therefore, the advantages of the composition-based and relationship-based methods are combined, and a new method is proposed for learning attribute and object dependencies (AOGN-CZSL). AOGN-CZSL learns the dependencies between different attributes or objects. It also learns all attributes and objects simultaneously. Different from traditional composition-based methods, it typically address each attribute-object compositions separately. Moreover, unlike general relation-based approaches, this paper adopts learned textual and visual modality features of attributes and objects for attribute scoring and object scoring, respectively. The code and dataset are available at: <span><span>https://github.com/ybyangjing/AOGN-CZSL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103096"},"PeriodicalIF":14.7,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection
Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103097
Rui Liu , Jinhua Zhang , Haizhou Li
Audio Deepfake Detection (ADD) targets identifying forgery cues in audio generated by text-to-speech (TTS), voice conversion (VC), voice editing, etc. With the advancement of generative artificial intelligence (AI), ADD has gained increasing attention. In recent years, mono-to-binaural (M2B) conversion has been explored in ADD to uncover forgery cues from a novel perspective. However, M2B-based methods may weaken or overlook unique forgery cues specific to mono, limiting detection performance. To this end, this paper proposes a Hierarchical Multi-Source Cues Fusion network for more accurate ADD (HMSCF-ADD). This approach leverages the mono signal alongside the binaural left and right channels as three distinct sources for hierarchical information fusion; it distinguishes common and binaural-specific features while removing redundant information for more effective detection. Specifically, binaural-specific and common features are first extracted and fused as binaural information, followed by dynamic fusion of the mono and binaural information to achieve hierarchical fusion. Experiments on the ASVspoof2019-LA and ASVspoof2021-PA datasets demonstrate that HMSCF-ADD outperforms all mono-input and M2B-based baselines. Detailed comparisons of fusion strategies and M2B conversion further validate the framework’s effectiveness. The codes are available at: https://github.com/AI-S2-Lab/HMSCF-ADD.
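A toy sketch of the hierarchical fusion idea follows: common and binaural-specific components are formed from the left/right channels and then dynamically mixed with the mono branch. The real HMSCF-ADD modules are learned networks; the fixed operations and scalar gate here are placeholders chosen only for illustration.

```python
# Toy hierarchical fusion of mono and binaural feature vectors (illustration only).
import numpy as np

def hierarchical_fuse(f_mono, f_left, f_right, gate=0.5):
    common = 0.5 * (f_left + f_right)              # cues shared by the two channels
    specific = f_left - f_right                    # binaural-specific (inter-channel) cue
    shared = gate * f_mono + (1 - gate) * common   # dynamic mono/binaural mix
    return np.concatenate([shared, specific])

fused = hierarchical_fuse(np.ones(4), 0.8 * np.ones(4), 1.2 * np.ones(4))
print(fused.shape)   # (8,)
```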
{"title":"Hierarchical multi-source cues fusion for mono-to-binaural based Audio Deepfake Detection","authors":"Rui Liu , Jinhua Zhang , Haizhou Li","doi":"10.1016/j.inffus.2025.103097","DOIUrl":"10.1016/j.inffus.2025.103097","url":null,"abstract":"<div><div>Audio Deepfake Detection (ADD) targets identifying forgery cues in audio generated by text-to-speech (TTS), voice conversion (VC), voice editing, etc. With the advancement of generative artificial intelligence(AI), ADD has gained increasing attention. In recent years, mono-to-binaural (M2B) conversion has been explored in ADD to uncover forgery cues from a novel perspective. However, M2B-based methods may weaken or overlook unique forgery cues specific to mono, limiting detection performance. To this end, this paper proposes a <strong>H</strong>ierarchical <strong>M</strong>ulti-<strong>S</strong>ource <strong>C</strong>ues <strong>F</strong>usion network for more accurate <strong>ADD (HMSCF-ADD)</strong>. This approach leverages mono alongside binaural left and right channels as three distinct sources for hierarchical information fusion, it distinguishes common and binaural-specific features while removing redundant information for more effective detection. Specifically, binaural-specific and common features are first extracted and fused as binaural information, followed by dynamic fusion of mono and binaural information to achieve hierarchical fusion. Experiments on ASVspoof2019-LA and ASVspoof2021-PA datasets demonstrate that HMSCF-ADD outperforms all mono-input and M2B-based baselines. Detailed comparisons on fusion strategies and M2B conversion further validate the framework’s effectiveness. The codes are available at: <span><span>https://github.com/AI-S2-Lab/HMSCF-ADD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103097"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143642750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data
Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103099
Ivan DeAndres-Tame , Ruben Tolosana , Pietro Melzi , Ruben Vera-Rodriguez , Minchul Kim , Christian Rathgeb , Xiaoming Liu , Luis F. Gomez , Aythami Morales , Julian Fierrez , Javier Ortega-Garcia , Zhizhou Zhong , Yuge Huang , Yuxi Mi , Shouhong Ding , Shuigeng Zhou , Shuai He , Lingzhi Fu , Heng Cong , Rongyu Zhang , David Menotti
Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and to investigate the application of synthetic data to better train face recognition systems, we introduce the 2nd FRCSyn-onGoing challenge, based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark (i) the proposal of novel Generative AI methods and synthetic data, and (ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.
{"title":"Second FRCSyn-onGoing: Winning solutions and post-challenge analysis to improve face recognition with synthetic data","authors":"Ivan DeAndres-Tame , Ruben Tolosana , Pietro Melzi , Ruben Vera-Rodriguez , Minchul Kim , Christian Rathgeb , Xiaoming Liu , Luis F. Gomez , Aythami Morales , Julian Fierrez , Javier Ortega-Garcia , Zhizhou Zhong , Yuge Huang , Yuxi Mi , Shouhong Ding , Shuigeng Zhou , Shuai He , Lingzhi Fu , Heng Cong , Rongyu Zhang , David Menotti","doi":"10.1016/j.inffus.2025.103099","DOIUrl":"10.1016/j.inffus.2025.103099","url":null,"abstract":"<div><div>Synthetic data is gaining increasing popularity for face recognition technologies, mainly due to the privacy concerns and challenges associated with obtaining real data, including diverse scenarios, quality, and demographic groups, among others. It also offers some advantages over real data, such as the large amount of data that can be generated or the ability to customize it to adapt to specific problem-solving needs. To effectively use such data, face recognition models should also be specifically designed to exploit synthetic data to its fullest potential. In order to promote the proposal of novel Generative AI methods and synthetic data, and investigate the application of synthetic data to better train face recognition systems, we introduce the 2<span><math><msup><mrow></mrow><mrow><mtext>nd</mtext></mrow></msup></math></span> FRCSyn-onGoing challenge, based on the 2<span><math><msup><mrow></mrow><mrow><mtext>nd</mtext></mrow></msup></math></span> Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024. This is an ongoing challenge that provides researchers with an accessible platform to benchmark (i) the proposal of novel Generative AI methods and synthetic data, and (ii) novel face recognition systems that are specifically proposed to take advantage of synthetic data. We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition such as demographic bias, domain adaptation, and performance constraints in demanding situations, such as age disparities between training and testing, changes in the pose, or occlusions. Very interesting findings are obtained in this second edition, including a direct comparison with the first one, in which synthetic databases were restricted to DCFace and GANDiffFace.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103099"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Span-based syntactic feature fusion for aspect sentiment triplet extraction
Pub Date: 2025-03-14 | DOI: 10.1016/j.inffus.2025.103078
Guangtao Xu, Zhihao Yang, Bo Xu, Ling Luo, Hongfei Lin
Aspect sentiment triplet extraction (ASTE) is a particularly challenging subtask in aspect-based sentiment analysis. The span-based method is currently one of the mainstream solutions in this area. However, existing span-based methods focus only on semantic information and neglect syntactic information, which has been proven effective in aspect-based sentiment classification. In this work, we combine syntactic information with the span-based method according to task characteristics and propose a span-based syntactic feature fusion (SSFF) model for ASTE. Firstly, we introduce part-of-speech information to assist span category prediction. Secondly, we introduce dependency distance information to assist sentiment polarity category prediction. By introducing the aforementioned syntactic information, the learning objectives of the first and second stages of the span-based method are clearly distinguished, which effectively improves its performance. We conduct experiments on the widely used public dataset ASTE-V2. The experimental results demonstrate that SSFF significantly improves the performance of the span-based method and outperforms all baseline models, achieving new state-of-the-art performance.
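The dependency-distance signal mentioned above can be illustrated as the shortest path length between two tokens in the sentence's dependency tree, as in the sketch below; how SSFF injects this feature into span and polarity scoring is specified in the paper, not here.

```python
# Dependency distance = shortest path length between two tokens, with dependency
# edges given as (head, dependent) token-index pairs.
from collections import deque

def dependency_distance(edges, src, dst):
    adj = {}
    for h, d in edges:
        adj.setdefault(h, []).append(d)
        adj.setdefault(d, []).append(h)
    queue, seen = deque([(src, 0)]), {src}
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1   # tokens not connected (should not occur in a well-formed parse)

# "The battery life is great": battery(1) <- life(2) <- is(3) -> great(4)
print(dependency_distance([(2, 1), (3, 2), (3, 4)], 1, 4))   # -> 3
```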
{"title":"Span-based syntactic feature fusion for aspect sentiment triplet extraction","authors":"Guangtao Xu, Zhihao Yang, Bo Xu, Ling Luo, Hongfei Lin","doi":"10.1016/j.inffus.2025.103078","DOIUrl":"10.1016/j.inffus.2025.103078","url":null,"abstract":"<div><div>Aspect sentiment triplet extraction (ASTE) is a particularly challenging subtask in aspect-based sentiment analysis. The span-based method is currently one of the mainstream solutions in this area. However, existing span-based methods focus only on semantic information, neglecting syntactic information, which has been proven effective in aspect-based sentiment classification. In this work, we combine syntactic information with the span-based method according to task characteristics and propose a span-based syntactic feature fusion (SSFF) model for ASTE. Firstly, we introduce part-of-speech information to assist span category prediction. Secondly, we introduce dependency distance information to assist sentiment polarity category prediction. By introducing the aforementioned syntactic information, the learning objectives of the first and second stages of the span-based method are clearly distinguished, which effectively improves the performance of the span-based method. We conduct experiments on the widely used public dataset ASTE-V2. The experimental results demonstrate that SSFF significantly improves the performance of the span-based method and outperforms all baseline models, achieving new state-of-the-art performance.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103078"},"PeriodicalIF":14.7,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143654729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Behavior-Pred: A semantic-enhanced trajectory pre-training framework for motion forecasting
Pub Date: 2025-03-13 | DOI: 10.1016/j.inffus.2025.103086
Jianxin Shi , Jinhao Chen , Yuandong Wang , Tao Feng , Zhen Yang , Tianyu Wo
Predicting the future movements of dynamic traffic agents is crucial for autonomous systems. Effectively understanding the behavioral patterns of traffic agents is key to accurately predicting their future movements.
Inspired by the success of the pre-training and fine-tuning paradigm in artificial intelligence, we develop a semantic-enhanced trajectory pre-training framework for motion forecasting in the autonomous driving domain, named Behavior-Pred. In detail, we design two kinds of tasks during the pre-training phase, fine-grained reconstruction and coarse-grained contrastive tasks, to learn a better representation of both historical and future behaviors, as well as their pattern consistency. In fine-grained reconstruction learning, we utilize a time-dimensional masking strategy at the timestep level, which preserves historical and future patterns better than agent-based masking. In coarse-grained contrastive learning, we design a similarity-based loss function to capture the relationship and consistency between historical patterns and future ones. Overall, Behavior-Pred learns more comprehensive behavioral semantics via multi-granularity pre-training tasks. Experimental results demonstrate that our framework outperforms various baselines.
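A toy version of such a similarity-based consistency loss between history and future embeddings is sketched below; an InfoNCE-style formulation is assumed here for illustration, and the exact loss used by Behavior-Pred is defined in the paper.

```python
# Toy similarity-based consistency loss between history and future trajectory embeddings.
import numpy as np

def consistency_loss(hist_emb, fut_emb, temperature=0.1):
    # hist_emb, fut_emb: (batch, dim); matching rows are positive pairs.
    h = hist_emb / np.linalg.norm(hist_emb, axis=1, keepdims=True)
    f = fut_emb / np.linalg.norm(fut_emb, axis=1, keepdims=True)
    logits = h @ f.T / temperature                  # pairwise cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))              # pull matching pairs together

rng = np.random.default_rng(0)
print(consistency_loss(rng.normal(size=(8, 16)), rng.normal(size=(8, 16))))
```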
{"title":"Behavior-Pred: A semantic-enhanced trajectory pre-training framework for motion forecasting","authors":"Jianxin Shi , Jinhao Chen , Yuandong Wang , Tao Feng , Zhen Yang , Tianyu Wo","doi":"10.1016/j.inffus.2025.103086","DOIUrl":"10.1016/j.inffus.2025.103086","url":null,"abstract":"<div><div>Predicting the future movements of dynamic traffic agents is crucial for autonomous systems. Effectively understanding the behavioral patterns of traffic agents is key to accurately predicting their future movements.</div><div>Inspired by the success of the pre-training and fine-tuning paradigm in artificial intelligence, we develop a semantic-enhanced trajectory pre-training framework for motion forecasting in the autonomous driving domain, named <strong>Behavior-Pred</strong>. In detail, we design two kinds of tasks during the pre-training phase: fine-grained reconstruction and coarse-grained contrastive tasks, to learn a better representation of both historical and future behaviors, as well as their pattern consistency. In fine-grained reconstruction learning, we utilize a time-dimensional masking strategy based on the timestep level, which reserves historical and future patterns compared to agent-based masking. In coarse-grained contrastive learning, we design a similarity-based loss function to grasp the relationship/consistency between history patterns and the future. Overall, Behavior-Pred learns more comprehensive behavioral semantics via multi-granularity pre-training tasks. Experimental results demonstrate that our framework outperforms various baselines.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"120 ","pages":"Article 103086"},"PeriodicalIF":14.7,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143679386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}