Pub Date: 2026-07-01. Epub Date: 2026-01-27. DOI: 10.1016/j.displa.2026.103368
Hao Liu, Maoji Qiu, Rong Huang
For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually yield poor recovery quality because of their inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution properties of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. This is followed by a fine screening stage, in which principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both the HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality than state-of-the-art algorithms.
{"title":"Content-adaptive dual feature selection for infrared aerial video compressive sensing reconstruction","authors":"Hao Liu , Maoji Qiu , Rong Huang","doi":"10.1016/j.displa.2026.103368","DOIUrl":"10.1016/j.displa.2026.103368","url":null,"abstract":"<div><div>For block compressive sensing (BCS) of natural videos, existing reconstruction algorithms typically utilize nonlocal self-similarity (NSS) to generate sparse residuals, thereby achieving favorable recovery performance by exploiting the statistical characteristics of key frames and non-key frames. However, when applied to multi-perspective infrared aerial videos rather than natural videos, these reconstruction algorithms usually result in poor recovery quality because of the inflexibility in selecting similar patches and poor adaptability to dynamic scene changes. Due to the distribution property of infrared aerial imagery, inter-frame and intra-frame similar patches should be selected adaptively so that an accurate dictionary matrix can be learned. Therefore, this paper proposes a content-adaptive dual feature selection mechanism. It first conducts a rough screening of inter-frame and intra-frame similar patches based on the correlation of observed measurement vectors across frames. Then, it is followed by a fine screening stage, where principal component analysis (PCA) is applied to project the similar patch-group matrix into a low-dimensional space. Finally, the split Bregman iteration (SBI) is employed to solve the BCS reconstruction for infrared aerial video. Experimental results on both HIT-UAV and M200-XT2DroneVehicle datasets demonstrate that the proposed algorithm achieves better recovery quality compared to state-of-the-art algorithms.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103368"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146070880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. DOI: 10.1016/j.displa.2026.103384
Yuling Luo, Zhaohui Chen, Baoshan Lu, Yiting Huang, Qiang Fu, Sheng Qin, Junxiu Liu
Generative Adversarial Networks (GANs) have significantly improved data security in image steganography. However, existing GAN-based approaches often fail to consider the impact of transmission noise and rely on separately trained encoder–decoder architectures, which hinder the accurate recovery of hidden image data. To address these limitations, we propose a Residual and Multi-Attention Enhanced GAN (RME-GAN) for image steganography, which integrates residual networks, attention mechanisms, and multi-objective optimization to effectively enhance the recovery quality of secret images. In the generator, a residual preprocessing network combined with a global attention mechanism is employed to efficiently extract transmission noise features. In the extractor, a gated attention module is introduced to align the encoder and decoder features, thereby improving decoding accuracy. Moreover, a multi-objective loss function is formulated to jointly optimize both encoder and decoder through end-to-end training, enhancing the consistency between them. Experimental results on widely used datasets, including LFW, ImageNet, and Pascal, demonstrate that the proposed RME-GAN achieves superior robustness against noise and significantly improves Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) performance compared to existing methods.
Title: Robust image steganography based on residual and multi-attention enhanced Generative Adversarial Networks. Displays, Volume 93, Article 103384.
Pub Date: 2026-07-01. Epub Date: 2026-02-09. DOI: 10.1016/j.displa.2026.103391
Jia Liu, Ao Zhang, Kun Zhang
With the wide application of 3D reconstruction technology in many fields, its efficient realization has become a research focus. Traditional 3D reconstruction methods often adopt a relatively fixed mode when handling indoor single-scene and outdoor multi-scene settings, making it difficult to adjust flexibly to the scene's complexity. Therefore, this paper proposes a 3D reconstruction method based on dynamic perception of scene complexity. To begin with, the scene complexity system is constructed. Next, a binary mask based on transparency and volume is used to screen out points that contribute minimally to the scene. Subsequently, we combine scene complexity with an octree structure to realize dynamic spatial streamlining, which preserves rendering quality while significantly improving system efficiency. Comparative experiments on the Mip-NeRF 360, Tanks&Temples, and Deep Blending datasets demonstrate that our method outperforms existing approaches in both evaluation metrics and visual quality, validating its effectiveness.
{"title":"Scene complexity dynamic perception for 3D reconstruction","authors":"Jia Liu, Ao Zhang, Kun Zhang","doi":"10.1016/j.displa.2026.103391","DOIUrl":"10.1016/j.displa.2026.103391","url":null,"abstract":"<div><div>With the wide application of 3D reconstruction technology in many fields, its efficient realization has become a research focus. The traditional 3D reconstruction method often adopts a relatively fixed mode when facing indoor single-scene and outdoor multi-scene, which is challenging to adjust flexibly according to the scene’s complexity. Therefore, this paper proposes a 3D reconstruction method based on the dynamic perception of scene complexity. To begin with, the scene complexity system is constructed. Next, based on the binary mask technology of transparency and volume, the points in the scene with minimal contribution are screened out. Subsequently, we combine the scene complexity with the octree structure to realize the spatial dynamic streamlining, which ensures the rendering quality and significantly improves the system efficiency at the same time. We conduct comparative experiments on Mip-NeRF 360, Tanks&Temples, and Deep Blending datasets to demonstrate that our method outperforms existing evaluation metrics and visual quality, thus validating its effectiveness.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103391"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-09. DOI: 10.1016/j.displa.2026.103389
Qihui Li, Qiliang Du, Lianfang Tian, Guoyu Lu
Point cloud feature extraction and rotation matrix prediction are fundamental tasks in robot perception and 3D computer vision, with critical applications in robot pose estimation, object recognition, and manipulation based on LiDAR, RGB-D, or regular RGB cameras mounted on robots. However, existing methods typically address these two problems separately, often overlooking the intrinsic relationship between them. In this paper, we propose an innovative learning framework that jointly considers rotation invariance and rotation matrix prediction to enhance point cloud feature extraction. Specifically, we use two parallel branches to extract features from the point clouds. One branch predicts the rotation matrix based on different feature representations. The other branch ensures the consistency of global features between the rotated point clouds for downstream tasks. By balancing the variability and invariance of the features, our approach further improves the robustness and accuracy of downstream tasks. Additionally, we introduce a multi-scale feature extraction (MSFE) module, which better captures the local features of point clouds, and an attention-based global feature aggregation (AGFA) module, which enhances the capture of global features, leading to improved overall performance. Our method is not only effective but also lightweight, with relatively few parameters and low computational requirements, making it well suited for deployment on mobile devices. It has the potential to significantly enhance robot capabilities in object recognition, perception, and navigation tasks, especially in dynamic and unstructured environments.
{"title":"Enhancing point cloud feature extraction for effective robot perception","authors":"Qihui Li , Qiliang Du , Lianfang Tian , Guoyu Lu","doi":"10.1016/j.displa.2026.103389","DOIUrl":"10.1016/j.displa.2026.103389","url":null,"abstract":"<div><div>Point cloud feature extraction and the rotation matrix prediction are fundamental tasks in robot perception and 3D computer vision, with critical applications in robot pose estimation, object recognition, and manipulation based on LiDAR, RGB-D, or regular RGB cameras mounted on robots. However, existing methods typically address these two problems separately, often overlooking the intrinsic relationship between them. In this paper, we propose an innovative learning framework that jointly considers rotation invariance and the rotation matrix prediction to enhance point cloud feature extraction. Specifically, we use two parallel branches to extract features from the point clouds. One branch predicts the rotation matrix based on different feature representations. The other branch ensures the consistency of global features between the rotated point clouds for downstream tasks. By balancing the variability and invariance of the features, our approach further improves the robustness and accuracy of downstream tasks. Additionally, we introduce a multi-scale feature extraction module (MSFE), which better captures the local features of the point clouds. We also introduce an attention-based global feature aggregation (AGFA) module, which enhances the capture of global features, leading to improved overall performance. Our method is not only effective but also lightweight. It has relatively small parameters and low computational requirements, which are well-suited for deployment on mobile devices. It has the potential to significantly enhance robot capabilities in object recognition, perception, and navigation tasks, especially in dynamic and unstructured environments.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103389"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191704","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-04. DOI: 10.1016/j.displa.2026.103372
Chen Yang, Jixiang Nie, Hui Chen, Weina Wang, Wanquan Liu
Point cloud registration typically relies on point-pair feature extraction. However, point cloud features are low-dimensional, and point-wise processing lacks topological structure and leads to high computational complexity. To address these challenges, a multi-view 3D point cloud registration method based on generated multi-scale information granules is proposed to build a complete 3D reconstruction. Specifically, during the granule generation process, Fast Point Feature Histograms (FPFH) are integrated into fuzzy C-means clustering to preserve geometric features while reducing computational cost. Furthermore, to ensure feature completeness across regions with varying densities, a surface complexity threshold is employed to merge fine-grained granules and eliminate relatively flat surfaces. This approach avoids over-segmentation and redundancy, thereby improving the efficiency of point cloud processing. Finally, to tackle the uneven distribution of overlapping areas and noise-induced mismatches, a hierarchical GMM-based 3D registration framework built on multi-scale information granules is constructed. Point cloud granules are dynamically updated in real time to ensure registration between granules with complete geometric features, thus improving registration accuracy. Experiments conducted on benchmark datasets and real-world collected data demonstrate that the proposed method outperforms existing methods in multi-view registration, offering improved accuracy and efficiency.
{"title":"Multi-view 3D point cloud registration method based on generated multi-scale information granules","authors":"Chen Yang , Jixiang Nie , Hui Chen , Weina Wang , Wanquan Liu","doi":"10.1016/j.displa.2026.103372","DOIUrl":"10.1016/j.displa.2026.103372","url":null,"abstract":"<div><div>Point cloud registration typically relies on point-pair feature extraction. However, point cloud features are low-dimensional, and point-wise processing lacks topological structure and leads to high computational complexity. Address to these challenges, a multi-view 3D point cloud registration method based on generated multi-scale information granules is proposed to build the completed 3D reconstruction. Specifically, during the granule generation process, Fast Persistent Feature Histograms (FPFH) are integrated into Fuzzy C-means clustering to ensure the preservation of geometric features while reducing computational cost. Furthermore, to ensure feature completeness across regions with varying densities, a surface complexity threshold is employed to merge fine-grained granules and eliminate relatively flat surfaces. This approach avoids over-segmentation and redundancy, thereby improving the efficiency of point cloud processing. Finally, to tackle the uneven distribution of overlapping areas and noise-induced mismatches, a hierarchical GMM-based 3D registration framework based on multi-scale information granules is constructed. Point cloud granules are dynamically updated in real time to ensure registration between granules with complete geometric features, thus improving registration accuracy. Experiments conducted on benchmark datasets and real-world collected data demonstrate that the proposed method outperforms existing methods in multi-view registration, offering improved accuracy and efficiency.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103372"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-01-28. DOI: 10.1016/j.displa.2026.103364
Kexuan Shi, Zhuang Qi, Jingjing Zhu, Lei Meng, Yaochen Zhang, Haibei Huang, Xiangxu Meng
Open-set few-shot image classification aims to train models using a small amount of labeled data, enabling them to achieve good generalization when confronted with unknown environments. Existing methods mainly use visual information from a single image to learn class representations to distinguish known from unknown categories. However, these methods often overlook the benefits of integrating rich contextual information. To address this issue, this paper proposes a prototypical augmentation and alignment method, termed ProtoConNet, which incorporates background information from different samples to enhance the diversity of the feature space, breaking the spurious associations between context and image subjects in few-shot scenarios. Specifically, it consists of three main modules: the clustering-based data selection (CDS) module mines diverse data patterns while preserving core features; the contextual-enhanced semantic refinement (CSR) module builds a context dictionary to integrate into image representations, which boosts the model’s robustness in various scenarios; and the prototypical alignment (PA) module reduces the gap between image representations and class prototypes, amplifying feature distances for known and unknown classes. Experimental results from two datasets verified that ProtoConNet enhances the effectiveness of representation learning in few-shot scenarios and identifies open-set samples, making it superior to existing methods.
{"title":"ProtoConNet: Prototypical augmentation and alignment for open-set few-shot image classification","authors":"Kexuan Shi , Zhuang Qi , Jingjing Zhu , Lei Meng , Yaochen Zhang , Haibei Huang , Xiangxu Meng","doi":"10.1016/j.displa.2026.103364","DOIUrl":"10.1016/j.displa.2026.103364","url":null,"abstract":"<div><div>Open-set few-shot image classification aims to train models using a small amount of labeled data, enabling them to achieve good generalization when confronted with unknown environments. Existing methods mainly use visual information from a single image to learn class representations to distinguish known from unknown categories. However, these methods often overlook the benefits of integrating rich contextual information. To address this issue, this paper proposes a prototypical augmentation and alignment method, termed ProtoConNet, which incorporates background information from different samples to enhance the diversity of the feature space, breaking the spurious associations between context and image subjects in few-shot scenarios. Specifically, it consists of three main modules: the clustering-based data selection (CDS) module mines diverse data patterns while preserving core features; the contextual-enhanced semantic refinement (CSR) module builds a context dictionary to integrate into image representations, which boosts the model’s robustness in various scenarios; and the prototypical alignment (PA) module reduces the gap between image representations and class prototypes, amplifying feature distances for known and unknown classes. Experimental results from two datasets verified that ProtoConNet enhances the effectiveness of representation learning in few-shot scenarios and identifies open-set samples, making it superior to existing methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103364"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-09. DOI: 10.1016/j.displa.2026.103357
Yun Liang, Yuting Xiao, Zihan Zhou, Hongyu Wang, Jiabin Zhang, Jing Li, Yong Xu, Patrick Le Callet
Deep neural networks have shown remarkable progress in blind image quality assessment. However, accurately modeling human visual perception remains challenging due to the wide variations in image content and the complex interplay of distortion types. Existing methods, relying on content-agnostic or fixed receptive field approaches, struggle to capture adaptive perceptual features linking semantic regions and distortion perception. To address these limitations, we propose the dual perception-aware model, a two-stage framework that integrates semantic- and distortion-aware representations and then explores dynamic global–local feature extraction. First, our method leverages superpixel similarity indicators as semantic-aware representations that capture perceptually coherent regions, enabling subsequent content-adaptive feature extraction beyond traditional grid-based methods. A cross-attention mechanism then facilitates mutual modulation between semantic importance and distortion sensitivity, allowing the model to focus on perceptually critical areas while maintaining distortion awareness. Second, we design an adaptive parallel feature extraction unit combining vision transformer blocks with enhanced adaptive filtering residual blocks, achieving comprehensive global–local feature representation that adapts to image-specific characteristics, followed by a weighted dual-pathway regressor for content-tailored quality predictions. Extensive experiments on benchmark datasets containing both synthetic and authentic distortions demonstrate superior performance compared to state-of-the-art methods, with comprehensive ablation studies validating the effectiveness of each proposed component.
{"title":"Dual perception-aware blind image quality assessment with semantic-distortion integration and dynamic global–local refinement","authors":"Yun Liang , Yuting Xiao , Zihan Zhou , Hongyu Wang , Jiabin Zhang , Jing Li , Yong Xu , Patrick Le Callet","doi":"10.1016/j.displa.2026.103357","DOIUrl":"10.1016/j.displa.2026.103357","url":null,"abstract":"<div><div>Deep neural networks have shown remarkable progress in blind image quality assessment. However, accurately modeling human visual perception remains challenging due to the wide variations in image content and the complex interplay of distortion types. Existing methods, relying on content-agnostic or fixed receptive field approaches, struggle to capture adaptive perceptual features linking semantic regions and distortion perception. To address these limitations, we propose the dual perception-aware model, a two-stage framework integrating semantic- and distortion-aware representations, and then exploring dynamic global–local feature extraction. First, our method leverages superpixel similarity indicators as semantic-aware representations that capture perceptually coherent regions, enabling subsequent content-adaptive feature extraction beyond traditional grid-based methods. A cross-attention mechanism then facilitates mutual modulation between semantic importance and distortion sensitivity, allowing the model to focus on perceptually critical areas while maintaining distortion awareness. Second, we design an adaptive parallel feature extraction unit combining vision transformer blocks with enhanced adaptive filtering residual blocks, achieving comprehensive global–local feature representation that adapts to image-specific characteristics, followed by a weighted dual-pathway regressor for content-tailored quality predictions. Extensive experiments on benchmark datasets containing both synthetic and authentic distortions demonstrate superior performance compared to state-of-the-art methods, with comprehensive ablation studies validating the effectiveness of each proposed component.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103357"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-02. DOI: 10.1016/j.displa.2026.103374
Yani Guo, Zhenhong Jia, Gang Zhou, Xiaohui Huang, Yue Li, Mingyan Li, Guohong Chen, Junjie Li
Numerous obstacles are faced in change detection tasks for large-field-of-view video images (e.g., those acquired by Eagle Eye devices) in low-light environments, mainly due to the difficulty in differentiating genuine changes from illumination-induced pseudo-changes, vulnerability to intricate noise interference, and constrained robustness in multi-scale change detection. To address these issues, a deep learning framework for large-field-of-view change detection in low-light environments is proposed in this paper, consisting of three core modules: Cross-scale Attention Feature Fusion, Difference Enhancement and Optimization, and Pseudo-Change Suppression and Multi-scale Fusion. Initially, the Cross-scale Attention Feature Fusion (CAF) module employs a cross-scale attention mechanism to fuse multi-scale features, capturing change information at various scales. Structural differences are then enhanced by the Difference Enhancement and Optimization (DEO) module through frequency-domain decomposition and boundary-aware strategies, mitigating the impact of illumination variations. Subsequently, illumination-induced pseudo-changes are suppressed by the Pseudo-Change Suppression and Multi-scale Fusion (PSF) module with Pseudo-Change Filtering Attention, and multi-scale feature fusion is performed to generate accurate change maps. Additionally, an end-to-end optimization strategy is introduced, incorporating contrastive learning and self-supervised pseudo-label generation, to further enhance the model's robustness and generalization across various low-light scenarios. Experimental results demonstrate that, compared with other methods, the proposed method improves the F1 score by 3.65% and accuracy by 1.84%, verifying its ability to accurately distinguish between real and false changes in low-light environments.
{"title":"Change detection of large-field-of-view video images in low-light environments with cross-scale feature fusion and pseudo-change mitigation","authors":"Yani Guo , Zhenhong Jia , Gang Zhou , Xiaohui Huang , Yue Li , Mingyan Li , Guohong Chen , Junjie Li","doi":"10.1016/j.displa.2026.103374","DOIUrl":"10.1016/j.displa.2026.103374","url":null,"abstract":"<div><div>Numerous obstacles are faced in change detection tasks for large-field-of-view video images (e.g., those acquired by Eagle Eye devices) in low-light environments, mainly due to the difficulty in differentiating genuine changes from illumination-induced pseudo-changes, vulnerability to intricate noise interference, and constrained robustness in multi-scale change detection. To address these issues, a deep learning framework for large-field-of-view change detection in low-light environments is proposed in this paper, consisting of three core modules: Cross-scale Attention Feature Fusion, Difference Enhancement and Optimization, and Pseudo-Change Suppression and Multi-scale Fusion. Initially, the Cross-scale Attention Feature Fusion (CAF) module employs a cross-scale attention mechanism to fuse multi-scale features, capturing change information at various scales. Structural differences are then enhanced by the Difference Enhancement and Optimization (DEO) module through frequency-domain decomposition and boundary-aware strategies, mitigating the impact of illumination variations. Subsequently, illumination-induced pseudo-changes are suppressed by the Pseudo-Change Suppression and Multi-scale Fusion (PSF) module with Pseudo-Change Filtering Attention, and multi-scale feature fusion is performed to generate accurate change maps. Additionally, an end-to-end optimization strategy is introduced, incorporating contrastive learning and self-supervised pseudo-label generation, to further enhance the model’s robustness and generalization across various low-light scenarios. Experimental results demonstrate that, compared with other methods, The method described in this paper improved the F1 score by 3.65% and accuracy by 1.84%, verifying its ability to accurately distinguish between real and false changes in low-light environments.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103374"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-02-02. DOI: 10.1016/j.displa.2026.103377
Aizhong Zhou, Fengbo Wang, Jiong Guo, Yutao Liu
The Mixture of Experts (MoE) is a neural network architecture that is widely used in fields such as natural language processing (e.g., large language models and multilingual translation), computer vision (e.g., medical image analysis and multi-modal learning), and recommendation systems. A core problem of MoE is how to select, among all experts, the expert assigned to a specific task. This problem can be transformed into an election problem in which each expert is a candidate and the winner of the election (one or more candidates) is the expert assigned to the task according to the votes. We study a variant of committee elections from the perspective of computational complexity. Given a set of candidates, each possessing a set of attributes and a profit value, and a set of constraints specified as propositional logical expressions over the attributes, the task is to select a committee of k candidates that satisfies all constraints and whose total profit meets a given threshold. Regarding classical complexity, we design two polynomial-time algorithms for two special cases and provide several NP-hardness results. Moreover, we examine the parameterized complexity and obtain FPT, W[1]-hardness, and para-NP-hardness results.
{"title":"Committee Elections with Candidate Attribute Constraints","authors":"Aizhong Zhou , Fengbo Wang , Jiong Guo , Yutao Liu","doi":"10.1016/j.displa.2026.103377","DOIUrl":"10.1016/j.displa.2026.103377","url":null,"abstract":"<div><div>The Mixture of Experts (MoE) is a neural network architecture which is widely used in fields such as natural language processing (such as large language models, multilingual translation), computer vision (such as medical image analysis, multi-modal learning), and recommendation systems. A core problem of the MoE is how to select an expert assigned to a specific task among all experts. This problem can be transformed into an election problem where each expert is a candidate and the winner of election (a candidate or some candidates) is the expert who is assigned to the task by considering the votes. We study a variant of committee elections from the perspective of computational complexity. Given a set of candidates, each possessing a set of attributes and a profit value, and a set of constraints specified as propositional logical expressions on the attributes, the task is to select a committee of <span><math><mi>k</mi></math></span> candidates that satisfies all constraints and whose total profit meets a given threshold. Regarding the classical complexity, we design two polynomial time algorithms for two special conditions and provide some NP-hardness results. Moreover, we examine the parameterized complexity and get some FPT, W[1]-hard and para-NP-hard results.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103377"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-07-01. Epub Date: 2026-01-27. DOI: 10.1016/j.displa.2026.103366
Xinggang Hou, Bingchen Gou, Dengkai Chen, Jianjie Chu, Xiaosai Duan, Xuerui Li, Lin Ma, Jing Chen, Yao Zhou
In monitoring tasks involving sustained interaction with display systems, fatigue is a primary factor diminishing efficiency. Traditional models confuse sleepiness with mental fatigue, which compromises the reliability of assessments. We propose an explainable multimodal framework that models these two subtypes separately and integrates them into a comprehensive fatigue assessment. To validate our methodology, we invited 20 pilots to participate in a 90-minute continuous monitoring experiment, during which we collected multimodal data including their eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and video. First, we derive explicit representation functions for sleepiness and mental fatigue using symbolic regression on facial and behavioral cues, enabling continuous subtype-related labeling beyond intermittent questionnaires. Second, we identify compact physiological marker subsets via a cascaded feature selection method that combines mRMR prescreening with a heuristic search, yielding key feature sets while substantially reducing dimensionality. Finally, dynamic weighted coupling analysis based on information entropy reveals the nonlinear superposition effects between sleepiness and mental fatigue. Using 30 s windows under the current cohort and evaluation setting, the resulting comprehensive classifier achieves 94.8% accuracy. Following external validation and domain-specific adaptations, the methodology developed in this study holds broad application prospects across numerous automation scenarios involving monotonous human–machine interaction tasks.
{"title":"Distinguishing sleepiness from mental fatigue in sustained monitoring tasks to enhance the reliability of fatigue detection based on multimodal fusion","authors":"Xinggang Hou , Bingchen Gou , Dengkai Chen , Jianjie Chu , Xiaosai Duan , Xuerui Li , Lin Ma , Jing Chen , Yao Zhou","doi":"10.1016/j.displa.2026.103366","DOIUrl":"10.1016/j.displa.2026.103366","url":null,"abstract":"<div><div>In monitoring tasks involving sustained interaction with display systems, fatigue is a primary factor diminishing efficiency. Traditional models confuse sleepiness with mental fatigue, which compromises the reliability of assessments. We propose an explainable multimodal framework that models these two subtypes separately and integrates them into a comprehensive fatigue assessment. To validate our methodology, we invited 20 pilots to participate in a 90-minute continuous monitoring experiment, during which we collected multimodal data including their eye movements, electroencephalogram (EEG), electrocardiogram (ECG), and video. First, we derive explicit representation functions for sleepiness and mental fatigue using symbolic regression on facial and behavioral cues, enabling continuous subtype related labeling beyond intermittent questionnaires. Second, we identify compact physiological marker subsets via a cascaded feature selection method that combines mRMR prescreening with a heuristic search, yielding key feature sets while substantially reducing dimensionality. Finally, dynamic weighted coupling analysis based on information entropy revealed the nonlinear superposition effects between sleepiness and mental fatigue. Using 30 s windows under the current cohort and evaluation setting, the resulting comprehensive classifier achieves 94.8% accuracy. Following external validation and domain-specific adaptations, the methodology developed in this study holds broad application prospects across numerous automation scenarios involving monotonous human–machine interaction tasks.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"93 ","pages":"Article 103366"},"PeriodicalIF":3.4,"publicationDate":"2026-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146191192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}