Pub Date: 2025-10-20 | DOI: 10.1016/j.displa.2025.103261
Jing Chen, Huimin Tao, Jiahui Wu, Quanjingzi Yuan, Lin Ma, Dengkai Chen, Mingjiu Yu
Generative AI can rapidly create user interfaces (UIs) with distinct emotional tones, yet few studies rigorously test how effectively such UIs convey emotion. Using the Valence–Arousal (VA) framework, we prompted generative AI to produce 40 static visual UIs targeting specific emotions and evaluated them with a mixed-methods protocol in which participants completed Check-All-That-Apply (CATA) descriptors while eye-tracking recorded saccade speed and pupil diameter. Analyses showed that UIs generated from different prompts formed three perceptual categories—positive valence, negative/high arousal, and negative/low arousal—with partial overlap between positive prompts (e.g., “Delighted” and “Relaxed”) and clearer distinctions for negative prompts (“Alarmed”, “Bored”), a pattern mirrored by differences in scanning speed. These findings indicate that AI-generated UIs can embed meaningful affective cues that shape how users feel when viewing on-screen elements. Combining subjective and physiological measures thus offers a practical framework for emotion-focused UI evaluation and motivates further work on refining prompt specificity, incorporating diverse emotion models, and testing broader user demographics.
{"title":"Affective conveyance assessment of AI-generative static visual user interfaces based on valence-arousal emotion model","authors":"Jing Chen , Huimin Tao , Jiahui Wu , Quanjingzi Yuan , Lin Ma , Dengkai Chen , Mingjiu Yu","doi":"10.1016/j.displa.2025.103261","DOIUrl":"10.1016/j.displa.2025.103261","url":null,"abstract":"<div><div>Generative AI can rapidly create user interfaces (UIs) with distinct emotional tones, yet few studies rigorously test how effectively such UIs convey emotion. Using the Valence–Arousal (VA) framework, we prompted generative AI to produce 40 static visual UIs targeting specific emotions and evaluated them with a mixed-methods protocol in which participants completed Check-All-That-Apply (CATA) descriptors while eye-tracking recorded saccade speed and pupil diameter. Analyses showed that UIs generated from different prompts formed three perceptual categories—positive valence, negative/high arousal, and negative/low arousal—with partial overlap between positive prompts (e.g., “Delighted” and “Relaxed”) and clearer distinctions for negative prompts (“Alarmed”, “Bored”), a pattern mirrored by differences in scanning speed. These findings indicate that AI-generated UIs can embed meaningful affective cues that shape how users feel when viewing on-screen elements, and the combination of subjective and physiological measures offers a practical framework for emotion-focused UI evaluation while motivating further work on refining prompt specificity, incorporating diverse emotion models, and testing broader user demographics.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103261"},"PeriodicalIF":3.4,"publicationDate":"2025-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145361553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-20 | DOI: 10.1016/j.displa.2025.103255
Zicheng Zhang, Junying Wang, Yijin Guo, Farong Wen, Zijian Chen, Hanqing Wang, Wenzhe Li, Lu Sun, Yingjie Zhou, Jianbo Zhang, Bowen Yan, Ziheng Jia, Jiahao Xiao, Yuan Tian, Xiangyang Zhu, Kaiwei Zhang, Chunyi Li, Xiaohong Liu, Xiongkuo Min, Qi Jia, Guangtao Zhai
This paper presents AIBench, a flexible and rapidly updated benchmark that aggregates evaluation results from commercial platforms, popular open-source leaderboards, and internal evaluation benchmarks. While existing leaderboards primarily emphasize model capabilities, they often overlook safety evaluations and lack integrated cost–performance information, factors critical for informed decision-making by enterprises and end users. To address this gap, AIBench provides a comprehensive evaluation of foundation models across four key dimensions: Safety, Intelligence, Speed, and Price. Inspired by the 45° Law of Intelligence–Safety Balance, we visualize the trade-off patterns among leading models, offering a bird’s-eye view of how top-tier companies position their models along these two axes. In addition, to support the development of Specialized Generalist Intelligence (SGI), AIBench incorporates a general–special evaluation framework designed to assess whether models excelling in specialized domains can also maintain strong general-purpose performance. AIBench also tracks performance evolution over time, revealing longitudinal trends in model development. Furthermore, we periodically curate and incorporate insights from the evaluation community to ensure that the benchmark remains timely and relevant. AIBench is intended to serve as a transparent, dynamic, and actionable benchmark for trustworthy evaluation, aiding both researchers and practitioners in navigating the rapidly evolving landscape of foundation models. AIBench is publicly available and maintained at: https://aiben.ch.
“AIBench: Towards trustworthy evaluation under the 45° law” (Displays, vol. 91, Article 103255)
Pub Date: 2025-10-18 | DOI: 10.1016/j.displa.2025.103263
Marios Sekadakis, Thodoris Garefalakis, Peter Moertl, George Yannis
This study investigates the factors influencing Take-Over Time (TOT) during transitions from automated to manual driving, emphasizing the novelty of applying XGBoost modeling combined with SHAP analysis to uncover non-linear and implicit dependencies between features. Using high-frequency data from a driving simulator, key variables such as automation level, driving measurements, different types of obstacles, and Human-Machine Interface (HMI) conditions were analyzed to understand their effects on TOT. The XGBoost model was optimized using a cross-validation approach, achieving strong predictive performance (R² = 0.871 on the test set). Feature importance analysis revealed that Automated Driving (AD) level 2 or 3 was the most influential factor, underscoring how extended time budgets and reduced driver engagement interact in shaping TOT. Higher automation levels resulted in longer TOT, with SHAP values consistently positive for AD Level 3, demonstrating the added value of explainable machine learning in clarifying these patterns. Dynamic driving parameters, such as deceleration and speed variability, were also significant. Strong negative deceleration values were generally associated with shorter TOT, reflecting quicker responses under urgent braking. Speed had a moderate positive effect on TOT at 80–110 km/h, with drivers taking additional time to assess the environment, whereas higher speeds (above 110 km/h) elicited quicker responses. Beyond these established effects, SHAP analysis revealed how automation level, obstacle environment, and HMI design jointly condition driver responses. Although the HADRIAN HMI slightly increased TOT compared to the baseline, it also appears to offer safety benefits through tailored guidance and improved situational awareness. By combining methodological innovation with contextual insights, this study contributes to a deeper understanding of takeover behavior and provides actionable evidence for optimizing adaptive HMI design and takeover strategies in AD systems.
{"title":"Analyzing SHAP values of XGBoost algorithms to understand driving features affecting take-over time from vehicle alert to driver action","authors":"Marios Sekadakis , Thodoris Garefalakis , Peter Moertl , George Yannis","doi":"10.1016/j.displa.2025.103263","DOIUrl":"10.1016/j.displa.2025.103263","url":null,"abstract":"<div><div>This study investigates the factors influencing Take-Over Time (TOT) during transitions from automated to manual driving, emphasizing the novelty of applying XGBoost modeling combined with SHAP analysis to uncover non-linear and implicit dependencies between features. Using high-frequency data from a driving simulator, key variables such as automation level, driving measurements, different types of obstacles, and Human-Machine Interface (HMI) conditions were analyzed to understand their effects on TOT. The XGBoost model was optimized using a cross-validation approach, achieving strong predictive performance (R<sup>2</sup> = 0.871 for testing set). Feature importance analysis revealed that Automated Driving (AD) level 2 or 3 was the most influential factor, underscoring how extended time budgets and reduced driver engagement interact in shaping TOT. Higher automation levels resulted in longer TOT, with SHAP values consistently positive for AD Level 3, demonstrating the added value of explainable machine learning in clarifying these patterns. Dynamic driving parameters, such as deceleration and speed variability, were also significant. Strong negative deceleration values were generally associated with shorter TOT, reflecting quicker responses under urgent braking. Speed showed a moderate positive effect on TOT at 80–110 km/h, with drivers taking additional time to assess the environment, but higher speeds (above 110 km/h) resulted in quicker responses. Beyond these established effects, SHAP analysis revealed how automation level, obstacle environment, and HMI design jointly condition driver responses. The HADRIAN HMI, slightly increasing TOT compared to the baseline, simultaneously seems to demonstrate potential safety benefits through tailored guidance and improved situational awareness. By combining methodological innovation with contextual insights, this study contributes to a deeper understanding of takeover behavior and provides actionable evidence for optimizing adaptive HMI design and takeover strategies in AD systems.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103263"},"PeriodicalIF":3.4,"publicationDate":"2025-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145361550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-18 | DOI: 10.1016/j.displa.2025.103260
Liwan Lin, Zongyu Wu, Yijun Lu, Zhong Chen, Weijie Guo
As a cornerstone technology for metaverse interactions, advanced eye-tracking systems require solutions that address dynamic adaptability and computational efficiency challenges. To this end, this work proposes a novel calibration-free eye-tracking system employing a lightweight deep learning architecture with multi-module fusion, improving gaze estimation accuracy, robustness, and real-time performance. Multi-scale feature extraction and an attention mechanism are used to effectively capture gaze-related features. Gaze-point prediction is optimized through multi-layer feature fusion that integrates spatial and channel information. A mean square error loss and an attention weight regularization loss are combined to balance prediction accuracy and feature stability during training. The system achieves a best gaze estimation accuracy of 1.76° and a real-time inference latency of 9.71 ms, outperforming traditional methods in both accuracy and efficiency.
{"title":"Lightweight deep learning with multi-scale feature fusion for high-precision and low-latency eye tracking","authors":"Liwan Lin , Zongyu Wu , Yijun Lu , Zhong Chen , Weijie Guo","doi":"10.1016/j.displa.2025.103260","DOIUrl":"10.1016/j.displa.2025.103260","url":null,"abstract":"<div><div>As a cornerstone technology for metaverse interactions, advanced eye-tracking systems require solutions that address dynamic adaptability and computational efficiency challenges. To this end, this work proposes a novel calibration-free eye-tracking system employing a lightweight deep learning architecture with multi-module fusion, improving the gaze estimation accuracy, robustness, and real-time performance. The multi-scale feature extraction and an attention mechanism have been utilized to effectively capture gaze-related feature. The prediction on gaze point has been optimized through a multi-layer feature fusion integrating spatial and channel information. The mean square error loss and attention weight regularization loss have been incorporated to balance the prediction accuracy and feature stability during training. The highest gaze estimation accuracy of 1.76° and a real-time inference latency of 9.71 ms have been achieved, outperforming the traditional methods in both accuracy and efficiency.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103260"},"PeriodicalIF":3.4,"publicationDate":"2025-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145415550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-16 | DOI: 10.1016/j.displa.2025.103258
Guojin Pei, Chen Xu, Huihui Wei, Genke Yang, Jian Chu
Image retargeting based on Seam Carving (SC) achieves content-aware size adjustment by iteratively removing or inserting the minimal-energy seam. However, its computational efficiency and quality are constrained by the discrete, pixel-wise representation of seam paths and the Dynamic Programming (DP) process. To address these limitations, we propose a novel retargeting method based on Bezier curves and Mamba. First, we introduce cubic Bezier curves to represent seams. A seam can be precisely described by only six scalar parameters derived from four control points, significantly reducing representation complexity (e.g., from O(H) to O(1) per vertical seam, where H is the image height). Second, we improve the energy map by introducing a geometric distance weight ω and a new fusion method termed Fluid Diffusion. This effectively integrates gradient information, edge structures, visual saliency, and depth cues, providing more robust importance guidance for seam selection. Finally, replacing the DP process of the original SC, we construct the Mamba-based regression network BSCNet, which directly regresses the control points of the curved seam (reducing the computational complexity per seam from quadratic to constant). Experimental results demonstrate that BSCNet outperforms reference methods in both image quality and computational efficiency. Specifically, compared to SC, BSCNet improves the TOPIQ score by 9.82% and reduces the BRISQUE score by 13.79%. In addition, BSCNet achieves an inference speed 22.62 times faster than that of SC in our evaluation. In conclusion, the proposed method overcomes the limitations of traditional SC by combining parametric seam representation with efficient regression-based seam searching, offering a promising solution for efficient and high-quality image retargeting.
“Efficient image retargeting with Bezier curves” (Displays, vol. 91, Article 103258)
Pub Date: 2025-10-16 | DOI: 10.1016/j.displa.2025.103256
Junqin Chen, Ruoqing Xie, Lirong Chen, Runhui Feng, Jin Xie, Linfeng Yang, Meipeng Huang
Industrial drawing skills are typically transmitted from experts to novices through demonstration and practice, yet the embodied (sensorimotor) and cognitive components of such skills remain difficult to quantify. This study investigates whether augmented reality (AR)-assisted human–computer interaction training can digitize and facilitate the transfer of drawing skills from experts to novices. We created a virtual drawing environment to present expert postural trajectories to trainees via AR devices. Gesture postures and eye movements were recorded before and after training and analyzed using dynamic metrics, including gesture joint velocities and eye angular velocities. Skill acquisition and transfer were evaluated based on objective performance scores from expert raters and subjective participant feedback. The results indicate that AR-guided training induced rapid changes in joint velocity profiles and eye angular velocity, accompanied by reduced and more focused spectral energy expenditure. Improvements in objective scores were supported by positive subjective evaluations, and subjective assessments correlated with the observed kinematic changes. These findings demonstrate that AR-mediated human–computer interaction can effectively facilitate the transfer of both sensorimotor and cognitive aspects of drawing skill to novices. This approach shows promise for enhancing the efficiency of industrial design education and other skill-training applications.
{"title":"Enhancing drawing skills with augmented reality: A study on gesture and eye Movement-Based training☆☆","authors":"Junqin Chen , Ruoqing Xie , Lirong Chen , Runhui Feng , Jin Xie , Linfeng Yang , Meipeng Huang","doi":"10.1016/j.displa.2025.103256","DOIUrl":"10.1016/j.displa.2025.103256","url":null,"abstract":"<div><div>Industrial drawing skills are typically transmitted from experts to novices through demonstration and practice, yet the embodied (sensorimotor) and cognitive components of such skills remain difficult to quantify. This study investigates whether augmented reality (AR)-assisted human–computer interaction training can digitize and facilitate the transfer of drawing skills from experts to novices. We created a virtual drawing environment to present expert postural trajectories to trainees via AR devices. Gesture postures and eye movements were recorded before and after training and analyzed using dynamic metrics, including gesture joint velocities and eye angular velocities. Skill acquisition and transfer were evaluated based on objective performance scores from expert raters and subjective participant feedback. The results indicate that AR-guided training induced rapid changes in joint velocity profiles and eye angular velocity, accompanied by reduced and more focused spectral energy expenditure. Improvements in objective scores were supported by positive subjective evaluations, and subjective assessments correlated with the observed kinematic changes. These findings demonstrate that AR-mediated human–computer interaction can effectively facilitate the transfer of both sensorimotor and cognitive aspects of drawing skill to novices. This approach shows promise for enhancing the efficiency of industrial design education and other skill-training applications.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103256"},"PeriodicalIF":3.4,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145361571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-14 | DOI: 10.1016/j.displa.2025.103259
Da Ai, Ting He, Mingyue Lu, Dianwei Wang, Ying Liu
With the rapid proliferation of mobile Internet and social media, individual users have become significant contributors to video content creation. The quality of User-Generated Content (UGC) videos plays a crucial role in determining their dissemination effectiveness. Consequently, UGC quality assessment has emerged as one of the critical research topics in the field of image processing. To address the limitations of existing evaluation methods—such as inadequate detection of dynamic distortions, suboptimal spatio-temporal modeling, and degraded performance in evaluating long-sequence videos—we propose STAFF-Net, a UGC video quality assessment model based on spatio-temporal feature fusion. The model comprises three key modules. A multi-level static feature key domain weighting module is designed to efficiently capture key information consistent with human visual characteristics. An optical flow motion feature extraction module is integrated to capture the motion dynamics within videos. Additionally, a spatio-temporal Transformer encoder module with temporal attention weighting is developed. By leveraging the multi-head attention mechanism, it models global spatial dependencies. The incorporated temporal attention weighting module enhances the temporal correlations between video frames, thereby improving the model’s ability to learn dependencies across different segments of long sequences. Experimental results on public UGC-VQA datasets demonstrate that the proposed method surpasses most state-of-the-art approaches in terms of SROCC and PLCC. Moreover, Mean Opinion Score (MOS) evaluations exhibit excellent subjective consistency, validating the effectiveness of the proposed method.
{"title":"Spatio-temporal attention feature fusion: A video quality assessment method for User-Generated Content","authors":"Da Ai , Ting He , Mingyue Lu , Dianwei Wang , Ying Liu","doi":"10.1016/j.displa.2025.103259","DOIUrl":"10.1016/j.displa.2025.103259","url":null,"abstract":"<div><div>With the rapid proliferation of mobile Internet and social media, individual users have become significant contributors to video content creation. The quality of User-Generated Content (UGC) videos plays a crucial role in determining their dissemination effectiveness. Consequently, UGC quality assessment has emerged as one of the critical research topics in the field of image processing. To address the limitations of existing evaluation methods—such as inadequate detection of dynamic distortions, suboptimal spatio-temporal modeling, and degraded performance in evaluating long-sequence videos—we propose STAFF-Net, a UGC video quality assessment model based on spatio-temporal feature fusion. This model comprises three key modules. A multi-level static feature key domain weighting module is designed to efficiently capture key information consistent with human visual characteristics. An optical flow motion feature extraction module is integrated to capture the motion dynamics within videos. Additionally, a spatio-temporal Transformer encoder module with temporal attention weighting is developed. By leveraging the multi-head attention mechanism, it models global spatial dependencies. The incorporated temporal attention weighting module enhances the temporal correlations between video frames, thereby improving the model’s ability to learn dependencies across different segments of long sequences. Experiment results on the public UGC-VQA datasets demonstrate that the proposed method surpasses most state-of-the-art approaches in terms of SROCC and PLCC metrics. Moreover, Mean Opinion Score (MOS) evaluations exhibit excellent subjective consistency, validating the effectiveness of our proposed method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103259"},"PeriodicalIF":3.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145319477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-14 | DOI: 10.1016/j.displa.2025.103253
Bingcheng Ke, Tzu-Yang Wang, Takaya Yuizono
This study investigates the effects of the visual presentation of surrounding avatars on users’ physical activity and perceived exertion in a virtual reality (VR) gym. Previous research has demonstrated that the presence and performance of others can significantly affect people’s exercise behavior; however, the specific effects of surrounding avatars’ exercise speed and body composition on users’ behavior and psychological experiences in the VR gym remain to be further explored. This study focused on two key visual representations of surrounding avatars: (1) exercise speed (fast and slow) and (2) body composition (normal weight and overweight). Participants cycled on a stationary bike in a VR gym while observing surrounding avatars exercising on a treadmill, during which their pedaling frequency, heart rate (HR), electromyography (EMG), perceived exertion, and self-perceived fitness were measured. The results showed that a faster exercise speed of surrounding avatars significantly increased users’ pedaling frequency, and overweight avatars enhanced users’ positive self-perception of fitness compared to normal-weight avatars. Furthermore, an interaction effect was observed: under fast exercise conditions, overweight avatars elicited higher heart rate and EMG values. Notably, the changes in pedaling frequency, EMG, and perceived exertion persisted even after the avatars left the VR gym.
{"title":"The effect of surrounding avatars’ speed and body composition on users’ physical activity and exertion perception in VR GYM","authors":"Bingcheng Ke , Tzu-Yang Wang , Takaya Yuizono","doi":"10.1016/j.displa.2025.103253","DOIUrl":"10.1016/j.displa.2025.103253","url":null,"abstract":"<div><div>This study investigates the effects of the visual presentation of surrounding avatars on users’ physical activity and perceived exertion in a virtual reality (VR) gym. Previous research has demonstrated that the presence and performance of others can significantly affect people’s exercise behavior; however, the specific effects of surrounding avatars’ exercise speed and body composition on users’ behavior and psychological experiences in the VR gym remain to be further explored. This study focused on two key visual representations of surrounding avatars: (1) exercise speed (fast and slow) and (2) body composition (normal weight and overweight). Participants cycled on a stationary bike in a VR gym while observing surrounding avatars exercising on a treadmill, during which their pedaling frequency, heart rate (HR), electromyography (EMG), perceived exertion, and self-perceived fitness were measured. The results showed that a faster exercise speed of surrounding avatars significantly increased users’ pedaling frequency, and overweight avatars enhanced users’ positive self-perception of fitness compared to normal-weight avatars. Furthermore, an interaction effect was observed: under fast exercise conditions, overweight avatars elicited higher heart rate and EMG values. Notably, the changes in pedaling frequency, EMG, and perceived exertion persisted even after the avatars left the VR gym.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103253"},"PeriodicalIF":3.4,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145319480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-13 | DOI: 10.1016/j.displa.2025.103254
Tiantian Chen, Moke Li, Zhangfan Shen
Although numerous studies have investigated the impact of data visualization design on usability, few have considered the influence of external environmental factors. This study examined the combined effects of vibration intensity, graph design, and time pressure on users’ accuracy and speed in data recognition tasks. First, vibration data were collected from three typical real-world road conditions to establish the vibration intensity parameters for the experiment. Second, twelve graph types differing in graphic form and scale precision were selected for the tasks, based on an extensive survey of typical data visualizations. Finally, thirty-three participants performed a data recognition task, completing six repetitions of data recognition for the 12 graph materials under varying vibration intensities and time pressures. The results suggested that vibration intensity negatively impacted data reading performance, with higher vibration intensity leading to decreased performance. The main effect of graphic form also reached statistical significance. Specifically, recognition accuracy was highest for horizontal bar graphs in a vibrating environment, and recognition speed was fastest for semicircle graphs. The precision of perception played an important role in improving accuracy but came at the cost of slower recognition time, and the advantage of high precision diminished as vibration intensity increased. Furthermore, a strong interaction effect was observed between graphic form and time pressure, indicating that reduced time pressure enhanced the accuracy of circular graphs and narrowed the gap among horizontal bar, circular, and semicircle graphs. The findings provide practical guidelines for the design of data visualization.
{"title":"The influence of data chart encoding form on reading performance under vibration environment","authors":"Tiantian Chen, Moke Li, Zhangfan Shen","doi":"10.1016/j.displa.2025.103254","DOIUrl":"10.1016/j.displa.2025.103254","url":null,"abstract":"<div><div>Despite numerous studies have investigated the impact of data visualization design on usability in the past, few have considered the influence of external environmental factors. This study examined the combined effects of vibration intensity, graph design, and time pressure on users’ accuracy and speed in data recognition tasks. Firstly, vibration data were collected based on three typical real-world road conditions to establish the vibration intensity parameters for the experiment. Secondly, twelve types of graphs in different graphic forms and scale precision were adopted in tasks through an extensive survey of typical types of data visualizations. Finally, thirty-three participants were asked to perform a data recognition task, where they completed six repetitions of data recognition for the 12 graph materials under varying vibration intensities and time pressures. The results suggested that vibration intensity negatively impacted data reading performance, with higher vibration intensity leading to decreased performance. The main effect of the graphic form also reached statistical significance. Specifically, the recognition accuracy was highest for horizontal bar graphs in a vibrating environment, and the recognition speed was fastest for semicircle graphs. The precision of perception played an important role in improving accuracy but came at the cost of slower recognition time, and the advantage of high precision diminished as vibration intensity increased. Furthermore, a strong interaction effect was observed between graphic form and time pressure, which indicated that reduced time pressure led to enhanced accuracy of circular graphs and narrowed the gap among horizontal bar, circular, and semicircle graphs. The findings provide practical guidelines for the design of data visualization.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103254"},"PeriodicalIF":3.4,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145319479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-10-10 | DOI: 10.1016/j.displa.2025.103250
Zhou Yu, Li Xue, Weidong Xu, Zhisong Pan, Qi Jia, Yawen Liu, Ling Li, Xin Yang, Bentian Hao
The complexity of backgrounds at target configuration points significantly affects camouflage effectiveness. This study presents a method to characterize background complexity based on the Composite Multiscale Entropy (CMSE) of electroencephalography (EEG) signals. Background images with different vegetation coverage levels (10%, 30%, 50%, 70%, and 90%) and two vegetation distribution patterns (concentrated and dispersed) served as visual stimuli. Thirty-five participants took part in the experiment, and their EEG signals were recorded while viewing the images. CMSE values were computed from the collected EEG data. Statistical analysis indicated that CMSE values at the O1 and Oz channels reached a saturation point when vegetation coverage approached 70%, while the increase at the O2 channel slowed considerably beyond 50% coverage. Dispersed vegetation patterns yielded higher entropy values than concentrated patterns. The findings suggest that, within the scope of the present experimental conditions, vegetation coverage of around 70% at the target configuration point is more conducive to camouflage. Under constrained circumstances, backgrounds with vegetation coverage exceeding 50% are advisable. Furthermore, dispersed vegetation increases visual confusion, thereby enhancing camouflage effectiveness. These results provide useful guidance for selecting suitable target backgrounds, although their applicability to other environmental contexts requires further investigation.
{"title":"Study of background complexity based on composite multiscale entropy of EEG signals","authors":"Zhou Yu , Li Xue , Weidong Xu , Zhisong Pan , Qi Jia , Yawen Liu , Ling Li , Xin Yang , Bentian Hao","doi":"10.1016/j.displa.2025.103250","DOIUrl":"10.1016/j.displa.2025.103250","url":null,"abstract":"<div><div>The complexity of backgrounds at target configuration points significantly affects camouflage effectiveness. This study presents a method to characterize background complexity based on the Composite Multiscale Entropy (CMSE) of electroencephalography (EEG) signals. Background images with different vegetation coverage levels (10%, 30%, 50%, 70%, and 90%) and two vegetation distribution patterns (concentrated and dispersed) served as visual stimuli. Thirty-five participants took part in the experiment, and their EEG signals were recorded while viewing the images. CMSE values were computed from the collected EEG data. Statistical analysis indicated that CMSE values at the O1 and Oz channels reached a saturation point when vegetation coverage approached 70%, while the increase at the O2 channel slowed considerably beyond 50% coverage. Dispersed vegetation patterns yielded higher entropy values than concentrated patterns. The findings suggest that, within the scope of the present experimental conditions, vegetation coverage of around 70% at the target configuration point is more conducive to camouflage. Under constrained circumstances, backgrounds with vegetation coverage exceeding 50% are advisable. Furthermore, dispersed vegetation increases visual confusion, thereby enhancing camouflage effectiveness. These results provide useful guidance for selecting suitable target backgrounds, although their applicability to other environmental contexts requires further investigation.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103250"},"PeriodicalIF":3.4,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145319478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}