Pub Date : 2025-01-19 DOI: 10.1016/j.displa.2025.102977
Effects of interface design and spatial ability on teleoperation cognitive load and task performance
Haonan Fang , Yaoguang Hu , Shanguang Chen , Xiaonan Yang , Yan Zhao , Hongwei Niu , Chenfei Cai
Teleoperation human-machine interfaces (HMIs) have become the primary means for astronauts to perceive and understand unknown environments and to accurately control remote mechanical devices performing long-distance tasks in space. However, converting spatial relationships between 2D interface information and dynamic objects in 3D space places significant demands on human spatial abilities, which may limit teleoperation performance. Furthermore, current teleoperation HMI designs ignore individual differences and fail to achieve human-centered adaptive adjustment. This study investigated the impact of HMI design (control and display interface) and spatial abilities (mental rotation ability and perspective-taking ability) on teleoperation cognitive load and task performance. We designed spatial manipulator teleoperation experiments using four HMIs with different control (buttons/joysticks) and display (graphical/numerical) modes. Results indicated that variations in spatial abilities directly affected changes in cognitive load during teleoperation. Furthermore, providing display information and control modes matched to different spatial abilities effectively enhanced task performance. For operators with low perspective-taking ability, numerical display information tended to improve operational efficiency, whereas for operators with low mental rotation ability, button interfaces were more helpful in reducing error rates. These findings underscore the importance of assessing operators’ cognitive load in supporting adaptive, spatial-ability-based design of teleoperation HMIs.
{"title":"Effects of interface design and spatial ability on teleoperation cognitive load and task performance","authors":"Haonan Fang , Yaoguang Hu , Shanguang Chen , Xiaonan Yang , Yan Zhao , Hongwei Niu , Chenfei Cai","doi":"10.1016/j.displa.2025.102977","DOIUrl":"10.1016/j.displa.2025.102977","url":null,"abstract":"<div><div>Human-machine interfaces (HMIs) of teleoperation have become the primary method for astronauts to perceive and understand the unknown environment, as well as to accurately control remote mechanical devices to perform long-distance tasks in space. However, the spatial relationship conversion between the 2D interface information and dynamic objects in 3D space brings significant challenges to human spatial abilities, which may limit teleoperation performance. Furthermore, current designs of teleoperation HMIs ignore individual differences, failing to achieve human-centered adaptive adjustments. This study investigated the impact of HMIs designs (control and display interface) and spatial abilities (mental rotation ability and perspective-taking ability) on teleoperation cognitive load and task performance. We designed spatial manipulator teleoperation experiments using four HMIs with different control (buttons/joysticks) and display (graphical/numerical) modes. Results indicated that variations in spatial abilities directly affected the change of cognitive load during teleoperation. Furthermore, providing different display information and control modes for different spatial abilities effectively enhanced task performance. For operators with low perspective-taking ability, numerical information for display tended to improve operational efficiency, whereas for operators with low mental rotation ability, button interfaces were more helpful in reducing error rates. These findings underscore the importance of assessing operators’ cognitive load in supporting adaptive design of teleoperation HMIs based on spatial abilities.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102977"},"PeriodicalIF":3.7,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-18 DOI: 10.1016/j.displa.2025.102967
Domain adaptive YOLO based on image style selection and synergistic domain classifier
Yipeng Zhou, Huaming Qian
Object detectors are trained on routine datasets collected primarily under favorable conditions, yet they encounter various extreme environments in the complex real world. Distribution shift between the training and test datasets severely degrades model performance, and the most cost-effective way to address this problem is unsupervised domain adaptation (UDA). In this work, we use YOLOv8 as the underlying detector to construct a domain adaptive framework called YOLO-SDCoN, which offers a new solution paradigm for the domain shift problem. Specifically, we propose a Synergistic Domain Classifier (SDC) with richer gradient flow, which takes all the multi-scale features used for detection as inputs, providing a more adequate way to generate domain-invariant features while alleviating gradient vanishing. Furthermore, a novel Batch-Instance Co-Normalization (BI-CoN) method is proposed, which enables adaptive selection and preservation of image styles under the implicit guidance of a domain classifier, thereby generating better domain-invariant features and enhancing the robustness of cross-domain detection. We conducted extensive experiments on the KITTI, Cityscapes, Foggy Cityscapes, and SIM10K datasets. The results show that the proposed YOLO-SDCoN is comprehensively superior to Faster R-CNN based domain adaptive frameworks and achieves superior results compared with other methods.
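The abstract does not give the internal form of BI-CoN; as a rough sketch of the batch-instance normalization idea it builds on, the module below blends batch and instance statistics with a learned per-channel gate. The class name, the clamp-based gate, and the shared affine parameters are illustrative assumptions, and the domain-classifier guidance described in the paper is not reproduced.

```python
# Hypothetical sketch of a batch-instance style normalization layer; the paper's
# BI-CoN additionally conditions style selection on a domain classifier, which
# is not reproduced here. Names are illustrative, not the authors'.
import torch
import torch.nn as nn

class BatchInstanceNorm2d(nn.Module):
    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        # affine=False: a single shared affine transform is applied after blending
        self.bn = nn.BatchNorm2d(num_channels, eps=eps, affine=False)
        self.inorm = nn.InstanceNorm2d(num_channels, eps=eps, affine=False)
        # per-channel gate in [0, 1] choosing between batch and instance statistics
        self.gate = nn.Parameter(torch.full((1, num_channels, 1, 1), 0.5))
        self.weight = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.clamp(self.gate, 0.0, 1.0)
        # batch statistics keep discriminative content, instance statistics
        # suppress image style; the gate arbitrates between the two
        y = g * self.bn(x) + (1.0 - g) * self.inorm(x)
        return y * self.weight + self.bias

# usage: feats = BatchInstanceNorm2d(256)(torch.randn(8, 256, 40, 40))
```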
{"title":"Domain adaptive YOLO based on image style selection and synergistic domain classifier","authors":"Yipeng Zhou, Huaming Qian","doi":"10.1016/j.displa.2025.102967","DOIUrl":"10.1016/j.displa.2025.102967","url":null,"abstract":"<div><div>Object detectors are trained on routine datasets that are primarily obtained under suitable conditions, yet will encounter various extreme environments in the complex real-world. Distribution shift in the train and test datasets poses serious damage to the performance of models, the most cost-effective means of solving this problem is unsupervised domain adaptive (UDA) method. In this work, we use YOLOv8 as underlying detector to construct a domain adaptive framework called YOLO-SDCoN, which offers a new solution paradigm for the domain shift problem. Specifically, we propose an Synergistic Domain Classifier (SDC) with richer gradient flow, which takes all the multi-scale features used for detection as inputs, providing a more adequate way to generate domain-invariant features while eliminating the gradient vanishing phenomenon. Furthermore, a novel Batch-Instance Co-Normalization (BI-CoN) method is proposed, which enables adaptive selection and preservation of image styles under the implicit guidance of a domain classifier, thereby generating better domain-invariant features to enhance the robustness of cross-domain detection. We conducted extensive experiments on KITTI, Cityscapes, Foggy Cityscapes, and SIM10K datasets. The results show that the proposed YOLO-SDCoN is comprehensively superior to the Faster R-CNN based domain adaptive frameworks, and achieves superior results compared to other methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102967"},"PeriodicalIF":3.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-18 DOI: 10.1016/j.displa.2025.102978
Gaze depth estimation using vestibulo-ocular reflex and GDENet for 3D target disambiguation
Ting Lei, Leshan Wang, Jixiang Chen
Gaze depth estimation is crucial for eye-based selection when 3D targets occlude one another. Prior work has investigated gaze depth estimation, but its accuracy remains limited. In this work, we propose a new method based on the vestibulo-ocular reflex (VOR), which exploits the fact that the VOR stabilizes gaze on a target during head movement by producing compensatory eye movements in the opposite direction, and the closer the target, the more compensation is needed. We collected abundant head and eye data during VOR at different gaze depths and explored the relationship between gaze depth and VOR motion through a simple user study. We then designed a new temporal neural network, GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression and uses multiple supervision to predict gaze depth from input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m). Finally, a user experiment indicates that our depth estimation method can be used for 3D target disambiguation and is suitable for various 3D scenarios.
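As background for the premise that nearer targets require larger compensatory eye movements, the short script below works through the textbook geometry of the VOR for a small lateral head translation while fixating a stationary target; the formula and numbers are purely illustrative and are not the paper's GDENet model.

```python
# Illustrative geometry only (not the paper's GDENet): for a lateral head
# translation t while fixating a target at depth d, keeping the target foveated
# requires a compensatory eye rotation of roughly theta = atan(t / d), so the
# required rotation grows as the target gets closer.
import math

def compensatory_rotation_deg(head_translation_m: float, gaze_depth_m: float) -> float:
    """Approximate eye rotation (degrees) needed to hold fixation after a
    small sideways head translation, assuming the target stays stationary."""
    return math.degrees(math.atan2(head_translation_m, gaze_depth_m))

if __name__ == "__main__":
    for depth in (0.2, 0.5, 1.0, 2.0, 5.0):  # metres, matching the paper's range
        print(f"depth {depth:4.1f} m -> "
              f"{compensatory_rotation_deg(0.05, depth):5.2f} deg "
              "for a 5 cm head translation")
```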
{"title":"Gaze depth estimation using vestibulo-ocular reflex and GDENet for 3D target disambiguation","authors":"Ting Lei, Leshan Wang, Jixiang Chen","doi":"10.1016/j.displa.2025.102978","DOIUrl":"10.1016/j.displa.2025.102978","url":null,"abstract":"<div><div>Gaze depth estimation is crucial in eye-based selection with 3D target occlusion. Previous work has done some research on gaze depth estimation, but the accuracy is limited. In this work, we propose a new method based on VOR (Vestibulo-Ocular Reflex), which is based on the fact that VOR helps us stabilize gaze on a target when the head moves by performing compensatory eye movement in the opposite direction, and the closer the target is, the more compensation is needed. In this work, we collected abundant head and eye data when performing VOR at different gaze depths, and explored the relationship between gaze depth and VOR motion through a simple user study. Then we designed a new temporal neural network GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression, and uses multiple supervision to predict the gaze depth from input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m). Finally, a user experiment indicates that our depth estimation method can be used in 3D target disambiguation and is suitable for various 3D scenarios.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102978"},"PeriodicalIF":3.7,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-15 DOI: 10.1016/j.displa.2025.102965
High-quality and efficient phase-only hologram generation method based on complex amplitude constrained Gerchberg-Saxton algorithm
Ye-Hao Hou, Zhao-Song Li, Yi-Wei Zheng, Qian Huang, Yi-Long Li, Di Wang, Qiong-Hua Wang
Holographic displays garner widespread attention for their ability to provide vivid and realistic 3D visuals. The Gerchberg-Saxton (GS) algorithm has played a significant role in computer-generated holography for many years. However, the conventional GS algorithm cannot meet the requirements of holographic display for high-quality, realistic visual effects. In this paper, a phase-only hologram generation method is proposed based on the complex amplitude constrained GS (CAC-GS) algorithm. In the proposed method, all information of the 3D scene is precalculated as a complex amplitude field at a target plane, and the CAC-GS algorithm is then used to reconstruct the target complex amplitude field and generate the hologram. The proposed method constrains both amplitude and phase on the target plane, thus significantly improving image quality and the efficiency of hologram generation. Compared with the conventional GS algorithm, the holographic image quality of the proposed method is improved by approximately 50 %, and the calculation time is reduced by 97.27 %. This algorithm holds great promise for a wide range of applications, including AR and VR.
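A minimal sketch of a GS-style iteration with a complex-amplitude constraint is given below, assuming a simple Fourier-transform relation between the SLM plane and the target plane and a padded border that acts as a free noise region; the paper's actual propagation model, scene precalculation, and constraint details are not reproduced, and all names are illustrative.

```python
# Minimal GS-style sketch with a complex-amplitude constraint at the target
# plane. Propagation is modelled as a plain Fourier transform between the SLM
# plane and the target plane, a simplification of the paper's optical setup.
import numpy as np

def cac_gs(target_field: np.ndarray, pad: int = 128, iterations: int = 50) -> np.ndarray:
    """Phase-only hologram whose far field reproduces the complex target field
    inside a signal window; the padded border is left free as a 'noise' region."""
    h, w = target_field.shape
    H, W = h + 2 * pad, w + 2 * pad
    win = (slice(pad, pad + h), slice(pad, pad + w))
    # random initial phase at the SLM plane
    slm_field = np.exp(1j * 2 * np.pi * np.random.rand(H, W))
    for _ in range(iterations):
        far_field = np.fft.fftshift(np.fft.fft2(slm_field))
        # complex-amplitude constraint inside the signal window only:
        # both amplitude and phase are forced to the target there, while the
        # border stays untouched so the iteration retains degrees of freedom
        scale = np.abs(far_field[win]).mean() / (np.abs(target_field).mean() + 1e-12)
        far_field[win] = target_field * scale
        slm_field = np.fft.ifft2(np.fft.ifftshift(far_field))
        slm_field = np.exp(1j * np.angle(slm_field))  # phase-only SLM constraint
    return np.angle(slm_field)
```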
{"title":"High-quality and efficient phase-only hologram generation method based on complex amplitude constrained Gerchberg-Saxton algorithm","authors":"Ye-Hao Hou, Zhao-Song Li, Yi-Wei Zheng, Qian Huang, Yi-Long Li, Di Wang, Qiong-Hua Wang","doi":"10.1016/j.displa.2025.102965","DOIUrl":"10.1016/j.displa.2025.102965","url":null,"abstract":"<div><div>The holographic display garners widespread attention for its ability to provide vivid and realistic 3D visuals. The Gerchberg-Saxton (GS) algorithm has played a significant role in computer-generated holography for many years. However, the conventional GS algorithm cannot meet the requirements of holographic display for high-quality and realistic visual effects. In this paper, a phase-only hologram generation method is proposed based on the complex amplitude constrained GS (CAC-GS) algorithm. In the proposed method, all information of the 3D scene is precalculated as a complex amplitude field at a target plane, and then the CAC-GS algorithm is used to reconstruct the target complex amplitude field and generate the hologram. The proposed method constrains both amplitude and phase on a target plane, thus significantly improving the image quality and efficiency of the hologram generation. Compared with the conventional GS algorithm, the holographic image quality of the proposed method is improved by approximately 50 %, and the calculation time is reduced by 97.27 %. This novel algorithm holds great promise for a wide range of applications including AR and VR.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102965"},"PeriodicalIF":3.7,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-13 DOI: 10.1016/j.displa.2025.102971
A medical image segmentation method based on adaptive graph sparse algorithm under contrastive learning framework
Yuanrong Zhang , Yiming Zhao , Mengxin Wang , Yunyun Dong , Bingqian Yang , Yifeng Gong , Xiufang Feng
Background
Existing U-shaped models have shown significant potential in medical image segmentation. However, their performance is limited due to the constrained receptive field and lack of global reasoning capability in standard U-shaped structures. This paper aims to develop a module for U-shaped structures to enhance feature discrimination and improve segmentation accuracy.
Methods
This paper proposes a medical image segmentation network based on the U-shaped structure under a contrastive learning framework to achieve accurate segmentation of medical lesion areas. Initially, feature maps are extracted using the encoder of the U-shaped structure and mapped into a two-dimensional graph structure. We then propose a Sparse Dual Graph Mapping (SDGM) method to adaptively sparsify the graph, creating multiple sparse graph structures with different node attributes and topologies. Node-level and graph-level contrastive objectives are defined using different criteria for positive and negative samples within the graph. Finally, supervised and unsupervised losses are aggregated to enhance the model’s discrimination ability, yielding the final segmentation mask.
Main Results
Experimental results demonstrate that the proposed SGC module is applicable to various U-shaped networks and outperforms existing techniques on multiple datasets. It achieved 94.02% Dice on the honeycomb lung dataset, 91.78% Dice on the ACDC dataset, and 92.43% Dice on the polyp dataset, all showing state-of-the-art performance.
Significance
The proposed Sparse Graph Contrastive (SGC) module can be applied to any U-shaped structure to enhance its performance. This method maintains high correlation and consistency between automatic segmentation results and expert manual segmentation results. It significantly improves the segmentation performance of lesion areas in medical images, assisting doctors in early screening, accurate diagnosis, and adaptive treatment, with important clinical relevance in medical imaging-assisted diagnosis.
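The node-level contrastive objective mentioned in the Methods section is described only at a high level; the sketch below shows a generic InfoNCE-style node contrastive loss between two sparsified graph views, which is an assumption about how such an objective is commonly formed rather than the authors' exact formulation.

```python
# Generic node-level contrastive (InfoNCE-style) loss between two sparsified
# views of the same graph: node i in view A is positive with node i in view B
# and negative with every other node. Only an assumed, common formulation.
import torch
import torch.nn.functional as F

def node_contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                          temperature: float = 0.2) -> torch.Tensor:
    """z_a, z_b: (num_nodes, dim) node embeddings from two graph views."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature          # (N, N) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # symmetric InfoNCE: each node must retrieve its counterpart in the other view
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# usage: loss = node_contrastive_loss(torch.randn(64, 128), torch.randn(64, 128))
```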
{"title":"A medical image segmentation method based on adaptive graph sparse algorithm under contrastive learning framework","authors":"Yuanrong Zhang , Yiming Zhao , Mengxin Wang , Yunyun Dong , Bingqian Yang , Yifeng Gong , Xiufang Feng","doi":"10.1016/j.displa.2025.102971","DOIUrl":"10.1016/j.displa.2025.102971","url":null,"abstract":"<div><h3>Background</h3><div>Existing U-shaped models have shown significant potential in medical image segmentation. However, their performance is limited due to the constrained receptive field and lack of global reasoning capability in standard U-shaped structures. This paper aims to develop a module for U-shaped structures to enhance feature discrimination and improve segmentation accuracy.</div></div><div><h3>Methods</h3><div>This paper proposes a medical image segmentation network based on the U-shaped structure under the contrastive learning framework to achieve accurate segmentation of medical lesion areas. Initially, feature maps are extracted using the encoder of the U-shaped structure and mapped into a two-dimensional graph structure. We then propose a Sparse Dual Graph Mapping (SDGM) method to adaptively sparsify the graph structure, creating multiple sparse graph structures with different node attributes and topologies. Node-level and graph-level contrastive learning are defined using different judgments of positive and negative samples within the graph. Finally, supervised and unsupervised losses are aggregated to enhance the model’s discrimination ability, resulting in the final segmentation mask.</div></div><div><h3>Main Results</h3><div>Experimental results demonstrate that the proposed SGC module is applicable to various U-shaped networks and outperforms existing techniques on multiple datasets. It achieved 94.02% Dice on the honeycomb lung dataset, 91.78% Dice on the ACDC dataset, and 92.43% Dice on the polyp dataset, all showing state-of-the-art performance.</div></div><div><h3>Significance</h3><div>The proposed Sparse Graph Contrastive (SGC) module can be applied to any U-shaped structure to enhance its performance. This method maintains high correlation and consistency between automatic segmentation results and expert manual segmentation results. It significantly improves the segmentation performance of lesion areas in medical images, assisting doctors in early screening, accurate diagnosis, and adaptive treatment, with important clinical relevance in medical imaging-assisted diagnosis.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102971"},"PeriodicalIF":3.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-13 DOI: 10.1016/j.displa.2025.102969
Research on Zero-Force control and collision detection of deep learning methods in collaborative robots
Bin Zhao , Chengdong Wu , Lianjun Chang , Yang Jiang , Ruohuai Sun
In intelligent manufacturing, collaborative robots face strict requirements for safety, interaction, and flexibility. To enable flexible and smooth interaction with collaborative robots, this paper investigates zero-force control and deep-learning-based collision detection in depth. First, for the zero-force control problem, the complete dynamic equations incorporating an acceleration-based cubic friction model are established, and a genetic algorithm is used for multi-parameter identification of the friction model. Second, for collision detection during demonstration reproduction, this paper proposes an enhanced sequence-coding method based on the iTransformer network, which embeds the whole time series of each variable independently as a token by inverting the time series, improving the generalization ability of the model. Meanwhile, considering both local and global time-series features, a CNN-iTransformer collision detection method combining a CNN (convolutional neural network) with the iTransformer network is constructed. The CNN-iTransformer efficiently learns and retains long-term dependencies in the input sequences, addressing the inaccurate modeling of existing demonstration-reproduction collision detection methods. Finally, experiments show that the velocity-based cubic friction model better solves the zero-force control problem and that the CNN-iTransformer network accurately detects abnormal robot collisions without relying on a model.
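The abstract does not specify the functional form of the cubic friction model or the genetic-algorithm identification; the sketch below assumes one common cubic, velocity-dependent friction form and substitutes a plain least-squares fit for the identification step, purely as an illustration.

```python
# Illustrative only: one common cubic, velocity-dependent friction form
#   tau_f(v) = c0 * sign(v) + c1 * v + c2 * v**2 * sign(v) + c3 * v**3
# The paper identifies its friction parameters with a genetic algorithm; here a
# simple linear least-squares fit stands in for that step.
import numpy as np

def friction_features(v: np.ndarray) -> np.ndarray:
    s = np.sign(v)
    return np.stack([s, v, v**2 * s, v**3], axis=1)

def identify_friction(v: np.ndarray, tau_measured: np.ndarray) -> np.ndarray:
    """Return coefficients [c0, c1, c2, c3] from joint velocity / torque samples."""
    coeffs, *_ = np.linalg.lstsq(friction_features(v), tau_measured, rcond=None)
    return coeffs

if __name__ == "__main__":
    v = np.linspace(-2.0, 2.0, 400)                       # joint velocity, rad/s
    true = friction_features(v) @ np.array([0.8, 1.5, 0.3, 0.1])
    noisy = true + 0.05 * np.random.randn(v.size)         # simulated torque residuals
    print(identify_friction(v, noisy))                    # ~[0.8, 1.5, 0.3, 0.1]
```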
{"title":"Research on Zero-Force control and collision detection of deep learning methods in collaborative robots","authors":"Bin Zhao , Chengdong Wu , Lianjun Chang , Yang Jiang , Ruohuai Sun","doi":"10.1016/j.displa.2025.102969","DOIUrl":"10.1016/j.displa.2025.102969","url":null,"abstract":"<div><div>In the process of intelligent manufacturing, collaborative robots have strict requirements in terms of safety, interaction, and flexibility. In order to solve the problem of flexible and smooth interaction of collaborative robots, this paper profoundly researches the zero-force control and collision detection method based on deep learning. First, for the zero-force control problem of collaborative robots, the complete kinetic equations of the three-time friction force model based on acceleration are established, and a genetic algorithm is used for multi-parameter identification of the friction force model. Second, for the problem of collision detection in demonstration reproduction, this paper proposes an enhanced sequence coding method based on the iTransformer network, which embeds the whole time series of each variable independently as a token by inverting the time series, to improve the generalization ability of the model. Meanwhile, considering local and global time series features, the CNN-iTransformer collision detection method combining CNN(convolutional neural network) and iTransformer network is constructed. The CNN-iTransformer can efficiently learn and retain the long-term dependencies in the input sequences, which solves the problem of inaccurate modeling of the schematic reproduction collision detection method. Finally, it is proved experimentally that the velocity-based cubic friction force model can better solve the zero-force control problem, and the CNN-iTransformer network can accurately detect the robot’s abnormal collision behavior without relying on the model.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102969"},"PeriodicalIF":3.7,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
<div><h3>Background</h3><div>Motor impairment of the upper limb (UL) post-stroke is prevalent, adversely affecting patients’ quality of life. Previous research has shown that constraint-induced movement therapy (CIMT) is effective in UL rehabilitation. However, CIMT’s rigorous regimen may hinder patient adherence, potentially affecting treatment efficacy. Immersive virtual reality (IVR) is an innovative approach for stroke rehabilitation. It utilizes VR technology to create dynamic environments and modify avatars efficiently, offering a less exhausting alternative to CIMT. We propose an IVR-based therapeutic approach that integrates positive reinforcement components to enhance motor coordination, offering an alternative to CIMT. This study aimed to evaluate the effect of incorporating positive reinforcement components into IVR-enhanced physical therapy (PT) on motor coordination.</div></div><div><h3>Method</h3><div>Eighteen stroke patients were randomly allocated to two groups: the intervention group (n = 10) received 30 ± 10 min/day of IVR therapy with PT, while the control group (n = 8) received PT alone. PT sessions, lasting 40 ± 10 min/day, were conducted on the ward in accordance with national guidelines. The mean number of sessions across all participants was 6.6, with a standard deviation of 2.98. Session frequency was tailored to individual hospital stays, adjusted due to pandemic-related early discharge protocols. For participants with stroke who received IVR (intervention group), the task involved reaching for 35 targets randomly distributed across seven different locations in the VR environment. The number of movement repetitions varied, depending on their ability to repeat the task and the length of stay in the stroke unit. The movement of the virtual image of the UL was reinforced by visual feedback to the participants, that is, the participants perceived their motor coordination as if their image of the UL was moving to a greater speed than the real UL monitored real-time while the participants were trying to reach a target. The primary outcome measure was investigated by the Fugl-Meyer assessment (FMA) scale for the affected UL, with secondary measures including a kinematic dataset (e.g., time to target) and a questionnaire assessing participant perception and achievement during therapy.</div></div><div><h3>Results</h3><div>The IVR group exhibited significant improvements in FMA scores (P = 0.02) between the first and fifth session, signifying a substantial recovery of UL motor function, with the fifth session showing higher scores. The time to target in the last session reduced compared with that in the first session, suggesting motor learning and recovery (P = 0.03). The patients were highly engaged and motivated during the sessions because they felt like they were in charge of controlling the virtual image of their upper body.</div></div><div><h3>Conclusions</h3><div>The results suggest that positive reinforcement within the IVR
{"title":"Immersive virtual reality enhanced reinforcement induced physical therapy (EVEREST)","authors":"Samirah Altukhaim , Naoko Sakabe , Kirubananthan Nagaratnam , Neelima Mannava , Toshiyuki Kondo , Yoshikatsu Hayashi","doi":"10.1016/j.displa.2024.102962","DOIUrl":"10.1016/j.displa.2024.102962","url":null,"abstract":"<div><h3>Background</h3><div>Motor impairment of the upper limb (UL) post-stroke is prevalent, adversely affecting patients’ quality of life. Previous research has shown that constraint-induced movement therapy (CIMT) is effective in UL rehabilitation. However, CIMT’s rigorous regimen may hinder patient adherence, potentially affecting treatment efficacy. Immersive virtual reality (IVR) is an innovative approach for stroke rehabilitation. It utilizes VR technology to create dynamic environments and modify avatars efficiently, offering a less exhausting alternative to CIMT. We propose an IVR-based therapeutic approach that integrates positive reinforcement components to enhance motor coordination, offering an alternative to CIMT. This study aimed to evaluate the effect of incorporating positive reinforcement components into IVR-enhanced physical therapy (PT) on motor coordination.</div></div><div><h3>Method</h3><div>Eighteen stroke patients were randomly allocated to two groups: the intervention group (n = 10) received 30 ± 10 min/day of IVR therapy with PT, while the control group (n = 8) received PT alone. PT sessions, lasting 40 ± 10 min/day, were conducted on the ward in accordance with national guidelines. The mean number of sessions across all participants was 6.6, with a standard deviation of 2.98. Session frequency was tailored to individual hospital stays, adjusted due to pandemic-related early discharge protocols. For participants with stroke who received IVR (intervention group), the task involved reaching for 35 targets randomly distributed across seven different locations in the VR environment. The number of movement repetitions varied, depending on their ability to repeat the task and the length of stay in the stroke unit. The movement of the virtual image of the UL was reinforced by visual feedback to the participants, that is, the participants perceived their motor coordination as if their image of the UL was moving to a greater speed than the real UL monitored real-time while the participants were trying to reach a target. The primary outcome measure was investigated by the Fugl-Meyer assessment (FMA) scale for the affected UL, with secondary measures including a kinematic dataset (e.g., time to target) and a questionnaire assessing participant perception and achievement during therapy.</div></div><div><h3>Results</h3><div>The IVR group exhibited significant improvements in FMA scores (P = 0.02) between the first and fifth session, signifying a substantial recovery of UL motor function, with the fifth session showing higher scores. The time to target in the last session reduced compared with that in the first session, suggesting motor learning and recovery (P = 0.03). 
The patients were highly engaged and motivated during the sessions because they felt like they were in charge of controlling the virtual image of their upper body.</div></div><div><h3>Conclusions</h3><div>The results suggest that positive reinforcement within the IVR","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102962"},"PeriodicalIF":3.7,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-10 DOI: 10.1016/j.displa.2025.102964
3D indoor scene assessment via layout plausibility
Xinyan Yang , Fei Hu , Shaofei Liu , Long Ye , Ye Wang , Guanghua Zhu , Jiyin Li
As the amount of 3D scene data increases, plausibility quality assessment methods are urgently needed. Existing 3D scene assessment methods usually focus on visual rather than semantic reasonability. Moreover, the quantity and category coverage of open-source 3D indoor scene data are still inadequate for training fully supervised assessment methods. In this paper, we build 3D-SPAD-MI, a minority-category 3D indoor scene assessment dataset, to extend the previous majority-category 3D-SPAD dataset. We also expand the application scope and improve the performance of the previous 3D Scene Plausibility Assessment Network (3D-SPAN) with a multimodality model (3D-SPAN-M) and few-shot learning (3D-SPAN-F). 3D-SPAN-M considers both vision and semantics in 3D indoor scenes by fusing image and scene graph features. 3D-SPAN-F introduces multi-task meta-learning with prototypical networks into 3D-SPAN so that it can evaluate more categories of 3D indoor scenes. Comparison and ablation experiments verify the performance improvement and generalization of our method.
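For reference, the prototypical-network step named above classifies a query by its distance to class prototypes computed as mean support embeddings; the sketch below shows this standard formulation only, without the paper's multi-task meta-learning wrapper, and all names are illustrative.

```python
# Standard prototypical-network step: prototypes are the mean support embeddings
# per class and queries are scored by negative squared Euclidean distance.
# The paper's multi-task meta-learning wrapper is not reproduced here.
import torch

def prototypical_logits(support: torch.Tensor, support_labels: torch.Tensor,
                        query: torch.Tensor, num_classes: int) -> torch.Tensor:
    """support: (Ns, dim), support_labels: (Ns,), query: (Nq, dim) -> (Nq, C) logits."""
    prototypes = torch.stack([support[support_labels == c].mean(dim=0)
                              for c in range(num_classes)])       # (C, dim)
    dists = torch.cdist(query, prototypes, p=2) ** 2              # (Nq, C)
    return -dists                                                 # larger = closer

# usage:
# support = torch.randn(10, 64); labels = torch.tensor([0] * 5 + [1] * 5)
# logits = prototypical_logits(support, labels, torch.randn(4, 64), num_classes=2)
```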
{"title":"3D indoor scene assessment via layout plausibility","authors":"Xinyan Yang , Fei Hu , Shaofei Liu , Long Ye , Ye Wang , Guanghua Zhu , Jiyin Li","doi":"10.1016/j.displa.2025.102964","DOIUrl":"10.1016/j.displa.2025.102964","url":null,"abstract":"<div><div>As the amount of 3D scene data increases, plausibility quality assessment methods are urgently needed. The existing 3D scene assessment methods usually focus on visual but not semantical reasonability. The amounts and categories of the open-source 3D indoor scene data are still inadequate for training fully labeled learning assessment methods. In this paper, we build a minority category of 3D indoor scene assessment dataset 3D-SPAD-MI to extend the previous majority 3D-SPAD dataset. And expanding application scope and improving performance of the previous method 3D scene plausibility assessment network(3D-SPAN) by multimodality model(3D-SPAN-M) and few-shot learning(3D-SPAN-F). 3D-SPAN-M considers vision and semantics in 3D indoor scenes via fusing image and scene graph features. 3D-SPAN-F introduces multi-task meta-learning with prototypical networks into the 3D-SPAN so that it could evaluate more different categories of 3D indoor scenes. The comparison and ablation experiments verify performance improvement and generalization of our method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102964"},"PeriodicalIF":3.7,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-06 DOI: 10.1016/j.displa.2024.102955
Scoring structure regularized gradient boosting network for blind image quality assessment
Lei Wang, Qingbo Wu, Fanman Meng, Zhengning Wang, Chenhao Wu, Haoran Wei, King Ngi Ngan
Blind image quality assessment (BIQA) aims to quantitatively predict the subjective perception of a distorted image without access to its corresponding clean version. Prevailing methods typically model BIQA as a regression task and strive to minimize the average prediction error under a pointwise unstructured loss, such as Mean Square Error (MSE) or Mean Absolute Error (MAE), which ignores the rank orders and perceptual differences between images. This paper proposes a Scoring Structure regularized Gradient Boosting Network (SSGB-Net) to achieve a more comprehensive perception across all distorted images. More specifically, our SSGB-Net performs BIQA in three stages: pairwise rectification and listwise boosting, followed by pointwise prediction after a linear transformation. First, we correct the initial scores by incorporating the structured pairwise loss, i.e., SoftRank, to preserve the perceptual rank orders of image pairs. Then, we further boost the pairwise correction results with the structured listwise loss, i.e., Norm-in-Norm, to maintain the perceptual differences across all images. Finally, the pointwise prediction measures the MSE between the transformed scores and the ground truth through a closed-form solution of the Exponential Moving Average (EMA) driven linear transformation. Based on these iterative corrections, our SSGB-Net effectively balances multiple BIQA objectives and outperforms many state-of-the-art methods in terms of Pearson Linear Correlation Coefficient (PLCC), Spearman Rank Correlation Coefficient (SRCC), and Root Mean Squared Error (RMSE).
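The closed-form, EMA-driven linear transformation is described only briefly; the sketch below assumes one plausible reading, a per-batch least-squares scale/shift fit whose parameters are smoothed by an exponential moving average, and is not the authors' exact formulation.

```python
# Assumed sketch of the final point-wise step: closed-form least squares fits a
# scale/shift mapping predicted scores onto subjective scores, and the fitted
# parameters are smoothed across batches by an exponential moving average.
import torch

class EmaLinearCalibration:
    def __init__(self, momentum: float = 0.99):
        self.momentum = momentum
        self.scale, self.shift = 1.0, 0.0

    def update(self, pred: torch.Tensor, mos: torch.Tensor) -> None:
        # closed-form least squares for y ~ a * x + b on the current batch
        x, y = pred.detach().float(), mos.float()
        var = x.var(unbiased=False) + 1e-8
        a = ((x - x.mean()) * (y - y.mean())).mean() / var
        b = y.mean() - a * x.mean()
        m = self.momentum
        self.scale = m * self.scale + (1 - m) * a.item()
        self.shift = m * self.shift + (1 - m) * b.item()

    def __call__(self, pred: torch.Tensor) -> torch.Tensor:
        return self.scale * pred + self.shift

# the calibrated scores would then feed the point-wise MSE term,
# e.g. torch.nn.functional.mse_loss(calib(pred), mos)
```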
{"title":"Scoring structure regularized gradient boosting network for blind image quality assessment","authors":"Lei Wang, Qingbo Wu, Fanman Meng, Zhengning Wang, Chenhao Wu, Haoran Wei, King Ngi Ngan","doi":"10.1016/j.displa.2024.102955","DOIUrl":"10.1016/j.displa.2024.102955","url":null,"abstract":"<div><div>Blind image quality assessment (BIQA) aims to quantitatively predict the subjective perception of the distorted image without accessing its corresponding clean version. Prevailing methods typically model BIQA as a regression task and strive to minimize the average prediction error in terms of the pointwise unstructured loss, such as Mean Square Error (MSE) or Mean Absolute Error (MAE), which ignores the perception toward the rank orders and perceptual differences between different images. This paper proposes a Scoring Structure regularized Gradient Boosting Network (SSGB-Net) to achieve a more comprehensive perception across all distorted images. More specifically, our SSGB-Net performs BIQA in three stages, pair-wise rectification and list-wise boosting, followed by point-wise prediction after linear transformation. First, we correct the initial scores by incorporating the structured pairwise loss, i.e., SoftRank, to preserve the perceptual rank orders of pairwise images. Then, we further boost the previous pairwise correction results with structured listwise loss, i.e., Norm-in-Norm, to maintain the perceptual difference across all images. Finally, the point-wise prediction measures the MSE between the transformed scores and the ground truth through a closed-form solution of the Exponential Moving Average (EMA) driven linear transformation. Based on these iterative corrections, our SSGB-Net can effectively balance multiple BIQA objectives and outperform many state-of-the-art methods in terms of Pearson Linear Correlation Coefficient (PLCC), Spearman Rank Correlation Coefficient (SRCC) and Root Mean Squared Error (RMSE).</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102955"},"PeriodicalIF":3.7,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-04 DOI: 10.1016/j.displa.2024.102960
Enhancing 3D Visual Grounding with Deformable Attention Transformer and Geometry Affine Transformation: Overcoming sparsity challenges
Can Zhang , Feipeng Da , Shaoyan Gai
In this paper, we introduce 3DVG-Deformable-Attention Transformer (3DVG-DT), a novel framework designed to address the challenge of imprecise target object localization in 3D Visual Grounding (3DVG) due to point cloud sparsity. By integrating Deformable Attention Transformer (DAT) and Geometry Affine Transformation (GAT), 3DVG-DT effectively mitigates the effects of point cloud sparsity and irregularity, significantly improving 3DVG accuracy. We propose a Dual-Mode Feature Fusion (DMF) module for object detection and matching within complex point clouds, while a Description-aware Keypoint Affine Transformation Sampling (DKAS) strategy further enhances performance. Leveraging DeBERTa-V3 for language encoding, we demonstrate the effectiveness of 3DVG-DT on ScanRefer and Referit3D datasets, showcasing improved target detection capabilities under sparse point cloud conditions. Experimental results reveal substantial gains over existing methods, particularly in handling sparse point clouds.
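The abstract does not detail the Geometry Affine Transformation; the sketch below assumes it resembles the geometric affine normalization commonly applied to grouped point-cloud features (centre on the anchor, rescale by a robust statistic, apply learnable affine parameters), which may differ from the paper's design.

```python
# Hedged sketch of a geometric affine normalization for grouped point features,
# in the spirit of the Geometry Affine Transformation the abstract names; the
# exact GAT design in the paper is an assumption, and names are illustrative.
import torch
import torch.nn as nn

class GeometricAffine(nn.Module):
    def __init__(self, feat_dim: int, eps: float = 1e-5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1, 1, 1, feat_dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, feat_dim))
        self.eps = eps

    def forward(self, grouped: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
        """grouped: (B, N, K, D) features of K neighbours per anchor point,
        anchors: (B, N, D) features of the anchor points themselves."""
        centred = grouped - anchors.unsqueeze(2)                  # centre each group
        sigma = centred.reshape(grouped.size(0), -1).std(dim=1, keepdim=True)
        sigma = sigma.view(-1, 1, 1, 1)                           # per-sample scale
        return self.alpha * centred / (sigma + self.eps) + self.beta

# usage: out = GeometricAffine(32)(torch.randn(2, 128, 16, 32), torch.randn(2, 128, 32))
```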
{"title":"Enhancing 3D Visual Grounding with Deformable Attention Transformer and Geometry Affine Transformation: Overcoming sparsity challenges","authors":"Can Zhang , Feipeng Da , Shaoyan Gai","doi":"10.1016/j.displa.2024.102960","DOIUrl":"10.1016/j.displa.2024.102960","url":null,"abstract":"<div><div>In this paper, we introduce 3DVG-Deformable-Attention Transformer (3DVG-DT), a novel framework designed to address the challenge of imprecise target object localization in 3D Visual Grounding (3DVG) due to point cloud sparsity. By integrating Deformable Attention Transformer (DAT) and Geometry Affine Transformation (GAT), 3DVG-DT effectively mitigates the effects of point cloud sparsity and irregularity, significantly improving 3DVG accuracy. We propose a Dual-Mode Feature Fusion (DMF) module for object detection and matching within complex point clouds, while a Description-aware Keypoint Affine Transformation Sampling (DKAS) strategy further enhances performance. Leveraging DeBERTa-V3 for language encoding, we demonstrate the effectiveness of 3DVG-DT on ScanRefer and Referit3D datasets, showcasing improved target detection capabilities under sparse point cloud conditions. Experimental results reveal substantial gains over existing methods, particularly in handling sparse point clouds.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102960"},"PeriodicalIF":3.7,"publicationDate":"2025-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143163094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}