Pub Date: 2024-11-16. DOI: 10.1016/j.displa.2024.102876
Fangmei Chen, Chen Wang, Xingchen Yao, Fuming Sun
Chinese calligraphy font generation is an extremely challenging problem. Firstly, Chinese calligraphy fonts have complex structures: the accuracy and artistic quality of the generated fonts depend on the order and layout of the strokes as well as the relationships between them. Secondly, the number of Chinese characters is large, yet existing calligraphy works are scarce, so it is difficult to establish a comprehensive, high-quality Chinese calligraphy dataset. In this paper, we propose SPFont, an unsupervised calligraphy font generation network based on a generative adversarial network (GAN) framework. The generator includes a style feature encoder, a content feature encoder, a stroke potential feature fusion module (SPFM) and a decoder. By overlaying lower-level style and content features, the SPFM better preserves fine details of the font such as stroke thickness and curve shape. The SPFM output is fused with the extracted style features and then fed into the decoder, allowing it to consider the influence of style, content and stroke potential simultaneously during generation. Experimental results demonstrate that our model generates Chinese calligraphy fonts of higher quality than previous methods.
{"title":"SPFont: Stroke potential features embedded GAN for Chinese calligraphy font generation","authors":"Fangmei Chen, Chen Wang, Xingchen Yao, Fuming Sun","doi":"10.1016/j.displa.2024.102876","DOIUrl":"10.1016/j.displa.2024.102876","url":null,"abstract":"<div><div>Chinese calligraphy font generation is an extremely challenging problem. Firstly, Chinese calligraphy fonts have complex structures. The accuracy and artistic quality of the generated fonts will be affected by the order and layout of the strokes as well as the relationships between them. Secondly, the number of Chinese characters is large, but existing calligraphy works are scarce. Hence, it is difficult to establish a comprehensive and high-quality Chinese calligraphy dataset. In this paper, we propose an unsupervised calligraphy font generation network SPFont. It is based on a generative adversarial network (GAN) framework. The generator includes a style feature encoder, a content feature encoder, a stroke potential feature fusion module (SPFM) and a decoder. The SPFM module, by overlaying lower-level style and content features, better preserves fine details of the font such as stroke thickness, curve shapes and other characteristics. The SPFM module and the extracted style features are fused and then fed into the decoder, allowing it to consider the influence of style, content and stroke potential simultaneously during the generation process. Experimental results demonstrate that our model generates Chinese calligraphy fonts with higher quality compared to previous methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102876"},"PeriodicalIF":3.7,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-16. DOI: 10.1016/j.displa.2024.102884
Pengyi Hao, Cunqi Wu, Cong Bai
Extractive summarization aims to select important sentences from a document to form a summary. However, current extractive document summarization methods fail to fully consider the semantic information among sentences and the various relations in the entire document. Therefore, a novel end-to-end framework named hierarchical heterogeneous graph learning for document summarization (HHGraphSum) is proposed in this paper. In this framework, a hierarchical heterogeneous graph is constructed for the whole document, where sentence representations are learned through several levels of graph neural networks. The combination of single-direction and bidirectional message passing helps graph learning capture effective relations among sentences and words. To capture rich semantic information, space–time collaborative learning is designed to generate the primary sentence features, which are then enhanced during graph learning. To generate a less redundant and more precise summary, an LSTM-based predictor and a blocking strategy are explored. Evaluations on both a single-document dataset and a multi-document dataset demonstrate the effectiveness of HHGraphSum. The code is available on GitHub: https://github.com/Devin100086/HHGraphSum.
{"title":"HHGraphSum: Hierarchical heterogeneous graph learning for extractive document summarization","authors":"Pengyi Hao , Cunqi Wu , Cong Bai","doi":"10.1016/j.displa.2024.102884","DOIUrl":"10.1016/j.displa.2024.102884","url":null,"abstract":"<div><div>Extractive summarization aims to select important sentences from the document to generate a summary. However, current extractive document summarization methods fail to fully consider the semantic information among sentences and the various relations in the entire document. Therefore, a novel end-to-end framework named hierarchical heterogeneous graph learning for document summarization (HHGraphSum) is proposed in this paper. In this framework, a hierarchical heterogeneous graph is constructed for the whole document, where the representation of sentences is learnt by several levels of graph neural network. The combination of single-direction message passing and bidirectional message passing helps graph learning obtain effective relations among sentences and words. For capturing the rich semantic information, space–time collaborative learning is designed to generate the primary features of sentences which are enhanced in graph learning. For generating a less redundant and more precise summary, a LSTM based predictor and a blocking strategy are explored. Evaluations both on a single-document dataset and a multi-document dataset demonstrate the effectiveness of the HHGraphSum. The code of HHGraphSum is available on Github:<span><span>https://github.com/Devin100086/HHGraphSum</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"86 ","pages":"Article 102884"},"PeriodicalIF":3.7,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-16. DOI: 10.1016/j.displa.2024.102871
Chengcheng Liu, Huikai Shao, Dexing Zhong
Palmprint anti-spoofing is essential for securing palmprint recognition systems. Although some anti-spoofing methods excel on closed datasets, their ability to generalize across unknown domains is often limited. This paper introduces the Domain-Adaptive Palmprint Anti-Spoofing Network (DAPANet), which leverages multiple known spoofing domains to extract domain-invariant spoofing clues from unlabeled domains. DAPANet tackles the domain adaptation challenge using three strategies: global domain alignment, subdomain alignment, and the separation of distinct subdomains. The framework consists of a public feature extraction module, a domain adaptation module, a domain classifier, and a fusion classifier. Initially, the public feature extraction module extracts palmprint features. Subsequently, the domain adaptation module aligns target domain features with source domain features to generate domain-specific outputs. The domain classifier provides initial classifiable features, which are then integrated by DAPANet, employing a unified fusion classifier for decision-making. Comprehensive experiments conducted on the XJTU-PalmReplay database across various cross-domain scenarios confirm the efficacy of the proposed method.
{"title":"Learning domain-adaptive palmprint anti-spoofing feature from multi-source domains","authors":"Chengcheng Liu , Huikai Shao , Dexing Zhong","doi":"10.1016/j.displa.2024.102871","DOIUrl":"10.1016/j.displa.2024.102871","url":null,"abstract":"<div><div>Palmprint anti-spoofing is essential for securing palmprint recognition systems. Although some anti-spoofing methods excel on closed datasets, their ability to generalize across unknown domains is often limited. This paper introduces the Domain-Adaptive Palmprint Anti-Spoofing Network (DAPANet), which leverages multiple known spoofing domains to extract domain-invariant spoofing clues from unlabeled domains. DAPANet tackles the domain adaptation challenge using three strategies: global domain alignment, subdomain alignment, and the separation of distinct subdomains. The framework consists of a public feature extraction module, a domain adaptation module, a domain classifier, and a fusion classifier. Initially, the public feature extraction module extracts palmprint features. Subsequently, the domain adaptation module aligns target domain features with source domain features to generate domain-specific outputs. The domain classifier provides initial classifiable features, which are then integrated by DAPANet, employing a unified fusion classifier for decision-making. Comprehensive experiments conducted on XJTU-PalmReplay database across various cross-domain scenarios confirm the efficacy of the proposed method.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"86 ","pages":"Article 102871"},"PeriodicalIF":3.7,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-15. DOI: 10.1016/j.displa.2024.102883
Qipei Li, Da Pan, Zefeng Ying, Qirong Liang, Ping Shi
Existing image super-resolution (SR) methods often lead to oversharpening, particularly in defocused images. However, we have observed that defocused regions and focused regions present different levels of recovery difficulty. This observation opens up opportunities for more efficient enhancements. In this paper, we introduce DefocusSR2, an efficient framework designed for super-resolution of defocused images. DefocusSR2 consists of two main modules: Depth-Guided Segmentation (DGS) and Defocus-Aware Classify Enhance (DCE). In the DGS module, we utilize MobileSAM, guided by depth information, to accurately segment the input image and generate defocus maps. These maps provide detailed information about the locations of defocused areas. In the DCE module, we crop the defocus map and classify the segments into defocused and focused patches based on a predefined threshold. Through knowledge distillation and blur-kernel matching, the network retains the blur kernel to reduce computational load. In practice, the defocused patches are fed into the Efficient Blur Match SR Network (EBM-SR), where the blur kernel is preserved to alleviate computational demands. The focused patches, on the other hand, are processed with more computationally intensive operations. Thus, DefocusSR2 integrates defocus classification and super-resolution within a unified framework. Experiments demonstrate that DefocusSR2 can accelerate most SR methods, reducing the FLOPs of SR models by approximately 70% while maintaining state-of-the-art SR performance.
{"title":"DefocusSR2: An efficient depth-guided and distillation-based framework for defocus images super-resolution","authors":"Qipei Li, Da Pan, Zefeng Ying, Qirong Liang, Ping Shi","doi":"10.1016/j.displa.2024.102883","DOIUrl":"10.1016/j.displa.2024.102883","url":null,"abstract":"<div><div>Existing image super-resolution (SR) methods often lead to oversharpening, particularly in defocused images. However, we have observed that defocused regions and focused regions present different levels of recovery difficulty. This observation opens up opportunities for more efficient enhancements. In this paper, we introduce DefocusSR2, an efficient framework designed for super-resolution of defocused images. DefocusSR2 consists of two main modules: Depth-Guided Segmentation (DGS) and Defocus-Aware Classify Enhance (DCE). In the DGS module, we utilize MobileSAM, guided by depth information, to accurately segment the input image and generate defocus maps. These maps provide detailed information about the locations of defocused areas. In the DCE module, we crop the defocus map and classify the segments into defocused and focused patches based on a predefined threshold. Through knowledge distillation and the fusion of blur kernel matching, the network retains the fuzzy kernel to reduce computational load. Practically, the defocused patches are fed into the Efficient Blur Match SR Network (EBM-SR), where the blur kernel is preserved to alleviate computational demands. The focused patches, on the other hand, are processed using more computationally intensive operations. Thus, DefocusSR2 integrates defocus classification and super-resolution within a unified framework. Experiments demonstrate that DefocusSR2 can accelerate most SR methods, reducing the FLOPs of SR models by approximately 70% while maintaining state-of-the-art SR performance.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"86 ","pages":"Article 102883"},"PeriodicalIF":3.7,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-15. DOI: 10.1016/j.displa.2024.102890
Xiaoxiao Liu, Yan Zhao, Shigang Wang, Jian Wei
High-precision medical image segmentation provides a reliable basis for clinical analysis and diagnosis. Researchers have developed various models to enhance the segmentation performance of medical images. Among these methods, two-dimensional models such as Unet exhibit a simple structure, low computational resource requirements, and strong local feature capture capabilities. However, their spatial information utilization is insufficient, limiting their segmentation accuracy. Three-dimensional models, such as 3D Unet, utilize spatial information more fully and are suitable for complex tasks, but they require high computational resources and have limited real-time performance. In this paper, we propose a virtual 3D module (Mambav3d) based on Mamba, which introduces spatial information into 2D segmentation tasks to more fully integrate the 3D information of the image and further improve segmentation accuracy while keeping computational resource requirements low. Mambav3d leverages the properties of hidden states in the state space model, combined with the shift of visual perspective, to incorporate semantic information between different anatomical planes in different slices of the same 3D sample. Voxel segmentation is converted to pixel segmentation to reduce training data requirements and model complexity while ensuring that the model integrates 3D information and enhances segmentation accuracy. The model references information from previous layers when labeling the current layer, thereby facilitating the transfer of semantic information between slice layers and avoiding the high computational cost of using structures such as Transformers between layers. We have implemented Mambav3d on Unet and evaluated its performance on the BraTS, AMOS, and KiTS datasets, demonstrating superiority over other state-of-the-art methods.
{"title":"Mambav3d: A mamba-based virtual 3D module stringing semantic information between layers of medical image slices","authors":"Xiaoxiao Liu, Yan Zhao, Shigang Wang, Jian Wei","doi":"10.1016/j.displa.2024.102890","DOIUrl":"10.1016/j.displa.2024.102890","url":null,"abstract":"<div><div>High-precision medical image segmentation provides a reliable basis for clinical analysis and diagnosis. Researchers have developed various models to enhance the segmentation performance of medical images. Among these methods, two-dimensional models such as Unet exhibit a simple structure, low computational resource requirements, and strong local feature capture capabilities. However, their spatial information utilization is insufficient, limiting their segmentation accuracy. Three-dimensional models, such as 3D Unet, utilize spatial information more fully and are suitable for complex tasks, but they require high computational resources and have limited real-time performance. In this paper, we propose a virtual 3D module (Mambav3d) based on mamba, which introduces spatial information into 2D segmentation tasks to more fully integrate the 3D information of the image and further improve segmentation accuracy under conditions of low computational resource requirements. Mambav3d leverages the properties of hidden states in the state space model, combined with the shift of visual perspective, to incorporate semantic information between different anatomical planes in different slices of the same 3D sample. The voxel segmentation is converted to pixel segmentation to reduce model training data requirements and model complexity while ensuring that the model integrates 3D information and enhances segmentation accuracy. The model references the information from previous layers when labeling the current layer, thereby facilitating the transfer of semantic information between slice layers and avoiding the high computational cost associated with using structures such as Transformers between layers. We have implemented Mambav3d on Unet and evaluated its performance on the BraTs, Amos, and KiTs datasets, demonstrating superiority over other state-of-the-art methods.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102890"},"PeriodicalIF":3.7,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142650754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-14. DOI: 10.1016/j.displa.2024.102881
Zikang Chen, Zhouyan He, Ting Luo, Chongchong Jin, Yang Song
Tone-Mapping Operators (TMOs) play a crucial role in converting High Dynamic Range (HDR) images into Tone-Mapped Images (TMIs) with standard dynamic range for optimal display on standard monitors. Nevertheless, TMIs generated by distinct TMOs may exhibit diverse visual artifacts, highlighting the significance of TMI Quality Assessment (TMIQA) methods in predicting perceptual quality and guiding advancements in TMOs. Inspired by luminance decomposition and Transformer, a new no-reference TMIQA method based on deep learning is proposed in this paper, named LDT-TMIQA. Specifically, a TMI will change under the influence of different TMOs, potentially resulting in either over-exposure or under-exposure, leading to structure distortion and changes in texture details. Therefore, we first decompose the luminance channel of a TMI into a base layer and a detail layer that capture structure information and texture information, respectively. Then, they are employed with the TMI collectively as inputs to the Feature Extraction Module (FEM) to enhance the availability of prior information on luminance, structure, and texture. Additionally, the FEM incorporates the Cross Attention Prior Module (CAPM) to model the interdependencies among the base layer, detail layer, and TMI while employing the Iterative Attention Prior Module (IAPM) to extract multi-scale and multi-level visual features. Finally, a Feature Selection Fusion Module (FSFM) is proposed to obtain final effective features for predicting the quality scores of TMIs by reducing the weight of unnecessary features and fusing the features of different levels with equal importance. Extensive experiments on the publicly available TMI benchmark database indicate that the proposed LDT-TMIQA reaches the state-of-the-art level.
{"title":"Luminance decomposition and Transformer based no-reference tone-mapped image quality assessment","authors":"Zikang Chen , Zhouyan He , Ting Luo , Chongchong Jin , Yang Song","doi":"10.1016/j.displa.2024.102881","DOIUrl":"10.1016/j.displa.2024.102881","url":null,"abstract":"<div><div>Tone-Mapping Operators (TMOs) play a crucial role in converting High Dynamic Range (HDR) images into Tone-Mapped Images (TMIs) with standard dynamic range for optimal display on standard monitors. Nevertheless, TMIs generated by distinct TMOs may exhibit diverse visual artifacts, highlighting the significance of TMI Quality Assessment (TMIQA) methods in predicting perceptual quality and guiding advancements in TMOs. Inspired by luminance decomposition and Transformer, a new no-reference TMIQA method based on deep learning is proposed in this paper, named LDT-TMIQA. Specifically, a TMI will change under the influence of different TMOs, potentially resulting in either over-exposure or under-exposure, leading to structure distortion and changes in texture details. Therefore, we first decompose the luminance channel of a TMI into a base layer and a detail layer that capture structure information and texture information, respectively. Then, they are employed with the TMI collectively as inputs to the Feature Extraction Module (FEM) to enhance the availability of prior information on luminance, structure, and texture. Additionally, the FEM incorporates the Cross Attention Prior Module (CAPM) to model the interdependencies among the base layer, detail layer, and TMI while employing the Iterative Attention Prior Module (IAPM) to extract multi-scale and multi-level visual features. Finally, a Feature Selection Fusion Module (FSFM) is proposed to obtain final effective features for predicting the quality scores of TMIs by reducing the weight of unnecessary features and fusing the features of different levels with equal importance. Extensive experiments on the publicly available TMI benchmark database indicate that the proposed LDT-TMIQA reaches the state-of-the-art level.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102881"},"PeriodicalIF":3.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142650756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-14. DOI: 10.1016/j.displa.2024.102889
Zhong Zheng, Zhaohua Zhou, Ruipeng Chen, Jiajie Liu, Chun Liu, Lirong Zhang, Lei Zhou, Miao Xu, Lei Wang, Weijing Wu, Junbiao Peng
Currently, Mura defects have a significant impact on the yield of AMOLED panels, and De-Mura plays a critical role in the compensation. To enhance the applicability of the subpixel luminance extraction method in De-Mura and to address inaccuracies caused by aperture diffraction limit and geometric defocusing in camera imaging, this paper proposes a precise extraction method based on effective area. We establish the concept of the effective area first and then determine the effective area of subpixel imaging on the camera sensor by incorporating the circle of confusion (CoC) caused by aperture diffraction limits and geometric defocusing. Finally, more precise luminance information is obtained. Results show that, after compensation, the Mura on the white screen is almost eliminated subjectively. Objectively, by constructing normalized luminance curves for subpixels in Mura regions, the standard deviation indicates that our method outperforms the traditional whole-pixel method, improving uniformity by approximately 50%.
{"title":"Precise subpixel luminance extraction method for De-Mura of AMOLED displays","authors":"Zhong Zheng, Zhaohua Zhou, Ruipeng Chen, Jiajie Liu, Chun Liu, Lirong Zhang, Lei Zhou, Miao Xu, Lei Wang, Weijing Wu, Junbiao Peng","doi":"10.1016/j.displa.2024.102889","DOIUrl":"10.1016/j.displa.2024.102889","url":null,"abstract":"<div><div>Currently, Mura defects have a significant impact on the yield of AMOLED panels, and De-Mura plays a critical role in the compensation. To enhance the applicability of the subpixel luminance extraction method in De-Mura and to address inaccuracies caused by aperture diffraction limit and geometric defocusing in camera imaging, this paper proposes a precise extraction method based on effective area. We establish the concept of the effective area first and then determine the effective area of subpixel imaging on the camera sensor by incorporating the circle of confusion (CoC) caused by aperture diffraction limits and geometric defocusing. Finally, more precise luminance information is obtained. Results show that, after compensation, the Mura on the white screen is almost eliminated subjectively. Objectively, by constructing normalized luminance curves for subpixels in Mura regions, the standard deviation indicates that our method outperforms the traditional whole-pixel method, improving uniformity by approximately 50%.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"86 ","pages":"Article 102889"},"PeriodicalIF":3.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-12. DOI: 10.1016/j.displa.2024.102873
Wenchao Zhu, Zeliang Cheng, Qi Wang, Jing Du, Yingzi Lin
The readability of human–computer interfaces affects users' visual performance when using electronic devices, yet it receives inadequate attention. The issue is critical under high-stress conditions such as firefighting, where accurate and fast information processing is essential. This study addresses how font and background color combinations on liquid crystal displays (LCDs) affect recognition efficiency. A novel concept, primary color Euclidean distance (PCED), is introduced and tested in a repeated-measures experiment. Three factors were investigated: background color (black, white), font color (red, green, blue), and PCED. A total of 24 participants were recruited. Results demonstrate that color combinations with specific PCED values can substantially affect recognition efficiency. Using response surface analysis (RSA), the study modelled response time with a generalized mathematical model. Blue font colors on a black background showed the longest response time. The study also explored the influence of physical stress on recognition efficiency, revealing a latency of about 100 ms across all color combinations. The findings offer a methodological advancement in understanding the effects of color combinations in digital displays, setting the stage for future research in diverse demographic and technological contexts, including mixed reality.
{"title":"Font and background color combinations influence recognition efficiency: A novel method via primary color Euclidean distance and response surface analysis","authors":"Wenchao Zhu , Zeliang Cheng , Qi Wang , Jing Du , Yingzi Lin","doi":"10.1016/j.displa.2024.102873","DOIUrl":"10.1016/j.displa.2024.102873","url":null,"abstract":"<div><div>The readability of human–computer interfaces impacts the users’ visual performance while using electronic devices, which gains inadequate attention. This situation is critical during high-stress conditions such as firefighting, where accurate and fast information processing is critical. This study addresses how font and background color combinations on Liquid Crystal displays (LCDs) affect recognition efficiency. A novel concept, primary color Euclidean distance (PCED), is introduced and testified under a repeated-measures experiment. Three factors were investigated: background color (black, white), font color (red, green, blue), and PCEDs. A total of 24 participants were recruited. Results demonstrate that color combinations with specific PCED values can substantially impact recognition efficiency. By using RSA, this study modelled the response time in a generalized mathematical model, which is response surface analysis. Results showed that blue font colors under a black background showed the longest response time. This study also explored the influence of physical stress on recognition efficiency, revealing a latency of about 100 ms across all color combinations. The findings offer a methodological advancement in understanding the effects of color combinations in digital displays, setting the stage for future research in diverse demographic and technological contexts, including mixed reality.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102873"},"PeriodicalIF":3.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-09. DOI: 10.1016/j.displa.2024.102882
Zhichao Chen, Shuyu Xiao, Yongfang Wang, Yihan Wang, Hongming Cai
No-reference Point Cloud Quality Assessment (NR-PCQA) remains a challenge in media quality assessment: existing NR-PCQA metrics struggle to capture quality-related features accurately because of the unique scattered structure of points, and they rarely consider global and local features jointly. To address these challenges, we propose a Global and Local Dual-Branch Fusion (GLDBF) network for no-reference point cloud quality assessment. Firstly, sparse convolution is used to extract global quality features of distorted Point Clouds (PCs). Secondly, a graph-weighted PointNet++ is proposed to extract multi-level local features of the point cloud, and an offset attention mechanism is further used to enhance effective local features. A Transformer-based fusion module is also proposed to fuse the multi-level local features. Finally, we fuse the global and local branches via a multilayer perceptron to predict the quality score of distorted PCs. Experimental results show that the proposed algorithm achieves state-of-the-art performance compared with existing methods in assessing the quality of distorted PCs.
{"title":"GLDBF: Global and local dual-branch fusion network for no-reference point cloud quality assessment","authors":"Zhichao Chen , Shuyu Xiao , Yongfang Wang , Yihan Wang , Hongming Cai","doi":"10.1016/j.displa.2024.102882","DOIUrl":"10.1016/j.displa.2024.102882","url":null,"abstract":"<div><div>No-reference Point Cloud Quality Assessment (NR-PCQA) is a challenge in the field of media quality assessment, such as inability to accurately capture quality-related features due to the unique scattered structure of points and less considering global features and local features jointly in the existing no-reference PCQA metrics. To address these challenges, we propose a Global and Local Dual-Branch Fusion (GLDBF) network for no-reference point cloud quality assessment. Firstly, sparse convolution is used to extract the global quality feature of distorted Point Clouds (PCs). Secondly, graph weighted PointNet++ is proposed to extract the multi-level local features of point cloud, and the offset attention mechanism is further used to enhance local effective features. Transformer-based fusion module is also proposed to fuse multi-level local features. Finally, we joint the global and local dual branch fusion modules via multilayer perceptron to predict the quality score of distorted PCs. Experimental results show that the proposed algorithm can achieves state-of-the-art performance compared with existing methods in assessing the quality of distorted PCs.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102882"},"PeriodicalIF":3.7,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142650753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-06. DOI: 10.1016/j.displa.2024.102874
Faruk Enes Oğuz, Ahmet Alkan
Gastrointestinal diseases are significant health issues worldwide, requiring early diagnosis due to their serious health implications. Therefore, detecting these diseases using artificial intelligence-based medical decision support systems through colonoscopy images plays a critical role in early diagnosis. In this study, a deep learning-based method is proposed for the classification of gastrointestinal diseases and colon anatomical landmarks using colonoscopy images. For this purpose, five different Convolutional Neural Network (CNN) models, namely Xception, ResNet-101, NASNet-Large, EfficientNet, and NASNet-Mobile, were trained. An ensemble model was created using class-based recall values derived from the validation performances of the top three models (Xception, ResNet-101, NASNet-Large). A user-friendly Graphical User Interface (GUI) was developed, allowing users to perform classification tasks and use Gradient-weighted Class Activation Mapping (Grad-CAM), an explainable AI tool, to visualize the regions from which the model derives information. Grad-CAM visualizations contribute to a better understanding of the model’s decision-making processes and play an important role in the application of explainable AI. In the study, eight labels, including anatomical markers such as z-line, pylorus, and cecum, as well as pathological findings like esophagitis, polyps, and ulcerative colitis, were classified using the KVASIR V2 dataset. The proposed ensemble model achieved a 94.125% accuracy on the KVASIR V2 dataset, demonstrating competitive performance compared to similar studies in the literature. Additionally, the precision and F1 score values of this model are equal to 94.168% and 94.125%, respectively. These results suggest that the proposed method provides an effective solution for the diagnosis of GI diseases and can be beneficial for medical education.
{"title":"Weighted ensemble deep learning approach for classification of gastrointestinal diseases in colonoscopy images aided by explainable AI","authors":"Faruk Enes Oğuz , Ahmet Alkan","doi":"10.1016/j.displa.2024.102874","DOIUrl":"10.1016/j.displa.2024.102874","url":null,"abstract":"<div><div>Gastrointestinal diseases are significant health issues worldwide, requiring early diagnosis due to their serious health implications. Therefore, detecting these diseases using artificial intelligence-based medical decision support systems through colonoscopy images plays a critical role in early diagnosis. In this study, a deep learning-based method is proposed for the classification of gastrointestinal diseases and colon anatomical landmarks using colonoscopy images. For this purpose, five different Convolutional Neural Network (CNN) models, namely Xception, ResNet-101, NASNet-Large, EfficientNet, and NASNet-Mobile, were trained. An ensemble model was created using class-based recall values derived from the validation performances of the top three models (Xception, ResNet-101, NASNet-Large). A user-friendly Graphical User Interface (GUI) was developed, allowing users to perform classification tasks and use Gradient-weighted Class Activation Mapping (Grad-CAM), an explainable AI tool, to visualize the regions from which the model derives information. Grad-CAM visualizations contribute to a better understanding of the model’s decision-making processes and play an important role in the application of explainable AI. In the study, eight labels, including anatomical markers such as z-line, pylorus, and cecum, as well as pathological findings like esophagitis, polyps, and ulcerative colitis, were classified using the KVASIR V2 dataset. The proposed ensemble model achieved a 94.125% accuracy on the KVASIR V2 dataset, demonstrating competitive performance compared to similar studies in the literature. Additionally, the precision and F1 score values of this model are equal to 94.168% and 94.125%, respectively. These results suggest that the proposed method provides an effective solution for the diagnosis of GI diseases and can be beneficial for medical education.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"85 ","pages":"Article 102874"},"PeriodicalIF":3.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142650763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}