
Displays: Latest Publications

Nighttime large-field video image change detection based on adaptive superpixel reconstruction and multi-scale singular value decomposition fusion
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-17 | DOI: 10.1016/j.displa.2024.102840
Tianyu Ren , Jia He , Zhenhong Jia , Xiaohui Huang , Sensen Song , Jiajia Wang , Gang Zhou , Fei Shi , Ming Lv

With the development of technology and the needs of social governance, surveillance equipment has been widely deployed, and detecting changes in surveillance video of conventional scenes with change detection algorithms is now a mature technology. In a nighttime large-field-of-view environment, however, surveillance video suffers from complex random noise and a low signal-to-noise ratio, making small moving targets difficult to find. To this end, we propose a new method for nighttime large-field surveillance video change detection based on adaptive superpixel reconstruction and multi-scale singular value decomposition fusion. The method consists of two parts. First, an adaptive superpixel reconstruction method reconstructs the two denoised difference images by selecting different segmentation parameters, significantly enhancing the edge information of the reconstructed difference images. Second, a multi-scale singular value decomposition fusion method fuses the two difference images: it obtains a robust difference image by selecting fusion rules at different scales and exploiting the complementary information of the two difference images, and the fuzzy c-means (FCM) clustering algorithm then produces the final change map. Experimental results on a self-built nighttime large-field video dataset with two resolutions show that the proposed method outperforms other algorithms in detection accuracy and robustness.
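
To make the fusion-then-clustering stage concrete, here is a minimal sketch that pairs an illustrative block-wise SVD fusion rule (a single-scale stand-in for the paper's multi-scale rules, which the abstract does not spell out) with a standard two-class fuzzy c-means step; all names, parameters, and the synthetic data are ours, not the authors':

```python
import numpy as np

def svd_fuse(d1, d2, block=8):
    """Fuse two difference images block by block, keeping the block whose
    leading singular value (used as an activity measure) is larger.
    An illustrative single-scale rule, not the paper's multi-scale MSVD."""
    out = d1.astype(float).copy()
    for i in range(0, d1.shape[0] - block + 1, block):
        for j in range(0, d1.shape[1] - block + 1, block):
            b1 = d1[i:i+block, j:j+block].astype(float)
            b2 = d2[i:i+block, j:j+block].astype(float)
            if np.linalg.svd(b2, compute_uv=False)[0] > np.linalg.svd(b1, compute_uv=False)[0]:
                out[i:i+block, j:j+block] = b2
    return out

def fcm_1d(x, c=2, m=2.0, iters=50, eps=1e-8):
    """Standard fuzzy c-means on flattened pixel intensities."""
    rng = np.random.default_rng(0)
    u = rng.random((c, x.size)); u /= u.sum(axis=0)
    for _ in range(iters):
        um = u ** m
        centers = um @ x / um.sum(axis=1)            # weighted cluster means
        d = np.abs(x[None, :] - centers[:, None]) + eps
        u = d ** (-2.0 / (m - 1.0)); u /= u.sum(axis=0)
    return centers, u

rng = np.random.default_rng(1)
d1, d2 = rng.random((64, 64)), rng.random((64, 64))
d2[20:40, 20:40] += 2.0                              # synthetic "changed" region
fused = svd_fuse(d1, d2)
centers, u = fcm_1d(fused.ravel())
change_map = u[np.argmax(centers)].reshape(fused.shape) > 0.5
print(change_map.sum(), "pixels flagged as changed")
```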

Citations: 0
Statistical techniques for digital pre-processing of computed tomography medical images: A current review
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-14 | DOI: 10.1016/j.displa.2024.102835
Oscar Valbuena Prada , Miguel Ángel Vera , Guillermo Ramirez , Ricardo Barrientos Rojel , David Mojica Maldonado

Digital pre-processing is a vital stage in handling the information contained in multilayer computed tomography images. Its purpose is to minimize the effect of image imperfections, which are associated with the noise and artifacts that degrade image quality during acquisition, storage, and/or transmission. The specialized literature offers a wide variety of techniques that address these imperfections, noise, and artifacts. In this study, a comprehensive review of the specialized literature on statistical techniques used in the pre-processing of digital images was conducted. The review summarizes information from 56 studies published over the last five years (2018–2022) on the main statistical techniques used for the digital processing of medical images obtained under different modalities, with a special focus on computed tomography. Additionally, the statistical metrics most often used to measure the performance of pre-processing techniques in medical imaging are described. The most frequently used pre-processing techniques were found to be median and mean filters, Gaussian filters, and approaches based on neural networks, deep learning, and machine learning, applied to multilayer computed tomography and magnetic resonance images of the brain, abdomen, lungs, heart, and other organs.
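
As a small illustration of the filter families the review covers, the snippet below denoises a synthetic slice with SciPy's median and Gaussian filters and scores them with PSNR, one of the metrics commonly reported; the test image and noise level are arbitrary:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

def psnr(ref, img, peak=255.0):
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(1)
clean = np.tile(np.linspace(0, 255, 256), (256, 1))   # synthetic "CT slice"
noisy = np.clip(clean + rng.normal(0, 20, clean.shape), 0, 255)

print(f"noisy          : {psnr(clean, noisy):.1f} dB")
print(f"median 3x3     : {psnr(clean, median_filter(noisy, size=3)):.1f} dB")
print(f"gaussian sigma1: {psnr(clean, gaussian_filter(noisy, sigma=1.0)):.1f} dB")
```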

Citations: 0
Human body features recognition based adaptive user interface for extra-large touch screens
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-13 | DOI: 10.1016/j.displa.2024.102838
Junfeng Wang, Jialin Li

With the widespread use of extra-large touch screens (eLTS) in various settings such as work and education, interaction efficiency and user experience have garnered increased attention. The current user interface (UI) layouts of eLTS are primarily categorized into two modes: fixed position and manual adjustment. The fixed UI layout fails to accommodate users of different heights and sizes, while the manual adjustment mode involves cumbersome steps and lacks sufficient flexibility. This study proposes an adaptive UI for eLTS. The optimal operational area on the eLTS is determined based on users’ height, eye level, arm length, face orientation, and distance from the screen. The eLTS menu is then positioned and displayed within this optimal area. Simulations involving users of various heights (P1 female, P50 male and female, and P99 male) were conducted to evaluate fatigue using the rapid upper limb assessment (RULA) method. The results indicate that the proposed adaptive UI significantly reduces user fatigue.
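
The abstract does not give the placement computation, but the underlying geometry, intersecting the user's arm-reach sphere with the screen plane, can be sketched as below; the simplified anthropometric fields and the sample numbers are hypothetical, and the real method also factors in eye level and face orientation:

```python
import math
from dataclasses import dataclass

@dataclass
class User:
    shoulder_height_m: float   # floor to shoulder joint
    arm_length_m: float
    distance_m: float          # body to screen surface

def reachable_band(u: User):
    """Vertical band on the screen touchable without stretching: the chord of
    the arm-length sphere where it intersects the screen plane."""
    if u.arm_length_m <= u.distance_m:
        return None                                   # screen is out of reach
    r = math.sqrt(u.arm_length_m ** 2 - u.distance_m ** 2)
    return (u.shoulder_height_m - r, u.shoulder_height_m + r)

# Hypothetical mid-percentile adult standing 0.4 m from the screen.
print(reachable_band(User(shoulder_height_m=1.35, arm_length_m=0.70, distance_m=0.40)))
```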

Citations: 0
Underwater target detection network based on differential routing assistance and bilateral attention synergy
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-13 | DOI: 10.1016/j.displa.2024.102836
Zhiwei Chen, Suting Chen

Underwater target detection is of significant importance in both military and civilian ocean exploration. However, because the underwater environment is complex, most targets are small and often obscured, leading to low accuracy and missed detections in existing algorithms. To address these issues, we propose an underwater target detection algorithm that balances accuracy and speed. Specifically, we first propose the Differentiable Routing Assistance Sampling Network (DRASN), in which differentiable routing participates in training the sampling network but not in inference. It replaces the down-sampling network composed of max-pooling and convolution fusion in the backbone, reducing the feature loss of small and occluded targets. Second, we propose the Bilateral Attention Synergistic Network (BASN), which establishes connections between the backbone and the neck using fine-grained information from both channel and spatial perspectives, further enhancing detection of targets in complex backgrounds. Finally, considering the characteristics of ground-truth boxes, we propose a scale-approximation auxiliary loss function (Aux-Loss) and modify the allocation strategy for positive and negative samples so that the network selectively learns high-quality anchors, improving convergence. Compared with mainstream algorithms, our detection network achieves 82.9% mAP@0.5 on the URPC2021 dataset, which is 9.5, 5.7, and 2.8 percentage points higher than YOLOv8s, RT-DETR, and SDBB, respectively. Its speed reaches 75 FPS, meeting real-time requirements.
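
The abstract characterizes BASN only as attention over channel and spatial perspectives bridging backbone and neck, so here is a generic CBAM-style channel-plus-spatial attention block in PyTorch as a rough sketch of the mechanism, not the paper's exact wiring:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel + spatial attention; a stand-in, not the actual BASN."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention from GAP...
        mx = self.mlp(x.amax(dim=(2, 3)))              # ...and global max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))      # spatial attention map

feat = torch.randn(2, 64, 32, 32)
print(ChannelSpatialAttention(64)(feat).shape)         # torch.Size([2, 64, 32, 32])
```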

Citations: 0
Evaluation of SeeColors filters for color vision correction and comparative analysis with EnChroma glasses
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-10 | DOI: 10.1016/j.displa.2024.102831
Fangli Fan , Yifeng Wu , Danyan Tang , Yujie Shu , Zhen Deng , Hai Xin , Xiqiang Liu

The purpose of this study was to evaluate the ability of SeeColors filters to enhance color vision test results using the electronic version of the Farnsworth D-15 (E-D15) and Ishihara tests on a Samsung TV, and to compare their effectiveness with EnChroma glasses. Sixty-six subjects with congenital red–green color vision deficiency were tested. For both protan and deutan groups, the confusion angle shifted to negative values, with SeeColors filters exhibiting a greater effect than EnChroma glasses. In the deutan group, the TES, S-index, and C-index of the E-D15 test decreased, with the SeeColors D30 filter having a more pronounced effect than EnChroma deutan glasses. In the protan group, while EnChroma protan glasses tended to decrease the TES, S-index, and C-index, the SeeColors P30 filter increased them. For both groups, TS and TN of the Ishihara tests improved, with the SeeColors D30 filter demonstrating a stronger effect than EnChroma deutan glasses. The study concluded that both the SeeColors D30 filter and EnChroma deutan glasses were beneficial for deutans, although the SeeColors D30 filter was superior. In protans, neither the SeeColors P30 filter nor EnChroma protan glasses showed significant effectiveness, but the SeeColors P30 filter did improve performance in the pseudoisochromatic task.

Citations: 0
Vehicle trajectory extraction and integration from multi-direction video on urban intersection
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-07 | DOI: 10.1016/j.displa.2024.102834
Jinjun Tang, Weihe Wang

With the gradual maturation of computer vision technology, extracting vehicle trajectories from intersection surveillance video has become a popular way to analyze vehicle conflicts and safety at urban intersections. However, many intersection surveillance videos have blind spots and fail to cover the entire intersection. Vehicles may also occlude each other, yielding incomplete trajectories, and the camera angle can make trajectory extraction inaccurate. In response to these challenges, this study proposes a vehicle trajectory extraction and integration framework that uses surveillance video collected from the four entrances of an urban intersection. The framework first employs an improved YOLOv5s model to detect vehicle positions. We then propose an object tracking model, MS-SORT, to extract the trajectories in each surveillance video. The trajectories from each video are subsequently mapped into the same coordinate system, and integration is achieved using space–time information and re-identification (ReID) methods. The framework extracts and integrates trajectories from four intersection surveillance videos, obtaining trajectories with significantly broader temporal and spatial coverage than those obtained from any single direction of surveillance video. Our detection model improves mAP by 1.3 percentage points over the basic YOLOv5s, and our object tracking model improves MOTA and IDF1 by 2.6 and 2.1 percentage points over DeepSORT. The trajectory integration method achieves an F1-score of 94.7% and an RMSE of 0.51 m. The average length and number of extracted trajectories increase by at least 47.6% and 24.2%, respectively, compared with trajectories extracted from a single video.
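
Mapping every camera's tracks into one coordinate system is typically done with a planar homography; a minimal sketch follows, where the 3x3 matrix H and the sample track are hypothetical and, in practice, H would be estimated from at least four hand-labeled landmark correspondences (e.g. with cv2.findHomography):

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates into the common ground-plane frame."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coordinates
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]              # back to Cartesian

H = np.array([[0.05, 0.00, -10.0],                     # hypothetical calibration
              [0.00, 0.05,  -8.0],
              [0.00, 0.00,   1.0]])
track_north_cam = np.array([[420.0, 310.0], [436.0, 322.0], [455.0, 335.0]])
print(apply_homography(H, track_north_cam))            # metres in the shared frame
```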

Citations: 0
A novel noiselayer-decoder driven blind watermarking network
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-04 | DOI: 10.1016/j.displa.2024.102823
Xiaorui Zhang , Rui Jiang , Wei Sun , Sunil Kr. Jha

Most blind watermarking methods adopt the Encoder-Noiselayer-Decoder network architecture, called END. However, several issues impact the imperceptibility and robustness of the watermark: the encoder blindly embeds redundant features, adversarial training fails to simulate unknown noise effectively, and single-scale feature extraction has limited capability. To address these challenges, we propose a new Noiselayer-Decoder-driven blind watermarking network, called ND-END, which leverages prior knowledge of the noise layer and the features extracted by the decoder to guide the encoder toward generating images with fewer redundant modifications, enhancing imperceptibility. To effectively simulate unknown noise during adversarial training, we introduce an unknown noise layer based on a guided denoising diffusion probabilistic model, which gradually modifies the mean of the predicted noise during image generation. It produces unknown-noise images that closely resemble the encoded images but mislead the decoder. Moreover, we propose a multi-scale spatial-channel feature extraction method that extracts multi-scale message features from the noised image, aiding message extraction. Experimental results demonstrate the effectiveness of our model: ND-END achieves a lower bit error rate while improving the peak signal-to-noise ratio by approximately 6 dB (from about 33.5 dB to 39.5 dB).
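
For orientation, a toy END pipeline looks like the PyTorch skeleton below; the layer sizes are placeholders and dropout merely stands in for the noise layer, so this sketches the generic architecture the paper builds on, not ND-END itself:

```python
import torch
import torch.nn as nn

class TinyEND(nn.Module):
    """Minimal Encoder-Noiselayer-Decoder skeleton (illustrative only)."""
    def __init__(self, msg_bits=30):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3 + msg_bits, 32, 3, padding=1),
                                 nn.ReLU(), nn.Conv2d(32, 3, 3, padding=1))
        self.noise = nn.Dropout2d(p=0.1)               # stand-in distortion
        self.dec = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(32, msg_bits))

    def forward(self, img, msg):
        b, _, h, w = img.shape
        m = msg.view(b, -1, 1, 1).expand(-1, -1, h, w)        # tile bits spatially
        encoded = img + self.enc(torch.cat([img, m], dim=1))  # residual embedding
        return encoded, self.dec(self.noise(encoded))         # decode noised image

img = torch.randn(2, 3, 64, 64)
msg = torch.randint(0, 2, (2, 30)).float()
encoded, logits = TinyEND()(img, msg)
print(encoded.shape, logits.shape)   # train with image + message-recovery losses
```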

Citations: 0
Automatic identification of breech face impressions based on deep local features
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-02 | DOI: 10.1016/j.displa.2024.102822
Baohong Li, Hao Zhang, Ashraf Uz Zaman Robin, Qianqian Yu

Breech face impressions are an essential type of physical evidence in forensic investigations. However, their surface morphology is complex and varies with the machining method used on the gun’s breech face, so traditional handcrafted local-feature methods exhibit high false-match rates and are unsuitable for striated impressions. We propose a deep local-feature method for firearm identification that utilizes Detector-Free Local Feature Matching with Transformers (LoFTR). The method removes the feature-point detection module and directly uses the self- and cross-attention layers of the Transformer to transform the convolved coarse-level feature maps into a series of dense feature descriptors. Matches with high confidence scores are then filtered based on the score matrix calculated from the dense descriptors. Finally, the screened initial matches are refined on the convolved fine-level features, and a correlation-based approach locates the exact match position. Validation tests were conducted on three authoritative breech face impression datasets provided by the National Institute of Standards and Technology (NIST). The results show that, compared with traditional handcrafted local-feature methods, the proposed method yields a lower identification error rate. Notably, it can handle not only granular impressions but also striated ones. The results indicate that the proposed method can be used for comparative analysis of breech face impressions and provides a new automatic identification approach for forensic investigations.
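
The confidence filtering described here matches the dual-softmax scheme used by LoFTR-style matchers: score every descriptor pair, multiply the row- and column-softmax of the similarity matrix, then keep mutual nearest neighbours above a threshold. A NumPy sketch on synthetic descriptors:

```python
import numpy as np

def dual_softmax_match(desc_a, desc_b, temp=0.1, thresh=0.2):
    """Mutual-nearest-neighbour matches whose dual-softmax confidence clears
    a threshold; descriptors are assumed L2-normalised."""
    def softmax(x, axis):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)
    sim = (desc_a @ desc_b.T) / temp
    conf = softmax(sim, 1) * softmax(sim, 0)           # dual-softmax score matrix
    i = np.arange(len(desc_a))
    j = conf.argmax(1)
    mutual = conf.argmax(0)[j] == i                    # mutual nearest neighbour
    keep = mutual & (conf[i, j] > thresh)
    return np.stack([i[keep], j[keep]], axis=1)

rng = np.random.default_rng(0)
a = rng.normal(size=(100, 256)); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.roll(a, 5, axis=0)                              # ground truth: j = (i + 5) % 100
print(dual_softmax_match(a, b)[:3])                    # [[0 5] [1 6] [2 7]]
```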

Citations: 0
Spatial awareness enhancement based single-stage anchor-free 3D object detection for autonomous driving
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-02 | DOI: 10.1016/j.displa.2024.102821
Xinyu Sun , Lisheng Jin , Huanhuan Wang , Zhen Huo , Yang He , Guangqi Wang

The real-time and accurate detection of three-dimensional (3D) objects based on LiDAR is a focal problem in autonomous driving environment perception. Compared to two-stage and anchor-based 3D object detection methods, which suffer from inference latency, single-stage anchor-free approaches are better suited for deployment in autonomous vehicles with strict real-time requirements. However, they face insufficient spatial awareness, which can result in detection errors such as false positives and false negatives, increasing the potential risks of autonomous driving. In response, we focus on enhancing the spatial awareness of CenterPoint, a single-stage anchor-free 3D object detector widely used in industry. Considering the limited computational budget and the performance bottleneck caused by the pillar encoder, we propose an efficient SSDCM backbone to strengthen feature representation and extraction. Furthermore, a simple BGC neck is devised to weight and exchange contextual information, deeply fusing multi-scale features. Combining the improved backbone and neck networks, we construct a single-stage anchor-free 3D object detection model with spatial awareness enhancement, named CenterPoint-Spatial Awareness Enhancement (CenterPoint-SAE). We evaluate CenterPoint-SAE on two large-scale and challenging autonomous driving datasets, nuScenes and Waymo. It achieves 53.3% mAP and 62.5% NDS on the nuScenes detection benchmark and runs inference at 11.1 FPS. Compared to the baseline, the upgraded networks deliver a performance improvement of 1.6% mAP and 1.2% NDS at a minor cost. Notably, on the Waymo dataset, our method achieves detection performance competitive with two-stage and point-based methods.
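
For context, CenterPoint-style heads predict a bird's-eye-view centre heatmap, and detection reduces to finding its local maxima; a minimal decoding sketch (max-pool NMS plus top-k, single class, all sizes arbitrary):

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, k=10, kernel=3):
    """Return top-k scores and (x, y) grid coords of heatmap local maxima."""
    pad = (kernel - 1) // 2
    peaks = F.max_pool2d(heatmap, kernel, stride=1, padding=pad) == heatmap
    scores = (heatmap * peaks).view(heatmap.size(0), -1)   # suppress non-peaks
    top, idx = scores.topk(k)
    w = heatmap.size(-1)
    xy = torch.stack([idx % w, torch.div(idx, w, rounding_mode="floor")], dim=-1)
    return top, xy

hm = torch.zeros(1, 1, 128, 128)      # single-class BEV heatmap
hm[0, 0, 40, 60] = 0.9                # object centre at x=60, y=40
hm[0, 0, 90, 10] = 0.7
scores, xy = decode_centers(hm, k=2)
print(scores, xy)                     # 0.9 @ (60, 40), 0.7 @ (10, 90)
```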

Citations: 0
Enhancing Chinese–Braille translation: A two-part approach with token prediction and segmentation labeling
IF 3.7 | CAS Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-01 | DOI: 10.1016/j.displa.2024.102819
Hailong Yu , Wei Su , Lei Liu , Jing Zhang , Chuan Cai , Cunlu Xu , Huajiu Quan , Yingchun Xie

Visually assistive systems play a pivotal role in enhancing the quality of life of the visually impaired, and assistive technologies have undergone a remarkable transformation with the advent of deep learning and sophisticated assistive devices. This paper applies the latest machine translation models and techniques to the Chinese–Braille translation task, providing convenience for visually impaired individuals. The traditional end-to-end Chinese–Braille translation approach incorporates Braille dots and Braille word-segmentation symbols as tokens in the model’s vocabulary. However, our findings reveal that Braille word segmentation is significantly more complex than Braille dot prediction. The paper proposes a novel Two-Part Loss (TPL) method that treats these two tasks distinctly, leading to significant accuracy improvements. To further enhance translation performance, we introduce a BERT-Enhanced Segmentation Transformer (BEST) method. BEST leverages knowledge distillation to transfer knowledge from a pre-trained BERT model to the translation model, mitigating its limitations in word segmentation; soft-label distillation is additionally employed to further improve overall efficacy. The TPL approach improves the average BLEU score by 1.16 and 5.42 for Transformer and GPT models on four datasets, respectively. In addition, the work presents a two-stage deep-learning translation approach that outperforms traditional multi-step and end-to-end methods, achieving an average BLEU improvement of 0.85 across four datasets.
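
The abstract does not give the TPL formula, but treating dot prediction and segmentation labeling as distinct tasks suggests two output heads with separate losses; in the hedged sketch below, the output spaces (64 dot patterns, binary boundary tags), the shapes, and the unit loss weight are all our assumptions:

```python
import torch
import torch.nn as nn

B, T = 8, 16                                   # batch size, sequence length
dot_logits = torch.randn(B, T, 64)             # head 1: Braille cell (2^6 patterns)
seg_logits = torch.randn(B, T, 2)              # head 2: word-boundary tag
dot_targets = torch.randint(0, 64, (B, T))
seg_targets = torch.randint(0, 2, (B, T))

ce = nn.CrossEntropyLoss()
loss_dots = ce(dot_logits.reshape(-1, 64), dot_targets.reshape(-1))
loss_seg = ce(seg_logits.reshape(-1, 2), seg_targets.reshape(-1))
loss = loss_dots + 1.0 * loss_seg              # the 1.0 weight is a tunable guess
print(float(loss))
```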

Citations: 0