Pub Date: 2024-08-01 | DOI: 10.1016/j.vrih.2023.06.011
Senhua XUE, Liqing GAO, Liang WAN, Wei FENG
The hands and face are the most important parts for expressing sign language morphemes in sign language videos. However, existing Continuous Sign Language Recognition (CSLR) methods either fail to mine hand and face information in their visual backbones or rely on expensive and time-consuming external extractors to obtain it. In addition, signs vary in length, whereas previous CSLR methods typically segment the video with a fixed-length window to capture sequential features before performing global temporal modeling, which disturbs the perception of complete signs. In this study, we propose a Multi-Scale Context-Aware network (MSCA-Net) to solve the aforementioned problems. MSCA-Net contains two main modules: (1) Multi-Scale Motion Attention (MSMA), which uses the differences among frames to perceive hand and face information at multiple spatial scales, replacing heavy feature extractors; and (2) Multi-Scale Temporal Modeling (MSTM), which explores crucial temporal information in the sign language video at different temporal scales. We conduct extensive experiments on three widely used sign language datasets, i.e., RWTH-PHOENIX-Weather-2014, RWTH-PHOENIX-Weather-2014T, and CSL-Daily. The proposed MSCA-Net achieves state-of-the-art performance, demonstrating the effectiveness of our approach.
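As a rough illustration only, and not the authors' implementation, the sketch below shows how frame differences can drive spatial attention over appearance features at several scales, in the spirit of the MSMA module; the tensor layout, pooling scales, and residual fusion are assumptions.

```python
# Hypothetical sketch of frame-difference-driven multi-scale spatial attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionAttention(nn.Module):
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # one 1x1 projection per scale turns the frame difference into an attention map
        self.projs = nn.ModuleList([nn.Conv2d(channels, 1, kernel_size=1) for _ in scales])

    def forward(self, feats):                                # feats: (B, T, C, H, W)
        B, T, C, H, W = feats.shape
        diff = feats[:, 1:] - feats[:, :-1]                  # frame differences
        diff = F.pad(diff, (0, 0, 0, 0, 0, 0, 1, 0))         # pad time so shapes match
        x = feats.reshape(B * T, C, H, W)
        d = diff.reshape(B * T, C, H, W)
        attn = 0.0
        for s, proj in zip(self.scales, self.projs):
            ds = F.avg_pool2d(d, s) if s > 1 else d          # coarser scales see wider motion context
            a = torch.sigmoid(proj(ds))
            attn = attn + F.interpolate(a, size=(H, W), mode="bilinear", align_corners=False)
        attn = attn / len(self.scales)
        return (x * attn + x).reshape(B, T, C, H, W)         # residual attention
```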
Title: Multi-scale context-aware network for continuous sign language recognition (Virtual Reality Intelligent Hardware, 6(4): 323-337)
Pub Date: 2024-08-01 | DOI: 10.1016/j.vrih.2023.06.012
Zizhuo WANG, Kun HU, Chaoyangfan HUANG, Zixuan HU, Shuo YANG, Xingjun WANG
Digital watermarking technology plays an essential role in anti-counterfeiting and traceability. However, image watermarking algorithms are weak against hybrid attacks, especially geometric attacks such as cropping and rotation. We propose a robust blind image watermarking algorithm that combines stable interest points and deep learning networks to further improve robustness. First, to extract sparser and more stable interest points, we use the SuperPoint algorithm for generation and design a two-step screening procedure: we first keep the points with the highest probability in a given region to ensure sparsity, and then filter the robust interest points under hybrid attacks to ensure high stability. The message is embedded in sub-blocks centered on the stable interest points using a deep learning-based framework. Different kinds of attacks and simulated noise are added during adversarial training to guarantee the robustness of the embedded blocks. We use the ConvNeXt network for watermark extraction and determine the division threshold based on the decoded values of the unembedded sub-blocks. Extensive experimental results demonstrate that our algorithm improves the accuracy of the network in extracting information while ensuring high invisibility between the embedded image and the original cover image. Comparisons with previous state-of-the-art work show that our algorithm achieves better visual and numerical results under hybrid and geometric attacks.
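The first screening step (keeping only the strongest detection in each local region) can be pictured with the toy grid-based filter below; the cell size and the (x, y, score) point format are assumptions, and the attack-based second screening step is not shown.

```python
# Toy sparsification of interest points: keep only the highest-scoring point per grid cell.
import numpy as np

def sparsify_points(points, scores, cell=64):
    """points: (N, 2) array of (x, y) detections; scores: (N,) detector confidences."""
    best = {}                                            # cell index -> best point index so far
    for i, ((x, y), s) in enumerate(zip(points, scores)):
        key = (int(y) // cell, int(x) // cell)
        if key not in best or s > scores[best[key]]:
            best[key] = i
    keep = sorted(best.values())
    return points[keep], scores[keep]

# Example: 500 random detections in a 512x512 image collapse to at most 64 points (8x8 grid).
pts = np.random.rand(500, 2) * 512
sc = np.random.rand(500)
kept_pts, kept_sc = sparsify_points(pts, sc)
```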
Title: Robust blind image watermarking based on interest points (Virtual Reality Intelligent Hardware, 6(4): 308-322)
Pub Date: 2024-08-01 | DOI: 10.1016/j.vrih.2023.06.005
Yujie LIU, Xiaorui SUN, Wenbin SHAO, Yafu YUAN
Background
Despite recent progress in 3D point cloud processing using deep convolutional neural networks, effectively extracting local features remains a challenging problem. In addition, existing methods consider only the spatial domain in the feature extraction process.
Methods
In this paper, we propose a spectral and spatial aggregation convolutional network (S2ANet), which combines spectral and spatial features for point cloud processing. First, we calculate the local frequency of the point cloud in the spectral domain. Then, we use the local frequency to group points and provide a spectral aggregation convolution module to extract the features of the points grouped by the local frequency. We simultaneously extract the local features in the spatial domain to supplement the final features.
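The grouping idea can be illustrated with the toy code below, where a per-point "local frequency" is approximated by the surface variation of each point's k-neighborhood and points are then binned by that scalar rather than by spatial distance; this proxy and the bin count are assumptions, not the paper's actual spectral-domain definition.

```python
# Toy frequency-based point grouping (surface variation as a stand-in for local frequency).
import numpy as np
from scipy.spatial import cKDTree

def local_frequency(points, k=16):
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k + 1)               # neighbors including the point itself
    freq = np.empty(len(points))
    for i, nb in enumerate(idx):
        cov = np.cov(points[nb].T)                     # (3, 3) covariance of the neighborhood
        eig = np.linalg.eigvalsh(cov)                  # ascending eigenvalues
        freq[i] = eig[0] / (eig.sum() + 1e-12)         # large where the neighborhood is rough
    return freq

def group_by_frequency(points, n_groups=4, k=16):
    freq = local_frequency(points, k)
    cuts = np.quantile(freq, np.linspace(0, 1, n_groups + 1)[1:-1])
    labels = np.digitize(freq, cuts)                   # group id in [0, n_groups)
    return [np.where(labels == g)[0] for g in range(n_groups)]
```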
Results
S2ANet was applied in several point cloud analysis tasks; it achieved state-of-the-art classification accuracies of 93.8%, 88.0%, and 83.1% on the ModelNet40, ShapeNetCore, and ScanObjectNN datasets, respectively. For indoor scene segmentation, training and testing were performed on the S3DIS dataset, and the mean intersection over union was 62.4%.
Conclusions
The proposed S2ANet can effectively capture the local geometric information of point clouds, thereby improving accuracy on various tasks.
Title: S2ANet: Combining local spectral and spatial point grouping for point cloud processing (Virtual Reality Intelligent Hardware, 6(4): 267-279)
Pub Date: 2024-08-01 | DOI: 10.1016/j.vrih.2023.06.010
Chuanyu PAN, Guowei YANG, Taijiang MU, Yu-Kun LAI
Background
With the development of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain the similarity to the person being modeled. This study presents a novel framework for generating animatable 3D cartoon faces from a single portrait image.
Methods
First, we transfer an input real-world portrait to a stylized cartoon image using StyleGAN. We then propose a two-stage reconstruction method to recover a 3D cartoon face with detailed texture: the strategy initially performs coarse estimation based on template models and subsequently refines the model by nonrigid deformation under landmark supervision. Finally, we propose a semantic-preserving face-rigging method based on manually created templates and deformation transfer.
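A toy version of the landmark-supervised refinement step is sketched below: per-vertex offsets on the coarse template are optimized so that selected vertices match target landmarks, with an L2 penalty keeping the deformation small. The optimizer, loss weights, and regularizer are assumptions; the paper's nonrigid deformation model may differ.

```python
# Toy landmark-supervised nonrigid refinement of a coarse template mesh.
import torch

def refine(template_verts, landmark_idx, target_landmarks, steps=200, lam=0.1):
    """template_verts: (V, 3); landmark_idx: (L,) long indices; target_landmarks: (L, 3)."""
    offsets = torch.zeros_like(template_verts, requires_grad=True)
    opt = torch.optim.Adam([offsets], lr=1e-2)
    for _ in range(steps):
        verts = template_verts + offsets
        lmk_loss = ((verts[landmark_idx] - target_landmarks) ** 2).mean()
        reg_loss = (offsets ** 2).mean()               # discourage large deformations
        loss = lmk_loss + lam * reg_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (template_verts + offsets).detach()
```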
Conclusions
Qualitative and quantitative results show that our method outperforms prior art in terms of accuracy, aesthetics, and similarity. Furthermore, we demonstrated the capability of the proposed 3D model for real-time facial animation.
Title: Generating animatable 3D cartoon faces from single portraits (Virtual Reality Intelligent Hardware, 6(4): 292-307)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2023.05.001
Siyi XUN, Yan ZHANG, Sixu DUAN, Mingwei WANG, Jiangang CHEN, Tong TONG, Qinquan GAO, Chantong LAM, Menghan HU, Tao TAN
Background
Magnetic resonance imaging (MRI) has played an important role in the rapid growth of medical imaging diagnostic technology, especially in the diagnosis and treatment of brain tumors, owing to its non-invasive characteristics and superior soft tissue contrast. However, because of their invasive and highly heterogeneous nature, brain tumors appear highly non-uniform and have indistinct boundaries in MRI images. In addition, labeling tumor areas is time-consuming and laborious.
Methods
To address these issues, this study uses a residual grouped convolution module, convolutional block attention module, and bilinear interpolation upsampling method to improve the classical segmentation network U-net. The influence of network normalization, loss function, and network depth on segmentation performance is further considered.
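The three ingredients named above might be combined roughly as in the sketch below: a residual grouped-convolution block with a lightweight channel-attention gate (a simplification of CBAM), plus bilinear upsampling in place of transposed convolution in the decoder. Channel counts, group count, and reduction ratio are assumptions.

```python
# Sketch of a residual grouped-convolution block with channel attention, and bilinear upsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualGroupedBlock(nn.Module):
    def __init__(self, channels, groups=8, reduction=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attn = nn.Sequential(                       # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out * self.attn(out)                       # reweight channels
        return F.relu(out + x)                           # residual connection

def upsample_and_concat(decoder_feat, skip_feat):
    # bilinear upsampling to the skip connection's resolution, then channel concatenation
    up = F.interpolate(decoder_feat, size=skip_feat.shape[-2:], mode="bilinear", align_corners=False)
    return torch.cat([up, skip_feat], dim=1)
```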
Results
In the experiments, the Dice score of the proposed segmentation model reached 97.581%, which is 12.438% higher than that of traditional U-net, demonstrating the effective segmentation of MRI brain tumor images.
Conclusions
In conclusion, the improved U-net network achieves good segmentation of brain tumor MRI images.
Title: ARGA-Unet: Advanced U-net segmentation model using residual grouped convolution and attention mechanism for brain tumor MRI image segmentation (Virtual Reality Intelligent Hardware, 6(3): 203-216)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2024.04.002
Yuanzong Mei, Wenyi Wang, Xi Liu, Wei Yong, Weijie Wu, Yifan Zhu, Shuai Wang, Jianwen Chen
Background
Face image animation generates a synthetic human face video that harmoniously integrates the identity derived from the source image and facial motion obtained from the driving video. This technology could be beneficial in multiple medical fields, such as diagnosis and privacy protection. Previous studies on face animation often relied on a single source image to generate an output video. With a significant pose difference between the source image and the driving frame, the quality of the generated video is likely to be suboptimal because the source image may not provide sufficient features for the warped feature map.
Methods
In this study, we propose a novel face-animation scheme based on multiple sources and perspective alignment to address these issues. We first introduce a multiple-source sampling and selection module to screen the optimal source image set from the provided driving video. We then propose an inter-frame interpolation and alignment module to further eliminate the misalignment between the selected source image and the driving frame.
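A much-simplified view of the source-selection idea is sketched below: every candidate frame of the driving video is scored against the current driving frame by pose distance, and the k closest are kept as sources. Representing pose as (yaw, pitch, roll) from an external estimator and using a plain top-k are assumptions; the paper's sampling and selection module is part of the learned pipeline.

```python
# Toy multiple-source selection by pose distance.
import numpy as np

def select_sources(candidate_poses, driving_pose, k=3):
    """candidate_poses: (N, 3) yaw/pitch/roll per candidate frame; driving_pose: (3,)."""
    dists = np.linalg.norm(candidate_poses - driving_pose[None, :], axis=1)
    return np.argsort(dists)[:k]                     # indices of the k closest source frames

poses = np.random.uniform(-45, 45, size=(120, 3))    # hypothetical pose track of the driving video
chosen = select_sources(poses, np.array([30.0, 5.0, 0.0]), k=3)
```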
Conclusions
The proposed method exhibits superior performance in terms of objective metrics and visual quality in large-angle animation scenes compared with other state-of-the-art face animation methods. These results indicate the effectiveness of the proposed method in addressing distortion issues in large-angle animation.
Title: Face animation based on multiple sources and perspective alignment (Virtual Reality Intelligent Hardware, 6(3): 252-266)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2024.04.001
Lai WEI, Menghan HU
Deep learning has been extensively applied to medical image segmentation, and the field of deep neural networks for this task has advanced significantly since the notable success of U-Net in 2015. However, applying deep learning models to ocular medical image segmentation poses unique challenges compared with other body parts because such images are complex, small, and blurry, and data are scarce. This article aims to provide a comprehensive review of medical image segmentation from two perspectives: the development of deep network structures and the application of segmentation in ocular imaging. Initially, the article provides an overview of medical imaging, data processing, and performance evaluation metrics. Subsequently, it analyzes recent developments in U-Net-based network structures. Finally, for the segmentation of ocular medical images, the application of deep learning is reviewed and categorized by the type of ocular tissue.
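Two of the evaluation metrics such reviews commonly rely on, the Dice coefficient and intersection over union, are written out below for binary masks; the smoothing constant is a convention rather than a fixed standard.

```python
# Dice and IoU for binary segmentation masks.
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """pred, target: boolean or {0, 1} arrays of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)
```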
Title: A review of medical ocular image segmentation (Virtual Reality Intelligent Hardware, 6(3): 181-202)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2024.02.001
Lingyun BAO, Zhengrui HUANG, Zehui LIN, Yue SUN, Hui CHEN, You LI, Zhang LI, Xiaochen YUAN, Lin XU, Tao TAN
Background
Deep convolutional neural networks have garnered considerable attention in numerous machine learning applications, particularly in visual recognition tasks such as image and video analyses. There is growing interest in applying this technology to diverse applications in medical image analysis. Automated three-dimensional breast ultrasound is a vital tool for detecting breast cancer, and computer-assisted diagnosis software developed based on deep learning can effectively assist radiologists in diagnosis. However, the network model is prone to overfitting during training owing to challenges such as insufficient training data. This study attempts to solve the problem caused by small datasets and improve model detection performance.
Methods
We propose a breast cancer detection framework based on deep learning (a transfer learning method based on cross-organ cancer detection) and a contrastive learning method based on the Breast Imaging Reporting and Data System (BI-RADS).
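One plausible reading of the BI-RADS-based contrastive objective is sketched below in the style of supervised contrastive learning, where lesions sharing a BI-RADS category are treated as positives; the temperature, cosine similarity, and batch-wise formulation are assumptions about details the abstract does not specify.

```python
# Sketch of a contrastive loss with BI-RADS categories defining the positive pairs.
import torch
import torch.nn.functional as F

def birads_contrastive_loss(embeddings, birads_labels, temperature=0.1):
    """embeddings: (N, D) features; birads_labels: (N,) integer category ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                                    # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (birads_labels[:, None] == birads_labels[None, :]) & ~self_mask
    # log-probability of each sample against all others (self excluded from the denominator)
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()                          # skip anchors with no positive
```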
Results
When using cross-organ transfer learning and BI-RADS-based contrastive learning, the average sensitivity of the model increased by up to 16.05%.
Conclusion
Our experiments demonstrate that parameters and experience from cross-organ cancer detection can be mutually referenced and that the BI-RADS-based contrastive learning method improves the detection performance of the model.
Title: Automatic detection of breast lesions in automated 3D breast ultrasound with cross-organ transfer learning (Virtual Reality Intelligent Hardware, 6(3): 239-251)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2023.08.002
Hui XIE, Jianfang ZHANG, Lijuan DING, Tao TAN, Qing LI
Background
The prognosis and survival of patients with lung cancer are likely to deteriorate with metastasis. Using deep learning to detect lymph node metastasis can facilitate the noninvasive estimation of the likelihood of such metastasis, thereby providing clinicians with crucial information to enhance diagnostic precision and ultimately improve patient survival and prognosis.
Methods
In total, 623 eligible patients were recruited from two medical institutions. Seven deep learning models, namely AlexNet, GoogLeNet, ResNet18, ResNet101, VGG16, VGG19, and MobileNetV3 (small), were utilized to extract deep image histological features. The dimensionality of the extracted features was then reduced using the Spearman correlation coefficient (r ≥ 0.9) and the Least Absolute Shrinkage and Selection Operator (LASSO). Eleven machine learning methods, namely Support Vector Machine, K-nearest neighbor, Random Forest, Extra Trees, XGBoost, LightGBM, Naive Bayes, AdaBoost, Gradient Boosting Decision Tree, Linear Regression, and Multilayer Perceptron, were employed to construct classification prediction models for the filtered final features. The diagnostic performances of the models were assessed using various metrics, including accuracy, area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value. Calibration and decision-curve analyses were also performed.
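The two-step feature reduction described above can be pictured with the sketch below: one feature from every pair whose absolute Spearman correlation reaches 0.9 is dropped, and LASSO then selects among the survivors. The greedy order of the filter and the cross-validated LASSO settings are assumptions; the downstream classifiers are not shown.

```python
# Sketch of Spearman-redundancy filtering followed by LASSO-based feature selection.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import LassoCV

def spearman_filter(X, threshold=0.9):
    corr, _ = spearmanr(X)                           # (F, F) rank-correlation matrix over columns
    corr = np.abs(corr)
    keep = []
    for j in range(X.shape[1]):                      # greedily keep features not too correlated
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def lasso_select(X, y):
    lasso = LassoCV(cv=5).fit(X, y)
    return np.flatnonzero(lasso.coef_)               # indices of features with nonzero weight

# X: (patients, deep features from a pretrained backbone); y: hypothetical 0/1 node status.
# kept = spearman_filter(X); selected = np.array(kept)[lasso_select(X[:, kept], y)]
```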
Results
The present study demonstrated that using deep radiomic features extracted from VGG16, in conjunction with a prediction model constructed via a linear regression algorithm, effectively distinguished the status of mediastinal lymph nodes in patients with lung cancer. The performance of the model was evaluated based on various metrics, including accuracy, area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, and negative predictive value, which yielded values of 0.808, 0.834, 0.851, 0.745, 0.829, and 0.776, respectively. The validation set of the model was assessed using clinical decision curves, calibration curves, and confusion matrices, which collectively demonstrated the model's stability and accuracy.
Conclusion
In this study, deep radiomic features were extracted from computed tomography images using VGG16, and the linear regression method accurately diagnosed mediastinal lymph node metastases in patients with lung cancer.
Title: Combining machine and deep transfer learning for mediastinal lymph node evaluation in patients with lung cancer (Virtual Reality Intelligent Hardware, 6(3): 226-238)
Pub Date: 2024-06-01 | DOI: 10.1016/j.vrih.2023.05.002
Yiman LIU, Size HOU, Xiaoxiang HAN, Tongtong LIANG, Menghan HU, Xin WANG, Wei GU, Yuqi ZHANG, Qingli LI, Jiangang CHEN
Background
Atrial septal defect (ASD) is one of the most common congenital heart diseases. The diagnosis of ASD via transthoracic echocardiography is subjective and time-consuming.
Methods
The objective of this study was to evaluate the feasibility and accuracy of automatic detection of ASD in children from static color Doppler echocardiographic images using end-to-end convolutional neural networks. The proposed depthwise separable convolution model identifies ASDs from static color Doppler images in standard views; we selected two echocardiographic views, i.e., the subcostal sagittal view of the atrial septum and the low parasternal four-chamber view. The developed ASD detection system was validated using a training set of 396 echocardiographic images corresponding to 198 cases and an independent test set of 112 images corresponding to 56 cases; overall, the data included 101 cases with ASDs and 153 cases with normal hearts.
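The basic building block of such a classifier, a depthwise separable convolution, is sketched below: a per-channel spatial convolution followed by a 1x1 pointwise convolution. The surrounding network layout, normalization, and activation choices are assumptions, since the abstract does not specify them.

```python
# Minimal depthwise separable convolution block.
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)       # one filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)  # mix channels
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```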
Results
The average area under the receiver operating characteristic curve, recall, precision, specificity, F1-score, and accuracy of the proposed ASD detection model were 91.99%, 80.00%, 82.22%, 87.50%, 79.57%, and 83.04%, respectively.
Conclusions
The proposed model can accurately and automatically identify ASD, providing a strong foundation for the intelligent diagnosis of congenital heart diseases.
Title: Intelligent diagnosis of atrial septal defect in children using echocardiography with deep learning (Virtual Reality Intelligent Hardware, 6(3): 217-225)