
Latest publications: Visual Computing for Industry, Biomedicine and Art

Principal component analysis and fine-tuned vision transformation integrating model explainability for breast cancer prediction.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-03-10 DOI: 10.1186/s42492-025-00186-x
Huong Hoang Luong, Phuc Phan Hong, Dat Vo Minh, Thinh Nguyen Le Quang, Anh Dinh The, Nguyen Thai-Nghe, Hai Thanh Nguyen

Breast cancer, the most commonly diagnosed cancer among women, is a major global health issue. It results from abnormal cells in the breast tissue growing out of control. Histopathology, the detection and study of tissue diseases, plays a vital role in breast cancer diagnosis and classification, and considerable research in medicine and computer science has therefore been devoted to developing effective histopathology-based methods for breast cancer treatment. In this study, a vision transformer (ViT) was employed to classify tumors in the Breast Cancer Histopathological Database (BreakHis) into two classes, benign and malignant. To enhance model performance, we introduced a novel multi-head locality large kernel self-attention during fine-tuning, achieving an accuracy of 95.94% at 100× magnification, an improvement of 3.34% over a standard ViT (which uses multi-head self-attention). In addition, applying principal component analysis (PCA) for dimensionality reduction led to an accuracy improvement of 3.34%, highlighting its role in mitigating overfitting and reducing computational complexity. In the final phase, SHapley Additive exPlanations, Local Interpretable Model-agnostic Explanations, and Gradient-weighted Class Activation Mapping were used for interpretability and explainability, aiding in understanding feature importance, providing local explanations, and visualizing model attention. In a further experiment, ensemble learning with VGGIN boosted performance to 97.13% accuracy. Our approach improved accuracy by 0.98% to 17.13% over state-of-the-art methods, establishing a new benchmark for breast cancer histopathological image classification.
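As a minimal illustration of the dimensionality-reduction step, the sketch below projects 2-D feature vectors onto their first principal component using the closed-form leading eigenvector of a 2×2 covariance matrix. The toy points are purely hypothetical stand-ins for the high-dimensional ViT features used in the paper.

```python
import math

def pca_1d(points):
    """Project 2-D feature vectors onto their first principal component.

    Toy stand-in for the PCA step applied to high-dimensional ViT
    features; for a 2x2 covariance matrix the leading eigenvector
    has the closed form below.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Major-axis angle: tan(2*theta) = 2*sxy / (sxx - syy)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Centered projections onto the principal direction
    return [(p[0] - mx) * ux + (p[1] - my) * uy for p in points]

# Two strongly correlated features collapse to one informative coordinate.
pts = [(0, 0), (1, 1.1), (2, 1.9), (3, 3.05)]
scores = pca_1d(pts)
```

In the paper's pipeline the same idea is applied to many more dimensions (typically via an SVD-based PCA routine) before classification, which is what reduces overfitting and compute cost.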

Citations: 0
Global residual stress field inference method for die-forging structural parts based on fusion of monitoring data and distribution prior.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-03-06 DOI: 10.1186/s42492-025-00187-w
Shuyuan Chen, Yingguang Li, Changqing Liu, Zhiwei Zhao, Zhibin Chen, Xiao Liu

Die-forging structural parts are widely used in the main load-bearing components of aircraft because of their excellent mechanical properties and fatigue resistance. However, their forming and heat-treatment processes are complex, leading to high levels of internal stress and complexly distributed residual stress fields (RSFs), which affect the deformation, fatigue life, and failure of structural parts throughout their lifecycles. The global RSF can therefore provide a basis for process control. Existing RSF inference methods based on deformation-force data can use monitoring data to infer the global RSF of a regular part. However, owing to the irregular geometry of die-forging structural parts and the complexity of the RSF, the inference process involves ill-conditioned problems that make it difficult to obtain the RSF accurately. This paper presents a global RSF inference method for die-forging structural parts based on the fusion of monitoring data and a distribution prior. The prior knowledge is derived from RSF distribution trends obtained through finite element analysis, which enables a low-dimensional characterization of the RSF and reduces the number of parameters required to solve the inference equations. The effectiveness of the method was validated in both simulated and real environments.
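The low-dimensional characterization can be pictured as fitting a few coefficients of prior basis modes to monitoring data. The sketch below fits two basis modes by least squares via the 2×2 normal equations; the basis functions and "measurements" are illustrative stand-ins for the FEA-derived distribution prior and the deformation-force monitoring data, not the authors' formulation.

```python
def fit_coeffs(xs, ys, basis):
    """Solve the 2x2 normal equations for y ~ a*phi1(x) + b*phi2(x).

    phi1/phi2 play the role of FEA-derived RSF distribution modes;
    restricting the field to their span is what keeps the inverse
    problem well-posed with only a few monitoring points.
    """
    phi1, phi2 = basis
    a11 = sum(phi1(x) ** 2 for x in xs)
    a12 = sum(phi1(x) * phi2(x) for x in xs)
    a22 = sum(phi2(x) ** 2 for x in xs)
    b1 = sum(phi1(x) * y for x, y in zip(xs, ys))
    b2 = sum(phi2(x) * y for x, y in zip(xs, ys))
    det = a11 * a22 - a12 * a12
    a = (b1 * a22 - b2 * a12) / det
    b = (a11 * b2 - a12 * b1) / det
    return a, b

# Synthetic "monitoring data" generated by the field 2*phi1 + 3*phi2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
a, b = fit_coeffs(xs, ys, (lambda x: 1.0, lambda x: x))
```

With only two unknowns instead of a full stress field, the system is overdetermined rather than ill-conditioned, which is the essence of the prior-driven reduction.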

Citations: 0
Explainable machine learning framework for cataracts recognition using visual features.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-01-17 DOI: 10.1186/s42492-024-00183-6
Xiao Wu, Lingxi Hu, Zunjie Xiao, Xiaoqing Zhang, Risa Higashita, Jiang Liu

Cataract is the leading cause of blindness and visual impairment globally. Deep neural networks (DNNs) have achieved promising cataract recognition performance on anterior segment optical coherence tomography (AS-OCT) images; however, they offer poor explainability, limiting their clinical application. In contrast, visual features extracted from original AS-OCT images and their transformed forms (e.g., AS-OCT-based histograms) are readily explainable but have not been fully exploited. Motivated by these observations, an explainable machine learning framework for automatically recognizing cataract severity levels from AS-OCT images is proposed, consisting of three stages: visual feature extraction, feature importance explanation and selection, and recognition. First, intensity-histogram and intensity-based statistical methods are applied to extract visual features from the original AS-OCT images and AS-OCT-based histograms. Subsequently, SHapley Additive exPlanations and the Pearson correlation coefficient are applied to analyze feature importance and select significant visual features. Finally, an ensemble multi-class ridge regression method is applied to recognize cataract severity levels from the selected visual features. Experiments on a clinical AS-OCT-NC dataset demonstrate that the proposed framework not only achieves performance competitive with DNNs but also provides good explainability, meeting the requirements of clinical diagnostic practice.
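The final recognition stage relies on ridge regression. In the one-feature case the closed form reduces to w = Σxy / (Σx² + λ), which the sketch below implements to show how the penalty λ shrinks the fitted weight; the data are toy values, not AS-OCT features, and this is not the authors' ensemble multi-class implementation.

```python
def ridge_fit(xs, ys, lam):
    """One-feature ridge regression: w = sum(x*y) / (sum(x*x) + lam).

    Larger lam shrinks the weight toward zero, trading exactness of
    fit for stability, which is the property that makes ridge models
    attractive for small clinical feature sets.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

w_exact = ridge_fit([1, 2, 3], [2, 4, 6], 0.0)    # unpenalized fit
w_shrunk = ridge_fit([1, 2, 3], [2, 4, 6], 14.0)  # penalized, smaller weight
```

The multi-class, multi-feature version solves the matrix analogue (XᵀX + λI)w = Xᵀy per class and ensembles the resulting regressors.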

Citations: 0
Harmonized technical standard test methods for quality evaluation of medical fluorescence endoscopic imaging systems.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-01-10 DOI: 10.1186/s42492-024-00184-5
Bodong Liu, Zhaojun Guo, Pengfei Yang, Jian'an Ye, Kunshan He, Shen Gao, Chongwei Chi, Yu An, Jie Tian

Fluorescence endoscopy technology uses a light source of a specific wavelength to excite fluorescence signals in biological tissues, a capability that is extremely valuable for the early detection and precise diagnosis of pathological changes. Identifying a suitable experimental approach and metrics for objectively and quantitatively assessing the imaging quality of fluorescence endoscopy is imperative for strengthening the image evaluation criteria of fluorescence imaging technology. In this study, we propose a new set of standards for fluorescence endoscopy to evaluate the optical performance and image quality of fluorescence imaging objectively and quantitatively. This comprehensive set of standards encompasses fluorescence test models and imaging-quality assessment protocols to ensure that fluorescence endoscopy systems meet the required performance levels, and it aims to improve the accuracy and uniformity of results by standardizing testing procedures. The formulation of pivotal metrics and testing methodologies is expected to enable direct quantitative comparisons of fluorescence endoscopy devices and to foster the harmonization of clinical and preclinical evaluations using fluorescence endoscopic imaging systems, thereby improving diagnostic precision and efficiency.
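Quantitative image-quality metrics of the kind such test standards formalize can be as simple as a contrast-to-noise ratio between a fluorescent target region and the background. The sketch below is purely illustrative; it is not the formula defined by the proposed standard.

```python
import statistics

def cnr(signal, background):
    """Contrast-to-noise ratio between pixel intensities of a
    fluorescent target region and the image background: the mean
    intensity difference divided by the background noise level.
    Illustrative metric only, not the standard's defined formula."""
    return ((statistics.mean(signal) - statistics.mean(background))
            / statistics.pstdev(background))

# Bright target patch vs. dim, slightly noisy background.
score = cnr([10, 12, 14], [2, 4, 6])
```

A test phantom with known fluorophore concentrations lets such a metric be measured reproducibly across devices, which is what makes direct quantitative comparison possible.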

Citations: 0
Advancing breast cancer diagnosis: token vision transformers for faster and accurate classification of histopathology images.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2025-01-08 DOI: 10.1186/s42492-024-00181-8
Mouhamed Laid Abimouloud, Khaled Bensid, Mohamed Elleuch, Mohamed Ben Ammar, Monji Kherallah

The vision transformer (ViT) architecture, with its attention mechanism based on multi-head attention layers, has been widely adopted in computer-aided diagnosis tasks owing to its effectiveness in processing medical image information. However, ViTs are complex and require high-performance GPUs or CPUs for efficient training and deployment in real-world medical diagnostic devices, making them more demanding than convolutional neural networks (CNNs). These challenges are compounded in histopathology image analysis, where images are both limited in number and complex. In response, this study proposes TokenMixer, a hybrid architecture that combines the strengths of CNNs and ViTs. It aims to improve feature extraction and classification accuracy with shorter training times and fewer parameters by minimizing the number of input patches used during training, while tokenizing input patches with convolutional layers and processing them with encoder transformer layers across all network layers for fast and accurate breast cancer tumor subtype classification. The TokenMixer mechanism is inspired by the ConvMixer and TokenLearner models. First, the ConvMixer component dynamically generates spatial attention maps using convolutional layers, enabling patches to be extracted from input images so that fewer input patches are used in training. Second, the TokenLearner component extracts relevant regions from the selected input patches, tokenizes them to improve feature extraction, and trains all tokenized patches in an encoder transformer network. We evaluated the TokenMixer model on the public BreakHis dataset, comparing it with ViT-based and other state-of-the-art methods. Our approach achieved strong results for both binary and multi-class classification of breast cancer subtypes across magnification levels (40×, 100×, 200×, 400×), with accuracies of 97.02% for binary classification and 93.29% for multi-class classification and decision times of 391.71 s and 1173.56 s, respectively. These results highlight the potential of our hybrid deep ViT-CNN architecture for advancing tumor classification in histopathological images. The source code is available at https://github.com/abimouloud/TokenMixer .
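The patch-reduction idea can be caricatured without any learning: score each patch and keep only the top-k. The sketch below ranks patches by intensity variance, a hand-crafted stand-in for the learned spatial-attention maps of ConvMixer/TokenLearner; the tiny image and threshold scheme are hypothetical.

```python
def select_patches(image, patch, k):
    """Keep the k highest-variance patches of a 2-D intensity grid.

    Hand-crafted stand-in for learned spatial-attention patch
    selection: informative (high-contrast) regions survive, flat
    background patches are dropped before tokenization, shrinking
    the transformer's input sequence.
    """
    scored = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            vals = [image[r + i][c + j]
                    for i in range(patch) for j in range(patch)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            scored.append((var, (r, c)))
    scored.sort(reverse=True)  # highest variance first
    return [pos for _, pos in scored[:k]]

# A 4x4 "image" whose only structure sits in the bottom-right patch.
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 0],
       [0, 0, 0, 9]]
```

Fewer surviving patches mean fewer tokens per forward pass, which is the mechanism behind the shorter training times the abstract reports.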

Citations: 0
Semi-supervised contour-driven broad learning system for autonomous segmentation of concealed prohibited baggage items.
IF 3.2 CAS Region 4 (Computer Science) Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-24 DOI: 10.1186/s42492-024-00182-7
Divya Velayudhan, Abdelfatah Ahmed, Taimur Hassan, Muhammad Owais, Neha Gour, Mohammed Bennamoun, Ernesto Damiani, Naoufel Werghi

With the exponential rise in global air traffic, ensuring swift passenger processing while countering potential security threats has become a paramount concern for aviation security. Although X-ray baggage monitoring is now standard, manual screening has several limitations, including a propensity for errors, and raises concerns about passenger privacy. To address these drawbacks, researchers have leveraged recent advances in deep learning to design threat-segmentation frameworks. However, these models require extensive training data with labour-intensive dense pixel-wise annotations and must be fine-tuned separately for each dataset to account for inter-dataset discrepancies. Hence, this study proposes a semi-supervised contour-driven broad learning system (BLS) for X-ray baggage security threat instance segmentation, referred to as C-BLX. The methodology enhances representation learning and achieves faster training to tackle severe occlusion and class imbalance using a single training routine with limited baggage scans. The proposed framework is trained with minimal supervision using resource-efficient image-level labels to localize illegal items in multi-vendor baggage scans. More specifically, the framework generates candidate region segments from the input X-ray scans based on local intensity-transition cues, effectively identifying concealed prohibited items without processing entire baggage scans. The multi-convolutional BLS exploits the rich complementary features extracted from these region segments to predict object categories, including threat and benign classes. The contours corresponding to the region segments predicted as threats are then used to yield the segmentation results. The proposed C-BLX system was thoroughly evaluated on three highly imbalanced public datasets and surpassed competing approaches in baggage-threat segmentation, yielding mIoU scores of 90.04%, 78.92%, and 59.44% on GDXray, SIXray, and Compass-XP, respectively. The limitations of the proposed system in extracting precise region segments in intricate, noisy settings, along with potential post-processing strategies for overcoming them, are also explored. The source code will be available at https://github.com/Divs1159/CNN_BLS .
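The 90.04%/78.92%/59.44% figures are mean intersection-over-union (mIoU) scores. A minimal sketch of the metric over flattened label arrays, with toy predictions rather than real baggage-scan masks:

```python
def miou(pred, gt, num_classes):
    """Mean intersection-over-union over flattened label arrays.

    Per class: IoU = |pred==c AND gt==c| / |pred==c OR gt==c|.
    Classes absent from both prediction and ground truth are skipped
    so they neither help nor hurt the mean.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 4-pixel mask: class 0 = benign background, class 1 = threat.
score = miou([0, 0, 1, 1], [0, 1, 1, 1], 2)
```

Because the mean is taken over classes rather than pixels, rare threat classes weigh as much as the dominant background, which is why mIoU is the metric of choice for imbalanced datasets like these.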

Citations: 0
Energy consumption forecasting for laser manufacturing of large artifacts based on fusionable transfer learning.
IF 3.2 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-12-02 DOI: 10.1186/s42492-024-00178-3
Linxuan Wang, Jinghua Xu, Shuyou Zhang, Jianrong Tan, Shaomei Fei, Xuezhi Shi, Jihong Pang, Sheng Luo

This study presents an energy consumption (EC) forecasting method for laser melting manufacturing of metal artifacts based on fusionable transfer learning (FTL). To predict the EC of manufacturing products, particularly from scale-down to scale-up, a general paradigm was first developed by categorizing the overall process into three main sub-steps. The operating electrical power was further formulated as a combinatorial function, based on which an operator learning network was adopted to fit the nonlinear relations between the fabricating arguments and EC. Parallel-arranged networks were constructed to investigate the impacts of fabrication variables and devices on power. Considering the interconnections among these factors, the outputs of the neural networks were blended and fused to jointly predict the electrical power. Most innovatively, large artifacts can be decomposed into time-dependent laser-scanning trajectories, which can be further transformed into fusionable information via neural networks, inspired by large language models. Accordingly, transfer learning can handle either scale-down or scale-up forecasting, that is, FTL with scalability across artifact structures. The effectiveness of the proposed FTL was verified through physical fabrication experiments via laser powder bed fusion. The relative error of the average and overall EC predictions based on FTL remained below 0.83%. The melting fusion quality was examined using metallographic diagrams. The proposed FTL framework can forecast the EC of scaled structures, which is particularly helpful for price estimation and quotation of large metal products in the pursuit of carbon peaking and carbon neutrality.
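The parallel-branch architecture with fused outputs can be sketched as follows. The branch inputs, layer sizes, and fusion-by-concatenation are illustrative assumptions (the weights here are random rather than trained); the point is the shape of the computation: one branch per factor group, a fused power prediction, and EC accumulated over the trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, sizes):
    """Tiny forward-only MLP with tanh layers; weights are random
    stand-ins for the trained operator-learning networks."""
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        x = np.tanh(x @ W)
    return x

# Hypothetical per-step inputs along a laser-scanning trajectory:
# fabrication arguments (e.g., laser power setting, scan speed,
# layer thickness) and device-level variables.
fab_args   = rng.random((8, 3))   # 8 trajectory time steps
device_var = rng.random((8, 2))

# Parallel-arranged branches, one per factor group
h_fab = mlp(fab_args, [3, 16, 4])
h_dev = mlp(device_var, [2, 16, 4])

# Blend/fuse the branch outputs and map them to instantaneous power;
# squaring keeps the predicted power non-negative.
fused = np.concatenate([h_fab, h_dev], axis=1)
power = mlp(fused, [8, 8, 1]).ravel() ** 2

# EC = power integrated over the trajectory (dt is an assumed step)
dt = 0.1
ec = float((power * dt).sum())
print(power.shape, ec >= 0.0)
```

Because the branches are evaluated independently and only joined at the fusion stage, the same structure extends naturally to longer trajectories or additional factor groups.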

{"title":"Energy consumption forecasting for laser manufacturing of large artifacts based on fusionable transfer learning.","authors":"Linxuan Wang, Jinghua Xu, Shuyou Zhang, Jianrong Tan, Shaomei Fei, Xuezhi Shi, Jihong Pang, Sheng Luo","doi":"10.1186/s42492-024-00178-3","DOIUrl":"10.1186/s42492-024-00178-3","url":null,"abstract":"<p><p>This study presents an energy consumption (EC) forecasting method for laser melting manufacturing of metal artifacts based on fusionable transfer learning (FTL). To predict the EC of manufacturing products, particularly from scale-down to scale-up, a general paradigm was first developed by categorizing the overall process into three main sub-steps. The operating electrical power was further formulated as a combinatorial function, based on which an operator learning network was adopted to fit the nonlinear relations between the fabricating arguments and EC. Parallel-arranged networks were constructed to investigate the impacts of fabrication variables and devices on power. Considering the interconnections among these factors, the outputs of the neural networks were blended and fused to jointly predict the electrical power. Most innovatively, large artifacts can be decomposed into time-dependent laser-scanning trajectories, which can be further transformed into fusionable information via neural networks, inspired by large language model. Accordingly, transfer learning can deal with either scale-down or scale-up forecasting, namely, FTL with scalability within artifact structures. The effectiveness of the proposed FTL was verified through physical fabrication experiments via laser powder bed fusion. The relative error of the average and overall EC predictions based on FTL was maintained below 0.83%. The melting fusion quality was examined using metallographic diagrams. 
The proposed FTL framework can forecast the EC of scaled structures, which is particularly helpful in price estimation and quotation of large metal products towards carbon peaking and carbon neutrality.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"29"},"PeriodicalIF":3.2,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11612079/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142772951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Computational analysis of variability and uncertainty in the clinical reference on magnetic resonance imaging radiomics: modelling and performance.
IF 3.2 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-19 DOI: 10.1186/s42492-024-00180-9
Cindy Xue, Jing Yuan, Gladys G Lo, Darren M C Poon, Winnie Cw Chu

This study conducted a computational investigation to explore the influence of clinical reference uncertainty on magnetic resonance imaging (MRI) radiomics feature selection, modelling, and performance. It used two sets of publicly available prostate cancer MRI radiomics data (Dataset 1: n = 260; Dataset 2: n = 100) with Gleason score clinical references. Each dataset was divided into training and hold-out testing datasets at a ratio of 7:3 and analysed independently. The clinical references of the training set were permuted at different levels (in increments of 5%), with each level repeated 20 times. Four feature selection algorithms and two classifiers were used to construct the models. Cross-validation was employed for training, while the separate hold-out testing set was used for evaluation. The Jaccard similarity coefficient was used to evaluate feature selection, while the area under the curve (AUC) and accuracy were used to assess model performance. An analysis of variance test with Bonferroni correction was conducted to compare the metrics of each model. The consistency of the feature selection performance decreased substantially with the clinical reference permutation. AUCs of the trained models with permutation, particularly beyond 20%, were significantly lower (Dataset 1 (≥ 20% permutation): 0.67; Dataset 2 (≥ 20% permutation): 0.74) than the AUCs of models without permutation (Dataset 1: 0.94; Dataset 2: 0.97). The performance of the models was also associated with larger uncertainties as the number of permuted clinical references increased. Clinical reference uncertainty can substantially influence MRI radiomics feature selection and modelling. Highly accurate clinical references are therefore important for building reliable and robust radiomics models, and careful interpretation of model performance is necessary, particularly for high-dimensional data.
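The permutation experiment can be emulated on synthetic data. Everything here is a simplified stand-in (label flipping instead of the paper's permutation scheme, a tiny gradient-descent logistic regression instead of the four selection algorithms and two classifiers, evaluation on the training features), but it reproduces the qualitative effect: AUC against the true labels degrades as the training references are corrupted.

```python
import numpy as np

rng = np.random.default_rng(7)

def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum()); n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

# Synthetic "radiomics": only feature 0 is informative
n, d = 200, 20
y_true = rng.integers(0, 2, n)
X = rng.standard_normal((n, d))
X[:, 0] += 1.5 * y_true

results = {}
for frac in (0.0, 0.2, 0.4):
    y_train = y_true.copy()
    idx = rng.choice(n, size=int(frac * n), replace=False)
    y_train[idx] = 1 - y_train[idx]        # corrupt the clinical reference
    w = train_logreg(X, y_train)
    results[frac] = auc(X @ w, y_true)     # evaluate against true labels
    print(frac, round(results[frac], 3))
```

As the corrupted fraction grows, the learned weights drift from the informative feature toward noise features, so the AUC measured against the uncorrupted labels drops, mirroring the degradation the study reports beyond 20% permutation.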

{"title":"Computational analysis of variability and uncertainty in the clinical reference on magnetic resonance imaging radiomics: modelling and performance.","authors":"Cindy Xue, Jing Yuan, Gladys G Lo, Darren M C Poon, Winnie Cw Chu","doi":"10.1186/s42492-024-00180-9","DOIUrl":"10.1186/s42492-024-00180-9","url":null,"abstract":"<p><p>To conduct a computational investigation to explore the influence of clinical reference uncertainty on magnetic resonance imaging (MRI) radiomics feature selection, modelling, and performance. This study used two sets of publicly available prostate cancer MRI = radiomics data (Dataset 1: n = 260; Dataset 2: n = 100) with Gleason score clinical references. Each dataset was divided into training and holdout testing datasets at a ratio of 7:3 and analysed independently. The clinical references of the training set were permuted at different levels (increments of 5%) and repeated 20 times. Four feature selection algorithms and two classifiers were used to construct the models. Cross-validation was employed for training, while a separate hold-out testing set was used for evaluation. The Jaccard similarity coefficient was used to evaluate feature selection, while the area under the curve (AUC) and accuracy were used to assess model performance. An analysis of variance test with Bonferroni correction was conducted to compare the metrics of each model. The consistency of the feature selection performance decreased substantially with the clinical reference permutation. AUCs of the trained models with permutation particularly after 20% were significantly lower (Dataset 1 (with ≥ 20% permutation): 0.67, and Dataset 2 (≥ 20% permutation): 0.74), compared to the AUC of models without permutation (Dataset 1: 0.94, Dataset 2: 0.97). The performances of the models were also associated with larger uncertainties and an increasing number of permuted clinical references. 
Clinical reference uncertainty can substantially influence MRI radiomic feature selection and modelling. The high accuracy of clinical references should be helpful in building reliable and robust radiomic models. Careful interpretation of the model performance is necessary, particularly for high-dimensional data.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"28"},"PeriodicalIF":3.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11573982/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Survey of real-time brainmedia in artistic exploration.
IF 3.2 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-11-18 DOI: 10.1186/s42492-024-00179-2
Rem RunGu Lin, Kang Zhang

This survey examines the evolution and impact of real-time brainmedia on artistic exploration, contextualizing developments within a historical framework. To enhance knowledge on the entanglement between the brain, mind, and body in an increasingly mediated world, this work defines a clear scope at the intersection of bio art and interactive art, concentrating on real-time brainmedia artworks developed in the 21st century. It proposes a set of criteria and a taxonomy based on historical notions, interaction dynamics, and media art representations. The goal is to provide a comprehensive overview of real-time brainmedia, setting the stage for future explorations of new paradigms in communication between humans, machines, and the environment.

{"title":"Survey of real-time brainmedia in artistic exploration.","authors":"Rem RunGu Lin, Kang Zhang","doi":"10.1186/s42492-024-00179-2","DOIUrl":"10.1186/s42492-024-00179-2","url":null,"abstract":"<p><p>This survey examines the evolution and impact of real-time brainmedia on artistic exploration, contextualizing developments within a historical framework. To enhance knowledge on the entanglement between the brain, mind, and body in an increasingly mediated world, this work defines a clear scope at the intersection of bio art and interactive art, concentrating on real-time brainmedia artworks developed in the 21st century. It proposes a set of criteria and a taxonomy based on historical notions, interaction dynamics, and media art representations. The goal is to provide a comprehensive overview of real-time brainmedia, setting the stage for future explorations of new paradigms in communication between humans, machines, and the environment.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"27"},"PeriodicalIF":3.2,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11570570/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Achieving view-distance and -angle invariance in motion prediction using a simple network.
IF 3.2 Zone 4, Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Pub Date: 2024-10-28 DOI: 10.1186/s42492-024-00176-5
Haichuan Zhao, Xudong Ru, Peng Du, Shaolong Liu, Na Liu, Xingce Wang, Zhongke Wu

Recently, human motion prediction has gained significant attention and achieved notable success. However, current methods primarily rely on training and testing with ideal datasets, overlooking the impact of variations in viewing distance and viewing angle, which are common in practical scenarios. In this study, we address model invariance by ensuring robust performance despite variations in viewing distance and angle. To achieve this, we employed Riemannian geometry methods to constrain the learning process of neural networks, enabling the prediction of invariances using a simple network and broadening the application of motion prediction to various scenarios. Our framework uses Riemannian geometry to encode motion into a novel motion space, achieving prediction with an invariant viewing distance and angle using a simple network. Specifically, a specified path transport square-root velocity function is proposed to aid in removing the view-angle equivalence class and to encode motion sequences into a flattened space. Motion coding by this geometric method linearizes the optimization problem posed in the non-flattened space and effectively extracts motion information, allowing the proposed method to achieve competitive performance using a simple network. Experimental results on Human3.6M and CMU MoCap demonstrate that the proposed framework has competitive performance and invariance to the viewing distance and viewing angle.
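The representation at the core of the framework can be sketched with the standard square-root velocity function (SRVF), q(t) = f'(t) / sqrt(||f'(t)||); the paper's specified-path-transport variant builds on this. The sketch below shows how a uniform scale change of a trajectory, such as a different viewing distance, acts predictably on the representation (scaling by c multiplies q by sqrt(c)), which is what makes the effect easy to factor out.

```python
import numpy as np

def srvf(f, dt=1.0):
    """Square-root velocity function q(t) = f'(t) / sqrt(||f'(t)||)
    of a sampled trajectory f with shape (time, dim)."""
    v = np.gradient(f, dt, axis=0)                    # velocity
    norm = np.linalg.norm(v, axis=-1, keepdims=True)  # speed
    return v / np.sqrt(np.maximum(norm, 1e-8))        # guard against zeros

# A planar motion trajectory, and the same trajectory uniformly
# scaled by 3 (e.g., the same motion seen from closer up)
t = np.linspace(0, 1, 50)[:, None]
traj = np.hstack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)])
q1 = srvf(traj)
q2 = srvf(3.0 * traj)

# Scaling f by c scales q by sqrt(c): c*v / sqrt(c*||v||) = sqrt(c)*q
print(np.allclose(q2, np.sqrt(3) * q1))  # True
```

Normalizing q (e.g., to unit L2 norm) then removes the scale factor entirely, giving a view-distance-invariant encoding that a simple network can consume.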

{"title":"Achieving view-distance and -angle invariance in motion prediction using a simple network.","authors":"Haichuan Zhao, Xudong Ru, Peng Du, Shaolong Liu, Na Liu, Xingce Wang, Zhongke Wu","doi":"10.1186/s42492-024-00176-5","DOIUrl":"10.1186/s42492-024-00176-5","url":null,"abstract":"<p><p>Recently, human motion prediction has gained significant attention and achieved notable success. However, current methods primarily rely on training and testing with ideal datasets, overlooking the impact of variations in the viewing distance and viewing angle, which are commonly encountered in practical scenarios. In this study, we address the issue of model invariance by ensuring robust performance despite variations in view distances and angles. To achieve this, we employed Riemannian geometry methods to constrain the learning process of neural networks, enabling the prediction of invariances using a simple network. Furthermore, this enhances the application of motion prediction in various scenarios. Our framework uses Riemannian geometry to encode motion into a novel motion space to achieve prediction with an invariant viewing distance and angle using a simple network. Specifically, the specified path transport square-root velocity function is proposed to aid in removing the view-angle equivalence class and encode motion sequences into a flattened space. Motion coding by the geometry method linearizes the optimization problem in a non-flattened space and effectively extracts motion information, allowing the proposed method to achieve competitive performance using a simple network. 
Experimental results on Human 3.6M and CMU MoCap demonstrate that the proposed framework has competitive performance and invariance to the viewing distance and viewing angle.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"7 1","pages":"26"},"PeriodicalIF":3.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11519277/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142523255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0