
Multimedia Tools and Applications: Latest Publications

SMOTE-Based deep network with adaptive boosted sooty for the detection and classification of type 2 diabetes mellitus
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-11 | DOI: 10.1007/s11042-024-19770-z
Phani Kumar Immadisetty, C. Rajabhushanam

Type 2 diabetes (T2D) is a prolonged disease caused by an abnormal rise in glucose levels due to poor insulin production in the pancreas. The detection and classification of this disease are very challenging and require effective techniques for learning the T2D features. Therefore, this study proposes a novel hybridized deep learning-based technique to automatically detect and categorize T2D by effectively learning disease attributes. First, missing value imputation and a normalization-based pre-processing phase are introduced to improve the quality of the data. The Adaptive Boosted Sooty Tern Optimization (Adap-BSTO) approach is then used to select the best features while minimizing complexity. After that, the Synthetic Minority Oversampling Technique (SMOTE) is used to ensure that the database classes are evenly distributed. Finally, the Deep Convolutional Attention-based Bidirectional Recurrent Neural Network (DCA-BiRNN) technique is proposed to detect and classify the presence and absence of T2D accurately. The proposed study is implemented on the Python platform, and two publicly available databases, PIMA Indian and HFD, are utilized. Accuracy, NPV, kappa score, Matthews correlation coefficient (MCC), false discovery rate (FDR), and time complexity are among the assessment metrics examined and compared to prior research. For the PIMA Indian dataset, the proposed method obtains an overall accuracy of 99.6%, FDR of 0.0038, kappa of 99.24%, and NPV of 99.6%. For the HFD dataset, the proposed method obtains an overall accuracy of 99.5%, FDR of 0.0052, kappa of 99%, and NPV of 99.4%.
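
To make the pre-processing and class-balancing steps above concrete, here is a minimal Python sketch assuming a local copy of the PIMA Indians Diabetes CSV with its usual layout (eight feature columns plus an "Outcome" label); the Adap-BSTO feature selector and DCA-BiRNN classifier are not publicly available, so a plain random forest stands in purely for illustration.

```python
# Minimal sketch (assumptions: local "diabetes.csv" with the usual PIMA layout,
# eight feature columns plus "Outcome"; RandomForest is a stand-in classifier,
# not the paper's DCA-BiRNN, and no Adap-BSTO feature selection is performed).
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE

df = pd.read_csv("diabetes.csv")                       # hypothetical local copy of the PIMA data
X, y = df.drop(columns="Outcome"), df["Outcome"]

X = SimpleImputer(strategy="median").fit_transform(X)  # missing-value imputation
X = MinMaxScaler().fit_transform(X)                    # normalization-based pre-processing

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # balance classes on the training split only

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```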

Citations: 0
Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-10 | DOI: 10.1007/s11042-024-20179-x
Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo

The transformer architecture has consistently achieved cutting-edge performance in the task of 2D-to-3D lifting human pose estimation. Despite advances in transformer-based methods, they still suffer from issues related to sequential data processing, depth ambiguity, and effective handling of sensitive noisy data. As a result, transformer encoders encounter difficulties in precisely estimating human positions. To solve this problem, a novel multi-transformer encoder with a multiple-hypothesis aggregation (MHAFormer) module is proposed in this study. To do this, a diffusion module is first introduced that generates multiple 3D pose hypotheses and gradually distributes Gaussian noise to ground-truth 3D poses. Subsequently, a denoiser is employed within the diffusion module to restore the feasible 3D poses by leveraging the information from the 2D keypoints. Moreover, we propose the multiple-hypothesis aggregation with a joint-level reprojection (MHAJR) approach that reprojects the 3D hypotheses into 2D positions and selects the optimal hypothesis by considering reprojection errors. In particular, the multiple-hypothesis aggregation approach tackles depth ambiguity and sequential data processing by considering various possible poses and combining their strengths for a more accurate final estimation. Next, we present the improved spatial-temporal transformer encoder, which helps to improve the accuracy and reduce the ambiguity of 3D pose estimation by explicitly modeling the spatial and temporal relationships between different body joints. Specifically, the temporal-transformer encoder introduces the temporal constriction & proliferation (TCP) attention mechanism and the feature aggregation refinement module (FAR) into the refined temporal constriction & proliferation (RTCP) transformer, which enhances intra-block temporal modeling and further refines inter-block feature interaction. Finally, the superiority of the proposed approach is demonstrated through comparison with existing methods on the Human3.6M and MPI-INF-3DHP benchmark datasets.
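
The hypothesis-selection idea behind reprojection-based aggregation can be illustrated with a small numpy sketch: project each 3D pose hypothesis through a pinhole camera and keep the one with the smallest reprojection error against the detected 2D keypoints. The intrinsics, joint count, and pose tensors below are invented placeholders, not values from the paper.

```python
# Reprojection-based hypothesis selection (illustrative only; camera intrinsics,
# joint count and hypotheses are made-up placeholders).
import numpy as np

def project(points_3d, K):
    """Pinhole projection of (J, 3) joints with 3x3 intrinsics K."""
    uvw = points_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def select_hypothesis(hypotheses_3d, keypoints_2d, K):
    """hypotheses_3d: (H, J, 3); keypoints_2d: (J, 2); returns the (J, 3) pose
    whose reprojection is closest to the detected 2D keypoints."""
    errors = [np.mean(np.linalg.norm(project(h, K) - keypoints_2d, axis=1))
              for h in hypotheses_3d]
    return hypotheses_3d[int(np.argmin(errors))]

K = np.array([[1000., 0., 500.], [0., 1000., 500.], [0., 0., 1.]])
hyps = np.random.rand(5, 17, 3) + np.array([0., 0., 3.])      # 5 hypotheses, 17 joints, in front of the camera
kps = project(hyps[2], K)                                      # pretend the 2D detector matches hypothesis 2
print(np.allclose(select_hypothesis(hyps, kps, K), hyps[2]))   # True
```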

Citations: 0
Learning attention characterization based on head pose sight estimation
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-10 | DOI: 10.1007/s11042-024-20204-z
Jianwen Mo, Haochang Liang, Hua Yuan, Zhaoyu Shou, Huibing Zhang

The degree of students' attentiveness in the classroom is known as learning attention and is the main indicator used to portray students' learning status in the classroom. Studying smart-classroom time-series image data and analyzing students' attention to learning are important tools for improving student learning effects. To this end, this paper proposes a learning attention analysis algorithm based on head pose sight estimation. The algorithm first employs multi-scale hourglass attention to enable the head pose estimation model to capture more spatial pose features, and multi-classification multi-regression losses are proposed to guide the model to learn pose features at different granularities, making the model more sensitive to subtle inter-class distinctions in the data. Second, a sight estimation algorithm in 3D space is innovatively adopted to compute the coordinates of the student's sight landing point from the head pose. Finally, a model of sight analysis over the duration of a knowledge point is constructed to characterize students' attention to learning. Experiments show that the algorithm in this paper can effectively reduce the head pose estimation error, accurately characterize students' learning attention, and provide strong technical support for the analysis of students' learning effect. The algorithm demonstrates its potential application value and can be deployed in smart classrooms in schools.
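
As a rough illustration of the sight-landing-point computation, the sketch below casts a ray from an assumed head position along the direction implied by estimated yaw and pitch angles and intersects it with a vertical board plane. All coordinates, angles, and the coordinate convention are assumptions made for illustration; the paper's head-pose network is not reproduced.

```python
# Sight landing point from head pose (all positions, angles and the coordinate
# convention are invented for illustration; the head-pose network is not shown).
import numpy as np

def gaze_direction(yaw_deg, pitch_deg):
    """Unit gaze vector: x to the right, y up, z towards the board."""
    yaw, pitch = np.radians([yaw_deg, pitch_deg])
    return np.array([np.sin(yaw) * np.cos(pitch),
                     np.sin(pitch),
                     np.cos(yaw) * np.cos(pitch)])

def landing_point(head_pos, yaw_deg, pitch_deg, board_z):
    """Intersect the gaze ray from head_pos with the vertical plane z = board_z."""
    d = gaze_direction(yaw_deg, pitch_deg)
    t = (board_z - head_pos[2]) / d[2]
    return head_pos + t * d

head = np.array([0.5, 1.2, 0.0])                                    # assumed head position (metres)
print(landing_point(head, yaw_deg=-10, pitch_deg=5, board_z=4.0))   # point on a board 4 m in front
```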

Citations: 0
Ambient-NeRF: light train enhancing neural radiance fields in low-light conditions with ambient-illumination
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-10 | DOI: 10.1007/s11042-024-19699-3
Peng Zhang, Gengsheng Hu, Mei Chen, Mahmoud Emam

NeRF can render photorealistic 3D scenes. It is widely used in virtual reality, autonomous driving, game development and other fields, and has quickly become one of the most popular technologies in the field of 3D reconstruction. NeRF generates a realistic 3D scene by casting rays from the camera's spatial coordinates and viewpoint, passing them through the scene, and calculating the view seen from that viewpoint. However, when the brightness of the original input image is low, it is difficult to recover the scene. Inspired by the ambient illumination in the Phong model of computer graphics, it is assumed that the final rendered image is the product of the scene color and the ambient illumination. In this paper, we employ a Multi-Layer Perceptron (MLP) network to train the ambient illumination tensor I, which is multiplied by the color predicted by NeRF to render images with normal illumination. Furthermore, we use tiny-cuda-nn as a backbone network to simplify the proposed network structure and greatly improve the training speed. Additionally, a new loss function is introduced to achieve better image quality under low-illumination conditions. The experimental results demonstrate the efficiency of the proposed method in enhancing low-light scene images compared with other state-of-the-art methods, with an overall average PSNR of 20.53, SSIM of 0.785, and LPIPS of 0.258 on the LOM dataset.
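
A toy PyTorch sketch of the ambient-illumination modelling described above follows: the observed pixel colour is modelled as the product of a scene colour and a learned ambient term I, and the product is fit to the observations. The small MLP, feature dimensions, and random tensors are placeholders, not the paper's tiny-cuda-nn backbone or its loss function.

```python
# Toy model of "rendered colour = scene colour * ambient illumination I"
# (placeholder MLP, feature sizes and tensors; not the paper's tiny-cuda-nn setup).
import torch
import torch.nn as nn

class AmbientHead(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, features, scene_rgb):
        ambient = self.mlp(features)      # per-ray ambient illumination I in [0, 1]^3
        return scene_rgb * ambient        # rendered colour = scene colour * I

head = AmbientHead()
feats = torch.randn(1024, 32)             # per-ray features (placeholder)
scene_rgb = torch.rand(1024, 3)            # colours predicted by a NeRF (placeholder)
observed_rgb = torch.rand(1024, 3)         # observed low-light pixels (placeholder)

loss = nn.functional.mse_loss(head(feats, scene_rgb), observed_rgb)
loss.backward()                            # gradients for the ambient head (a full model would also update the NeRF)
print(float(loss))
```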

Citations: 0
Discrete ripplet-II transform feature extraction and metaheuristic-optimized feature selection for enhanced glaucoma detection in fundus images using least square-support vector machine
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-10 | DOI: 10.1007/s11042-024-19974-3
Santosh Kumar Sharma, Debendra Muduli, Adyasha Rath, Sujata Dash, Ganapati Panda, Achyut Shankar, Dinesh Chandra Dobhal

Recently, significant progress has been made in developing computer-aided diagnosis (CAD) systems for identifying glaucoma abnormalities using fundus images. Despite their drawbacks, methods for extracting features such as wavelets and their variations, along with classifiers like support vector machines (SVM), are frequently employed in such systems. This paper introduces a practical and enhanced system for detecting glaucoma in fundus images. The proposed model addresses the challenges encountered by other existing models in the recent literature. Initially, we have employed contrast limited adaptive histogram equalization (CLAHE) to enhance the visualization of the input fundus images. Then, the discrete ripplet-II transform (DR2T) with a degree of 2 is employed for feature extraction. Afterwards, we have utilized a golden jackal optimization (GJO) algorithm to select the optimal features and reduce the dimension of the extracted feature vector. For classification purposes, we have employed a least square support vector machine (LS-SVM) equipped with three kernels: linear, polynomial, and radial basis function (RBF). This setup has been utilized to classify fundus images as either indicative of glaucoma or healthy. The proposed method is validated against the current state-of-the-art models on two standard datasets, namely G1020 and ORIGA. The experimental results demonstrate that our best suggested approach, DR2T+GJO+LS-SVM-RBF, obtains better classification accuracies of 93.38% and 97.31% for the G1020 and ORIGA datasets with a smaller number of features. It also establishes a more streamlined network layout compared to conventional classifiers.
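
A hedged sketch of the enhancement and classification stages follows: CLAHE contrast enhancement with OpenCV and an RBF-kernel SVM classifier. The ripplet-II transform, GJO feature selection, and a true LS-SVM are not available in common Python libraries, so random placeholder features and scikit-learn's ordinary SVC stand in.

```python
# CLAHE enhancement plus an RBF-kernel SVM stand-in (synthetic low-contrast image
# and random placeholder features; DR2T, GJO and a true LS-SVM are not shown).
import cv2
import numpy as np
from sklearn.svm import SVC

img = np.random.randint(0, 60, (256, 256), dtype=np.uint8)    # stand-in for a dim fundus image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(img)                                    # contrast-limited adaptive equalization
print("contrast before/after:", img.std(), enhanced.std())

# Placeholder feature vectors; in the paper these come from DR2T followed by GJO selection.
X = np.random.rand(100, 32)
y = np.random.randint(0, 2, size=100)                          # 0 = healthy, 1 = glaucoma
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(clf.predict(X[:5]))
```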

Citations: 0
An efficient iterative pseudo point elimination technique to represent the shape of the digital image boundary
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-09 | DOI: 10.1007/s11042-024-20183-1
Mangayarkarasi Ramaiah, Vinayakumar Ravi, Vanmathi Chandrasekaran, Vanitha Mohanraj, Deepa Mani, Angulakshmi Maruthamuthu

Visually, the environment is made up of a chaotic collection of irregular polygons. Representing and comprehending irregular polygons is an important and intriguing issue in many fields of study. However, approximating a polygon presents significant difficulties from a variety of perspectives. The method provided in this research eliminates the pseudo-redundant points that do not contribute to shape retention and then builds the polygonal approximation with the remaining high-curvature points, as opposed to searching for the real points on the digital image boundary curve. The proposed method uses chain code assignment to obtain the initial segmentation points. Using integer arithmetic, the presented method calculates the curvature at each initial pseudo point using the sum of squares of the deviation. For every initial pseudo point, the deviation incurred by all the boundary points lying between its previous and its next initial pseudo point is taken into account. Then, at each iteration, the proposal removes from the subset of initial segmentation points the redundant point whose curvature deviation is the lowest. The method then recalculates the deviation information for the previous and next close pseudo points. Experiments are done with MPEG datasets and synthetic contours to show how well the proposed method works in both quantitative and qualitative terms. The experimental results show the effectiveness of the proposed method in creating polygons with few points.
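
The iterative elimination idea can be sketched as follows: score each retained point by the sum of squared perpendicular deviations of the boundary points it covers from the chord joining its two neighbours, and repeatedly drop the lowest-scoring point. This is a simplified floating-point stand-in; the paper's chain-code seeding and integer-arithmetic criterion are not reproduced.

```python
# Simplified elimination loop: drop the interior point whose covered boundary
# points deviate least (sum of squared distances) from the chord between its
# neighbours (floating-point stand-in for the paper's integer-arithmetic criterion).
import numpy as np

def chord_sse(boundary, i_prev, i_next):
    """Sum of squared distances from boundary[i_prev..i_next] to the chord (i_prev, i_next)."""
    a, b = boundary[i_prev], boundary[i_next]
    seg = boundary[i_prev:i_next + 1] - a
    d = b - a
    t = (seg @ d) / (d @ d)
    return float(np.sum((seg - np.outer(t, d)) ** 2))

def eliminate(boundary, keep):
    idx = list(range(len(boundary)))
    while len(idx) > keep:
        scores = [chord_sse(boundary, idx[k - 1], idx[k + 1]) for k in range(1, len(idx) - 1)]
        idx.pop(1 + int(np.argmin(scores)))        # remove the least shape-relevant interior point
    return boundary[idx]

t = np.linspace(0, 2 * np.pi, 200, endpoint=False)
contour = np.c_[np.cos(t), np.sin(t)] * 100        # synthetic contour (endpoints kept fixed here)
print(eliminate(contour, keep=12).shape)           # (12, 2)
```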

Citations: 0
Efficient compressed storage and fast reconstruction of large binary images using chain codes
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-09 | DOI: 10.1007/s11042-024-20199-7
Damjan Strnad, Danijel Žlaus, Andrej Nerat, Borut Žalik

Large binary images are used in many modern applications of image processing. For instance, they serve as inputs or target masks for training machine learning (ML) models in computer vision and image segmentation. Storing large binary images in limited memory and loading them repeatedly on demand, which is common in ML, calls for efficient image encoding and decoding mechanisms. In the paper, we propose an encoding scheme for efficient compressed storage of large binary images based on chain codes, and introduce a new single-pass algorithm for fast parallel reconstruction of raster images from the encoded representation. To test the efficiency of the proposed method, we use three large real-life binary masks derived from vector layers of single-class objects – a building cadaster, a woody vegetation landscape feature map, and a road network map. We show that the masks encoded by the proposed method require significantly less storage space than standard lossless compression formats. We further compared the proposed method for mask reconstruction from chain codes with a recent state-of-the-art algorithm, and achieved between 12% and 33% faster reconstruction on test data.
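
A minimal sketch of boundary chain coding for a binary mask is shown below: trace the object contour with OpenCV, convert successive pixel moves into 8-direction Freeman codes, and compare the code length with the raw raster size. It only illustrates the general idea; the paper's packed encoding format and single-pass parallel reconstruction algorithm are not reproduced.

```python
# Freeman chain coding of a synthetic binary mask (illustrative; the paper's
# packed format and parallel reconstruction algorithm are not reproduced).
import cv2
import numpy as np

DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

mask = np.zeros((512, 512), np.uint8)
cv2.circle(mask, (256, 256), 100, 255, -1)                  # synthetic single-object mask

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
pts = contours[0][:, 0, :]                                  # (N, 2) 8-connected boundary pixels
moves = np.diff(np.vstack([pts, pts[:1]]), axis=0)          # wrap around to close the loop
codes = [DIRS[(int(dx), int(dy))] for dx, dy in moves]

print("boundary pixels :", len(codes))
print("raw mask bits   :", mask.size)                       # 1 bit per pixel if bit-packed
print("chain code bits :", 3 * len(codes))                  # 3 bits per 8-direction code (plus a start point)
```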

Citations: 0
Predicting eye-tracking assisted web page segmentation
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-09 | DOI: 10.1007/s11042-024-20202-1
Abdullah Sulayfani, Sukru Eraslan, Yeliz Yesilada

Different kinds of algorithms have been proposed to identify the visual elements of web pages for different purposes, such as improving web accessibility and measuring web page visual quality and aesthetics. One group of these algorithms identifies the elements by analyzing the source code and visual representation of web pages, whereas another group discovers the attractive elements by analyzing the eye movements of users. A previous approach proposes combining these two directions to consider both the source code and visual representation of web pages and users' eye movements on those pages. The result of that approach can be considered eye-tracking-assisted web page segmentation. However, since the eye-tracking data collection procedure is elaborate, time-consuming, and expensive, and it is not feasible to collect eye-tracking data for each page, we aim to develop a model to predict such segmentation without requiring eye-tracking data. In this paper, we present our experiments with different Machine and Deep Learning algorithms and show that the K-Nearest Neighbour (KNN) model yields the best results in prediction. We present a KNN model that predicts eye-tracking-assisted web page segmentation with an F1-score of 78.74%. This work shows how a Machine Learning algorithm can automate web page segmentation driven by eye-tracking data.
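
The classification set-up can be illustrated with a few lines of scikit-learn: a K-Nearest Neighbours model evaluated with the F1-score. The feature matrix below is random and purely illustrative; in the paper, features are derived from page source code and visual rendering, with labels obtained from eye-tracking-assisted segmentation.

```python
# KNN with F1 evaluation (random placeholder features and labels; the paper's
# features come from page source code, rendering and eye-tracking-derived labels).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = np.random.rand(500, 12)                       # placeholder element features
y = np.random.randint(0, 2, size=500)             # 1 = element belongs to an attention-derived segment

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, knn.predict(X_te)))
```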

Citations: 0
An optimized cluster validity index for identification of cancer mediating genes
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-09 | DOI: 10.1007/s11042-024-20105-1
Subir Hazra, Anupam Ghosh

One of the major challenges in bioinformatics lies in the identification of the modified gene expressions of a person affected by medical ailments. Focused research has been observed to date in such identification, leading to multiple proposals pivoting on the clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called the mean-max index (MMI) to determine the cluster count, which divides the data collection into an ideal number of clusters depending on gene expression variations. MMI works on the principle of minimizing the intra-cluster variations among the members and maximizing the inter-cluster variations. In this regard, the study has been conducted on publicly available datasets comprising gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The data counts for normal and diseased patients lie at 10 and 86 for lung disease, 43 and 13 for leukaemia, and 18 and 18 for colon cancer, respectively. The gene expression vectors for the three diseases comprise 7129, 22283, and 6600 genes, respectively. Three clustering models have been used for this study, namely k-means, partition around medoids, and fuzzy c-means, all using the proposed MMI technique for finalizing the cluster count. The comparative analysis reflects that the proposed MMI index is able to recognize many more true-positive (biologically enriched) cancer-mediating genes with respect to other cluster validity indices, and it can be considered superior to the others, with accuracy enhanced by 85%.
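
The general idea of choosing a cluster count by trading intra-cluster compactness against inter-cluster separation can be sketched with k-means on synthetic data, as below; the score used here is a generic separation-to-compactness ratio, not the paper's exact MMI formula.

```python
# Generic validity-index sketch: pick the k that maximizes a separation-to-
# compactness ratio (a stand-in score, not the exact MMI formula).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)    # synthetic expression-like vectors

def score(X, k):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    intra = np.mean([np.mean(np.linalg.norm(X[km.labels_ == c] - km.cluster_centers_[c], axis=1))
                     for c in range(k)])                        # mean intra-cluster spread
    inter = np.min(pairwise_distances(km.cluster_centers_)[np.triu_indices(k, 1)])  # closest centre pair
    return inter / intra

best_k = max(range(2, 9), key=lambda k: score(X, k))
print("selected cluster count:", best_k)
```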

Citations: 0
A GAN based method for cross-scene classification of hyperspectral scenes captured by different sensors
IF 3.6 | CAS Zone 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-09 | DOI: 10.1007/s11042-024-19969-0
Amir Mahmoudi, Alireza Ahmadyfard

Labeling samples in hyperspectral images is time-consuming and labor-intensive. Domain adaptation methods seek to address this challenge by transferring the knowledge from a labeled source domain to an unlabeled target domain, enabling classification with minimal or no labeled samples in the target domain. This is achieved by mitigating the domain shift caused by differences in sensing conditions. However, most of the existing works implement domain adaptation techniques on homogeneous hyperspectral data, where both source and target are acquired by the same sensor and contain an equal number of spectral bands. The present paper proposes an end-to-end network, the Generative Adversarial Network for Heterogeneous Domain Adaptation (GANHDA), capable of handling domain adaptation between target and source scenes captured by different sensors with varying spectral and spatial resolutions, resulting in non-equivalent data representations across domains. GANHDA leverages adversarial training, a bi-classifier, variational autoencoders, and graph regularization to transfer high-level conceptual knowledge from the source to the target domain, aiming for improved classification performance. This approach is applied to two heterogeneous hyperspectral datasets, namely RPaviaU-DPaviaC and EHangzhou-RPaviaHR. All source labels are used for training, while only 5 pixels per class from the target are used. The results are promising: we achieved an overall accuracy of 90.16% for RPaviaU-DPaviaC and 99.12% for EHangzhou-RPaviaHR, outperforming previous methods. Our code implementation can be found at https://github.com/amirmah/HSI_GANHDA.
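
A compact PyTorch sketch of the adversarial ingredient in such heterogeneous adaptation follows: two sensor-specific encoders map source and target pixels into a shared feature space, a discriminator is trained to tell the domains apart, and the encoders are trained to fool it while a shared head classifies the labelled source samples. Dimensions, data, and architecture are placeholders and do not reproduce GANHDA's bi-classifier, variational autoencoders, or graph regularization.

```python
# Adversarial alignment of two sensors' features (placeholder dimensions and data;
# GANHDA's bi-classifier, VAEs and graph regularization are not reproduced).
import torch
import torch.nn as nn

feat_s = nn.Sequential(nn.Linear(103, 64), nn.ReLU())      # source encoder (assumed 103-band sensor)
feat_t = nn.Sequential(nn.Linear(198, 64), nn.ReLU())      # target encoder (different sensor, different band count)
clf = nn.Linear(64, 9)                                      # shared class head (9 classes assumed)
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

xs, ys = torch.randn(32, 103), torch.randint(0, 9, (32,))   # labelled source batch (placeholder)
xt = torch.randn(32, 198)                                    # mostly unlabelled target batch (placeholder)

fs, ft = feat_s(xs), feat_t(xt)
bce = nn.BCEWithLogitsLoss()

# 1) discriminator loss: tell source (1) from target (0) features
d_loss = bce(disc(fs.detach()), torch.ones(32, 1)) + bce(disc(ft.detach()), torch.zeros(32, 1))
# 2) encoder/classifier loss: classify source correctly and make target features look like source
g_loss = nn.functional.cross_entropy(clf(fs), ys) + bce(disc(ft), torch.ones(32, 1))
print(float(d_loss), float(g_loss))                          # per-step losses; optimizers omitted
```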

Citations: 0