The degree of students' attentiveness in the classroom is known as learning attention and is the main indicator used to portray students' learning status in the classroom. Studying smart-classroom time-series image data and analyzing students' learning attention are important tools for improving learning outcomes. To this end, this paper proposes a learning attention analysis algorithm based on head pose sight estimation. The algorithm first employs multi-scale hourglass attention to enable the head pose estimation model to capture more spatial pose features, and a combined multi-classification and multi-regression loss is proposed to guide the model to learn pose features at different granularities, making it more sensitive to subtle inter-class distinctions in the data. Second, a sight estimation algorithm in 3D space is adopted to compute the coordinates of the student's sight landing point from the head pose. Finally, a model of sight analysis over the duration of a knowledge point is constructed to characterize students' learning attention. Experiments show that the proposed algorithm effectively reduces head pose estimation error, accurately characterizes students' learning attention, and provides strong technical support for analyzing students' learning effect. The algorithm demonstrates its potential application value and can be deployed in smart classrooms in schools.
{"title":"Learning attention characterization based on head pose sight estimation","authors":"Jianwen Mo, Haochang Liang, Hua Yuan, Zhaoyu Shou, Huibing Zhang","doi":"10.1007/s11042-024-20204-z","DOIUrl":"https://doi.org/10.1007/s11042-024-20204-z","url":null,"abstract":"<p>The degree of students’ attentiveness in the classroom is known as learning attention and is the main indicator used to portray students’ learning status in the classroom. Studying smart classroom time-series image data and analyzing students’ attention to learning are important tools for improving student learning effects. To this end, this paper proposes a learning attention analysis algorithm based on the head pose sight estimation.The algorithm first employs multi-scale hourglass attention to enable the head pose estimation model to capture more spatial pose features.It is also proposed that the multi-classification multi-regression losses guide the model to learn different granularity of pose features, making the model more sensitive to subtle inter-class distinction of the data;Second, a sight estimation algorithm on 3D space is innovatively adopted to compute the coordinates of the student’s sight landing point through the head pose; Finally, a model of sight analysis over the duration of a knowledge point is constructed to characterize students’ attention to learning. Experiments show that the algorithm in this paper can effectively reduce the head pose estimation error, accurately characterize students’ learning attention, and provide strong technical support for the analysis of students’ learning effect. The algorithm demonstrates its potential application value and can be deployed in smart classrooms in schools.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10, DOI: 10.1007/s11042-024-19699-3
Peng Zhang, Gengsheng Hu, Mei Chen, Mahmoud Emam
NeRF can render photorealistic 3D scenes. It is widely used in virtual reality, autonomous driving, game development and other fields, and has quickly become one of the most popular technologies in the field of 3D reconstruction. NeRF renders a realistic 3D scene by casting rays from the camera's spatial coordinates along the viewing direction, passing them through the scene, and computing the view seen from that viewpoint. However, when the brightness of the original input image is low, it is difficult to recover the scene. Inspired by the ambient illumination term in the Phong model of computer graphics, we assume that the final rendered image is the product of scene color and ambient illumination. In this paper, we employ a Multi-Layer Perceptron (MLP) network to train the ambient illumination tensor I, which is multiplied by the color predicted by NeRF to render images with normal illumination. Furthermore, we use tiny-cuda-nn as a backbone network to simplify the proposed network structure and greatly improve the training speed. Additionally, a new loss function is introduced to achieve better image quality under low-illumination conditions. The experimental results demonstrate the efficiency of the proposed method in enhancing low-light scene images compared with other state-of-the-art methods, with an overall average PSNR of 20.53, SSIM of 0.785, and LPIPS of 0.258 on the LOM dataset.
{"title":"Ambient-NeRF: light train enhancing neural radiance fields in low-light conditions with ambient-illumination","authors":"Peng Zhang, Gengsheng Hu, Mei Chen, Mahmoud Emam","doi":"10.1007/s11042-024-19699-3","DOIUrl":"https://doi.org/10.1007/s11042-024-19699-3","url":null,"abstract":"<p>NeRF can render photorealistic 3D scenes. It is widely used in virtual reality, autonomous driving, game development and other fields, and quickly becomes one of the most popular technologies in the field of 3D reconstruction. NeRF generates a realistic 3D scene by emitting light from the camera’s spatial coordinates and viewpoint, passing through the scene and calculating the view seen from the viewpoint. However, when the brightness of the original input image is low, it is difficult to recover the scene. Inspired by the ambient illumination in the Phong model of computer graphics, it is assumed that the final rendered image is the product of scene color and ambient illumination. In this paper, we employ Multi-Layer Perceptron (MLP) network to train the ambient illumination tensor <span>(textbf{I})</span>, which is multiplied by the color predicted by NeRF to render images with normal illumination. Furthermore, we use tiny-cuda-nn as a backbone network to simplify the proposed network structure and greatly improve the training speed. Additionally, a new loss function is introduced to achieve a better image quality under low illumination conditions. The experimental results demonstrate the efficiency of the proposed method in enhancing low-light scene images compared with other state-of-the-art methods, with an overall average of PSNR: 20.53 , SSIM: 0.785, and LPIPS: 0.258 on the LOM dataset.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"106 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, significant progress has been made in developing computer-aided diagnosis (CAD) systems for identifying glaucoma abnormalities using fundus images. Despite their drawbacks, feature extraction methods such as wavelets and their variants, along with classifiers like support vector machines (SVM), are frequently employed in such systems. This paper introduces a practical and enhanced system for detecting glaucoma in fundus images. The proposed model addresses the challenges encountered by other existing models in the recent literature. Initially, we employ contrast limited adaptive histogram equalization (CLAHE) to enhance the visualization of the input fundus images. Then, the discrete ripplet-II transform (DR2T) with a degree of 2 is applied for feature extraction. Afterwards, we utilize the golden jackal optimization algorithm (GJO) to select the optimal features and reduce the dimension of the extracted feature vector. For classification, we employ a least square support vector machine (LS-SVM) equipped with three kernels: linear, polynomial, and radial basis function (RBF). This setup is used to classify fundus images as either indicative of glaucoma or healthy. The proposed method is validated against current state-of-the-art models on two standard datasets, namely G1020 and ORIGA. Our experimental results demonstrate that the best suggested approach, DR2T+GJO+LS-SVM-RBF, obtains better classification accuracies of 93.38% and 97.31% on the G1020 and ORIGA datasets respectively, with fewer features. It establishes a more streamlined network layout compared to conventional classifiers.
{"title":"Discrete ripplet-II transform feature extraction and metaheuristic-optimized feature selection for enhanced glaucoma detection in fundus images using least square-support vector machine","authors":"Santosh Kumar Sharma, Debendra Muduli, Adyasha Rath, Sujata Dash, Ganapati Panda, Achyut Shankar, Dinesh Chandra Dobhal","doi":"10.1007/s11042-024-19974-3","DOIUrl":"https://doi.org/10.1007/s11042-024-19974-3","url":null,"abstract":"<p>Recently, significant progress has been made in developing computer-aided diagnosis (CAD) systems for identifying glaucoma abnormalities using fundus images. Despite their drawbacks, methods for extracting features such as wavelets and their variations, along with classifier like support vector machines (SVM), are frequently employed in such systems. This paper introduces a practical and enhanced system for detecting glaucoma in fundus images. The proposed model adresses the chanallages encountered by other existing models in recent litrature. Initially, we have employed contrast limited adaputive histogram equalization (CLAHE) to enhanced the visualization of input fundus inmages. Then, the discrete ripplet-II transform (DR2T) employing a degree of 2 for feature extraction. Afterwards, we have utilized a golden jackal optimization algorithm (GJO) employed to select the optimal features to reduce the dimension of the extracted feature vector. For classification purposes, we have employed a least square support vector machine (LS-SVM) equipped with three kernels: linear, polynomial, and radial basis function (RBF). This setup has been utilized to classify fundus images as either indicative of glaucoma or healthy. The proposed method is validated with the current state-of-the-art models on two standard datasets, namely, G1020 and ORIGA. The results obtained from our experimental result demonstrate that our best suggested approach DR2T+GJO+LS-SVM-RBF obtains better classification accuracy 93.38% and 97.31% for G1020 and ORIGA dataset with less number of features. It establishes a more streamlined network layout compared to conventional classifiers.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"4 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visually, the environment is made up of a chaotic collection of irregular polygons. Representing and comprehending irregular polygons is an important and intriguing issue in many fields of study. However, approximating a polygon presents significant difficulties from a variety of perspectives. The method provided in this research eliminates the pseudo-redundant points that do not contribute to shape retention and then builds the polygonal approximation from the remaining high-curvature points, as opposed to searching for the real points on the digital image boundary curve. The proposed method uses chain code assignment to obtain initial segmentation points. Using integer arithmetic, it calculates the curvature at each initial pseudo point as a sum of squared deviations. For every initial pseudo point, the deviation incurred by all the boundary points lying between its previous and next initial pseudo points is taken into account. The method then removes, at each iteration, the redundant point with the lowest curvature deviation from the subset of initial segmentation points and recalculates the deviation information for its neighbouring pseudo points. Experiments are conducted with MPEG datasets and synthetic contours to show, both quantitatively and qualitatively, how well the proposed method works. The experimental results show the effectiveness of the proposed method in creating polygons with few points.
{"title":"An efficient iterative pseudo point elimination technique to represent the shape of the digital image boundary","authors":"Mangayarkarasi Ramaiah, Vinayakumar Ravi, Vanmathi Chandrasekaran, Vanitha Mohanraj, Deepa Mani, Angulakshmi Maruthamuthu","doi":"10.1007/s11042-024-20183-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20183-1","url":null,"abstract":"<p>Visually, the environment is made up of a chaotic of irregular polygons. It is an important and intriguing issue in many fields of study to represent and comprehend the irregular polygon. However, approximating the polygon presents significant difficulties from a variety of perspectives. The method provided in this research eliminates the pseudo-redundant points that are not contributing to shape retention and then makes the polygonal approximation with the remaining high-curvature points, as opposed to searching for the real points on the digital image boundary curve. The proposed method uses chain code assignment to obtain initial segmentation points. Using integer arithmetic, the presented method calculates the curvature at each initial pseudo point using sum of squares of deviation. For every initial segmented pseudo point, the difference incurred by all the boundary points lies between its earlier pseudo point and its next initial pseudo point was taken into account. Then, this new proposal removes the redundant point from the subset of initial segmentation points whose curvature deviation is the lowest with each iteration. The method then recalculates the deviation information for the next and previous close pseudo points. Experiments are done with MPEG datasets and synthetic contours to show how well the proposed method works in both quantitative and qualitative ways. The experimental result shows the effectiveness of the proposed method in creating polygons with few points.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"35 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-20199-7
Damjan Strnad, Danijel Žlaus, Andrej Nerat, Borut Žalik
Large binary images are used in many modern applications of image processing. For instance, they serve as inputs or target masks for training machine learning (ML) models in computer vision and image segmentation. Storing large binary images in limited memory and loading them repeatedly on demand, which is common in ML, calls for efficient image encoding and decoding mechanisms. In the paper, we propose an encoding scheme for efficient compressed storage of large binary images based on chain codes, and introduce a new single-pass algorithm for fast parallel reconstruction of raster images from the encoded representation. We test the efficiency of the proposed method on three large real-life binary masks derived from vector layers of single-class objects – a building cadaster, a woody vegetation landscape feature map, and a road network map. We show that the masks encoded by the proposed method require significantly less storage space than standard lossless compression formats. We further compared the proposed method for mask reconstruction from chain codes with a recent state-of-the-art algorithm, and achieved between 12% and 33% faster reconstruction on test data.
{"title":"Efficient compressed storage and fast reconstruction of large binary images using chain codes","authors":"Damjan Strnad, Danijel Žlaus, Andrej Nerat, Borut Žalik","doi":"10.1007/s11042-024-20199-7","DOIUrl":"https://doi.org/10.1007/s11042-024-20199-7","url":null,"abstract":"<p>Large binary images are used in many modern applications of image processing. For instance, they serve as inputs or target masks for training machine learning (ML) models in computer vision and image segmentation. Storing large binary images in limited memory and loading them repeatedly on demand, which is common in ML, calls for efficient image encoding and decoding mechanisms. In the paper, we propose an encoding scheme for efficient compressed storage of large binary images based on chain codes, and introduce a new single-pass algorithm for fast parallel reconstruction of raster images from the encoded representation. We use three large real-life binary masks to test the efficiency of the proposed method, which were derived from vector layers of single-class objects – a building cadaster, a woody vegetation landscape feature map, and a road network map. We show that the masks encoded by the proposed method require significantly less storage space than standard lossless compression formats. We further compared the proposed method for mask reconstruction from chain codes with a recent state-of-the-art algorithm, and achieved between <span>(12%)</span> and <span>(33%)</span> faster reconstruction on test data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"168 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-20202-1
Abdullah Sulayfani, Sukru Eraslan, Yeliz Yesilada
Different kinds of algorithms have been proposed to identify the visual elements of web pages for different purposes, such as improving web accessibility and measuring web page visual quality and aesthetics. One group of these algorithms identifies the elements by analyzing the source code and visual representation of web pages, whereas another group discovers the attractive elements by analyzing the eye movements of users. A previous approach proposes combining these two, considering both the source code and visual representation of web pages and users' eye movements on those pages. The result of that approach can be considered eye-tracking-assisted web page segmentation. However, since the eye-tracking data collection procedure is elaborate, time-consuming, and expensive, and it is not feasible to collect eye-tracking data for each page, we aim to develop a model that predicts such segmentation without requiring eye-tracking data. In this paper, we present our experiments with different machine and deep learning algorithms and show that the K-Nearest Neighbour (KNN) model yields the best prediction results. We present a KNN model that predicts eye-tracking-assisted web page segmentation with an F1-score of 78.74%. This work shows how a machine learning algorithm can automate web page segmentation driven by eye-tracking data.
{"title":"Predicting eye-tracking assisted web page segmentation","authors":"Abdullah Sulayfani, Sukru Eraslan, Yeliz Yesilada","doi":"10.1007/s11042-024-20202-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20202-1","url":null,"abstract":"<p>Different kinds of algorithms have been proposed to identify the visual elements of web pages for different purposes, such as improving web accessibility, measuring web page visual quality and aesthetics etc. One group of these algorithms identifies the elements by analyzing the source code and visual representation of web pages, whereas another group discovers the attractive elements by analyzing the eye movements of users. A previous approach proposes combining these two approaches to consider both the source code and visual representation of web pages and users’ eye movements on those pages. The result of the proposed approach can be considered eye-tracking-assisted web page segmentation. However, since the eye-tracking data collection procedure is elaborate, time-consuming, and expensive, and it is not feasible to collect eye-tracking data for each page, we aim to develop a model to predict such segmentation without requiring eye-tracking data. In this paper, we present our experiments with different Machine and Deep Learning algorithms and show that the K-Nearest Neighbour (KNN) model yields the best results in prediction. We present a KNN model that predicts eye-tracking-assisted web page segmentation with an F1-score of 78.74%. This work shows how an Machine Learning algorithm can automate web page segmentation driven by eye-tracking data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"21 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-20105-1
Subir Hazra, Anupam Ghosh
One of the major challenges in bioinformatics lies in identifying the modified gene expressions of a person affected by medical ailments. Focused research has been observed to date in such identification, leading to multiple proposals pivoting on clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called the mean-max index (MMI) to determine the cluster count that divides the data collection into an ideal number of clusters depending on gene expression variations. MMI works on the principle of minimizing the intra-cluster variations among the members and maximizing the inter-cluster variations. In this regard, the study has been conducted on publicly available datasets comprising gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The counts of normal and diseased patients are 10 and 86 for lung disease, 43 and 13 for leukaemia, and 18 and 18 for colon cancer, respectively. The gene expression vectors for the three diseases comprise 7129, 22283, and 6600 genes, respectively. Three clustering models have been used for this study, namely k-means, partition around medoids, and fuzzy c-means, all using the proposed MMI technique to finalize the cluster count. The comparative analysis reflects that the proposed MMI index is able to recognize many more true-positive (biologically enriched) cancer-mediating genes than other cluster validity indices and can be considered superior to the others, with accuracy enhanced by 85%.
{"title":"An optimized cluster validity index for identification of cancer mediating genes","authors":"Subir Hazra, Anupam Ghosh","doi":"10.1007/s11042-024-20105-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20105-1","url":null,"abstract":"<p>One of the major challenges in bioinformatics lies in identification of modified gene expressions of an affected person due to medical ailments. Focused research has been observed till date in such identification, leading to multiple proposals pivoting in clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called mean-max index (MMI) to determine the cluster count which divides the data collection into ideal number of clusters depending on gene expression variations. MMI works on the principle of minimization of the intra cluster variations among the members and maximization of inter cluster variations. In this regard, the study has been conducted on publicly available dataset comprising of gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The data count for normal as well as diseased patients lie at 10 and 86 for lung disease patients, 43 and 13 for patients observed with leukaemia, and 18 and 18 for patients with colon cancer respectively. The gene expression vectors for the three diseases comprise of 7129,22283, and 6600 respectively. Three clustering models have been used for this study, namely k-means, partition around medoid, and fuzzy c-means, all using the proposed MMI technique for finalizing the cluster count. The Comparative analysis reflects that the proposed MMI index is able to recognize much more true positives (biologically enriched) cancer mediating genes with respect to other cluster validity indices and it can be considered as superior to other with respect to enhanced accuracy by 85%.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"407 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-19969-0
Amir Mahmoudi, Alireza Ahmadyfard
Labeling samples in hyperspectral images is time-consuming and labor-intensive. Domain adaptation methods seek to address this challenge by transferring knowledge from a labeled source domain to an unlabeled target domain, enabling classification with minimal or no labeled samples in the target domain. This is achieved by mitigating the domain shift caused by differences in sensing conditions. However, most existing works implement domain adaptation techniques on homogeneous hyperspectral data, where both source and target are acquired by the same sensor and contain an equal number of spectral bands. The present paper proposes an end-to-end network, the Generative Adversarial Network for Heterogeneous Domain Adaptation (GANHDA), capable of handling domain adaptation between target and source scenes captured by different sensors with varying spectral and spatial resolutions, which results in non-equivalent data representations across domains. GANHDA leverages adversarial training, a bi-classifier, variational autoencoders, and graph regularization to transfer high-level conceptual knowledge from the source to the target domain, aiming for improved classification performance. This approach is applied to two heterogeneous hyperspectral datasets, namely RPaviaU-DPaviaC and EHangzhou-RPaviaHR. All source labels are used for training, while only 5 labeled pixels per class from the target are used. The results are promising: we achieved an overall accuracy of 90.16% for RPaviaU-DPaviaC and 99.12% for EHangzhou-RPaviaHR, outperforming previous methods. Our code implementation can be found at https://github.com/amirmah/HSI_GANHDA.
{"title":"A GAN based method for cross-scene classification of hyperspectral scenes captured by different sensors","authors":"Amir Mahmoudi, Alireza Ahmadyfard","doi":"10.1007/s11042-024-19969-0","DOIUrl":"https://doi.org/10.1007/s11042-024-19969-0","url":null,"abstract":"<p>Labeling samples in hyperspectral images is time-consuming and labor-intensive. Domain adaptation methods seek to address this challenge by transferring the knowledge from a labeled source domain to an unlabeled target domain, enabling classification with minimal or no labeled samples in the target domain. This is achieved by mitigating the domain shift caused by differences in sensing conditions. However, most of the existing works implement domain adaptation techniques on homogeneous hyperspectral data where both source and target are acquired by the same sensor and contain an equal number of spectral bands. The present paper proposes an end-to-end network, Generative Adversarial Network for Heterogeneous Domain Adaptation (GANHDA), capable of handling domain adaptation between target and source scenes captured by different sensors with varying spectral and spatial resolutions, resulting in non-equivalent data representations across domains. GANHDA leverages adversarial training, a bi-classifier, variational autoencoders, and graph regularization to transfer high-level conceptual knowledge from the source to the target domain, aiming for improved classification performance. This approach is applied to two heterogeneous hyperspectral datasets, namely RPaviaU-DPaviaC and EHangzhou-RPaviaHR. All source labels are used for training, while only 5 pixels per class from the target are used for training. The results are promising and we achieved an overall accuracy of 90.16% for RPaviaU-DPaviaC and 99.12% for EHangzhou-RPaviaHR, outperforming previous methods. Our code Implementation can be found at https://github.com/amirmah/HSI_GANHDA.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-20099-w
Tuan Linh Dang, Nguyen Minh Nhat Hoang, The Vu Nguyen, Hoang Vu Nguyen, Quang Minh Dang, Quang Hai Tran, Huy Hoang Pham
The COVID-19 outbreak has caused a significant shift towards virtual education, where Massive Open Online Courses (MOOCs), such as EdX and Coursera, have become prevalent distance learning media. Online exams are also gaining popularity, but they pose a risk of cheating without proper supervision. Online proctoring can significantly improve the quality of education, and with the addition of extended modules on MOOCs, incorporating artificial intelligence into the proctoring process has become more accessible. Despite the advancements in machine learning-based cheating detection in third-party proctoring tools, there is still a need to optimize and adapt such systems to the massive simultaneous-user requirements of MOOCs. Therefore, we have developed an examination monitoring system based on advanced artificial intelligence technology. This system is highly scalable and can be easily integrated with our existing MOOCs platform, daotao.ai. Experimental results demonstrated that our proposed system achieved a 95.66% accuracy rate in detecting cheating behaviors, processed video inputs with an average response time of 0.517 seconds, and successfully handled concurrent user demands, thereby validating its effectiveness and reliability for large-scale online examination monitoring.
{"title":"Auto-proctoring using computer vision in MOOCs system","authors":"Tuan Linh Dang, Nguyen Minh Nhat Hoang, The Vu Nguyen, Hoang Vu Nguyen, Quang Minh Dang, Quang Hai Tran, Huy Hoang Pham","doi":"10.1007/s11042-024-20099-w","DOIUrl":"https://doi.org/10.1007/s11042-024-20099-w","url":null,"abstract":"<p>The COVID-19 outbreak has caused a significant shift towards virtual education, where Massive Open Online Courses (MOOCs), such as EdX and Coursera, have become prevalent distance learning mediums. Online exams are also gaining popularity, but they pose a risk of cheating without proper supervision. Online proctoring can significantly improve the quality of education, and with the addition of extended modules on MOOCs, the incorporation of artificial intelligence in the proctoring process has become more accessible. Despite the advancements in machine learning-based cheating detection in third-party proctoring tools, there is still a need for optimization and adaptability of such systems for massive simultaneous user requirements of MOOCs. Therefore, we have developed an examination monitoring system based on advanced artificial intelligence technology. This system is highly scalable and can be easily integrated with our existing MOOCs platform, daotao.ai. Experimental results demonstrated that our proposed system achieved a 95.66% accuracy rate in detecting cheating behaviors, processed video inputs with an average response time of 0.517 seconds, and successfully handled concurrent user demands, thereby validating its effectiveness and reliability for large-scale online examination monitoring.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"6 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-09, DOI: 10.1007/s11042-024-19883-5
Chandini A G, P. I Basarkod
Numerous healthcare organizations keep track of patients' medical information with Electronic Health Records (EHRs). Nowadays, patients demand instant access to their medical records. Hence, Deep Learning (DL) methods are employed in electronic healthcare sectors for medical image processing and smart supply chain management. Various approaches have been presented for protecting patients' healthcare data using blockchain; however, there are concerns regarding the security and privacy of patient medical records in the health industry, where data can be accessed instantly. Blockchain-based security combined with DL approaches helps to solve this problem, and there is a need for improvements in DL-based blockchain methods for the privacy and security of patient data, as well as in access control strategies, alongside developments in the supply chain. The survey provides a clear idea of DL-based strategies used in electronic healthcare data storage and security, along with integrity verification approaches. It also provides a comparative analysis to demonstrate the effectiveness of various blockchain-based EHR handling techniques. Moreover, future directions are provided to overcome the existing limitations of various blockchain security techniques for EHRs.
{"title":"A survey on blockchain security for electronic health record","authors":"Chandini A G, P. I Basarkod","doi":"10.1007/s11042-024-19883-5","DOIUrl":"https://doi.org/10.1007/s11042-024-19883-5","url":null,"abstract":"<p>Numerous healthcare organizations maintain track of the patients’ medical information with an Electronic Health Record (EHR). Nowadays, patients demand instant access to their medical records. Hence, Deep Learning (DL) methods are employed in electronic healthcare sectors for medical image processing and smart supply chain management. Various approaches are presented for the protection of healthcare data of patients using blockchain however, there are concerns regarding the security and privacy of patient medical records in the health industry, where data can be accessed instantly. The blockchain-based security with DL approaches helps to solve this problem and there is a need for improvements on the DL-based blockchain methods for privacy and security of patient data and access control strategies with developments in the supply chain. The survey provides a clear idea of DL-based strategies used in electronic healthcare data storage and security along with the integrity verification approaches. Also, it provides a comparative analysis to demonstrate the effectiveness of various blockchain-based EHR handling techniques. Moreover, future directions are provided to overcome the existing impact of various techniques in blockchain security for EHRs.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"13 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}