Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20108-y
Musatafa Abbas Abbood Albadr, Masri Ayob, Sabrina Tiun, Raad Z. Homod, Fahad Taha AL-Dhief, Mohammed Hasan Mutar
Various speech processing approaches (e.g., acoustic feature extraction techniques) and Machine Learning (ML) algorithms have been applied to diagnosing Parkinson's disease (PD). However, most of these studies have used conventional techniques that achieve low accuracy in diagnosing PD and still need improvement. Particle Swarm Optimization-Extreme Learning Machine (PSO-ELM), a recent and effective ML technique, could be an accurate classification strategy but has not yet been applied to PD diagnosis. Thus, to improve the precision of PD diagnosis, this study employs the PSO-ELM classifier and examines how well it performs with seven feature extraction techniques (basic features, WT (Wavelet Transform), MFCC (Mel Frequency Cepstral Coefficients), bandwidth + formant, intensity parameters, TQWT (Tunable Q-factor Wavelet Transform), and vocal fold features). The PSO-ELM approach can a) prevent overfitting, b) solve binary and multi-class classification problems, and c) perform like a kernel-based support vector machine with a neural network structure. Therefore, if combining the PSO-ELM classifier with an appropriate feature extraction technique improves learning performance, the combination can yield an effective method for identifying PD. In this study, the PD voice samples were taken from the Parkinson’s Disease Classification Benchmark Dataset. To find a useful feature extraction technique to couple with the PSO-ELM classifier, we applied PSO-ELM to each extracted feature set using both the unbalanced and the balanced dataset. According to the experimental results, the MFCC features help the PSO-ELM classifier attain its greatest accuracy, up to 97.35% on the unbalanced dataset and 100.00% on the balanced dataset. This shows that combining PSO-ELM with MFCC can improve learning performance, ultimately creating an effective method for identifying PD.
Title: Parkinson's disease diagnosis by voice data using particle swarm optimization-extreme learning machine approach
Journal: Multimedia Tools and Applications
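The optimizer half of PSO-ELM, as named in the abstract above, can be illustrated with a minimal global-best particle swarm sketch. This is not the authors' configuration: the swarm size, inertia and acceleration coefficients are assumed values, and the toy `sphere` objective merely stands in for whatever ELM validation error the swarm would minimise in practice.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `objective` inside a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # best position seen by each particle
    best_val = [objective(p) for p in pos]     # value at that position
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle, then clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    # Toy objective (assumption): stands in for an ELM validation error.
    return sum(v * v for v in x)


best_x, best_f = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a PSO-ELM setting, each particle would presumably encode the hidden-layer weights and biases of the ELM, and `objective` would train the output layer and return a validation error; that wiring is left out here.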
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20110-4
Neha Singh, Ashish Kumar Bhandari
In ophthalmology, digital images play an important role in the automatic detection of various kinds of eye diseases. Image enhancement is the first stage in assisting ophthalmologists with diagnosis. As a result, various algorithms and methods for enhancing retinal images have been developed, but they may face obstacles common in augmentation processes, such as false edges and weak illumination that obscure image details. To eliminate such issues, this paper proposes a novel framework for unexposed retinal images. The proposed method uses a multiscale Gaussian function to estimate the illumination layer of an unexposed color retinal image, which is then corrected by the gamma method. Principal component analysis (PCA) is then used to generate a fused enhanced result for unexposed retinal images, and a contrast-limited technique is employed to further improve edge and contextual details. Compared with several state-of-the-art enhancement procedures, experimental results show that the suggested method produces results with good contrast and brightness. The significance of the proposed method is that it may help ophthalmologists screen for retinal illnesses in unexposed images more efficiently and build better automated image analysis for healthcare diagnosis.
Title: Principal component fusion based unexposed biological feature enhancement of fundus images
Journal: Multimedia Tools and Applications
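The first two steps described in the abstract above (multiscale Gaussian illumination estimation followed by gamma correction) can be sketched on a single row of pixel intensities in [0, 1]. The scale set, the gamma value and the Retinex-style recombination with a reflectance term are all assumptions made for illustration; the abstract does not specify them, and the PCA fusion and contrast-limited steps are not shown.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel with a radius of about 3*sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(row, sigma):
    """Gaussian-blur a row of intensities, replicating the edge pixels."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(n - 1, max(0, i + k - radius))  # clamp index at the borders
            acc += w * row[j]
        out.append(acc)
    return out


def estimate_illumination(row, sigmas=(1.0, 2.0, 4.0)):
    """Average the blurs at several scales (hypothetical scale choice)."""
    blurred = [blur(row, s) for s in sigmas]
    return [sum(vals) / len(sigmas) for vals in zip(*blurred)]


def enhance(row, gamma=0.5, eps=1e-6):
    """Gamma-correct the illumination layer and recombine with reflectance."""
    illumination = estimate_illumination(row)
    out = []
    for pixel, light in zip(row, illumination):
        reflectance = pixel / (light + eps)   # assumed Retinex-style split
        corrected = light ** gamma            # gamma < 1 lifts dark illumination
        out.append(min(1.0, reflectance * corrected))
    return out


dark_row = [0.05, 0.06, 0.05, 0.20, 0.22, 0.21, 0.05, 0.04]
bright_row = enhance(dark_row)
```

With a gamma below 1, every pixel in this dark row comes out brighter while the relative structure carried by the reflectance term is kept; a real implementation would operate on 2-D color images rather than a single row.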
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20153-7
Palwinder Kaur, Amandeep Kaur
COVID-19 emerged as a pandemic in December 2019. This virus targets the human pulmonary system. Therefore, chest radiographic imaging is required to monitor the effect of the virus, prevent its spread and decrease the mortality rate. Imaging-based testing places a heavy burden on the radiologists who manually screen the images. To make the imaging-based method an efficient diagnostic tool, screening automation with minimum human intervention is a necessity. This opens numerous challenges for scientists and researchers developing automatic diagnostic tools for COVID-19 detection. In this paper, we present two speed-enhanced convolutional neural networks (SECNNs) to automatically detect COVID-19 among the X-rays of COVID-19, pneumonia and healthy subjects. For 2-class classification (2CC) and 3-class classification (3CC), we named the models SECNN-2CC and SECNN-3CC, respectively. The scope of this work is to highlight the significance and potential of CNN models built from scratch in COVID-19 identification. We conduct six experiments using six different balanced and imbalanced datasets. In these datasets, all X-rays are from different patients; therefore, it was more challenging to design models that extract abstract features from a highly variable dataset. Experimental results show that the proposed models exhibit exemplary performance. The highest accuracy for 2CC (COVID-19 vs Pneumonia) is 99.92%. For 3CC (COVID-19 vs Normal vs Pneumonia), the highest accuracy achieved is 99.51%. We believe that this study will be of great importance in diagnosing COVID-19 and will also provide a deeper analysis for discriminating among pneumonia patients, COVID-19 patients and healthy subjects using X-rays.
Title: Speed-enhanced convolutional neural networks for COVID-19 classification using X-rays
Journal: Multimedia Tools and Applications
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20218-7
Yasin Sancar, Ugur Yavuz, Isil Karabey Aksakalli
To evaluate multiple-choice question tests, optical forms are commonly used for large-scale exams, and these forms are read by OMR (Optical Mark Recognition) scanners. However, OMR scanners often misinterpret marks that have not been fully erased, which can lead to incorrect readings. To overcome that shortcoming and reduce the time and labor lost in the assessment process, we developed a novel system based on the density of each individual’s markings, providing a more personalized and accurate approach. Instead of reading according to a specific optical form template, a dynamic and flexible structure was built in which users can create their own templates and obtain a model that reads according to that template. We also optimized certain aspects of the system for efficiency, such as image memory transfer and QR code reading. These optimizations significantly increase the performance of the OMR scanners. One of the key issues addressed is the inaccurate reading of OMR scanners when a student does not fully erase their markings or when markings are faint. After the scanning process, the proposed approach uses a K-means clustering algorithm to classify markings of different densities. This technique identifies each student’s personal marking density, enabling a more accurate interpretation of their responses. According to the experimental results, we achieved a 97.7% improvement relative to the optical forms misread by conventional OMR devices. In tests performed on 265,816 optical forms, we obtained an accuracy rate of 99.98% and a reading time of 0.12 seconds per optical form.
Title: Personal mark density-based high-performance Optical Mark Recognition (OMR) system using K-means clustering algorithm
Journal: Multimedia Tools and Applications
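The idea of a personal marking density from the abstract above can be sketched with a plain one-dimensional K-means run separately on each student's bubble densities. The density values, the choice of two clusters and the even-spread initialisation are illustrative assumptions, not details taken from the paper.

```python
def kmeans_1d(values, k=2, n_iters=50):
    """Lloyd's k-means on scalar values (k >= 2); returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Spread the initial centroids evenly between min and max (an assumption).
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical fill densities (fraction of dark pixels per bubble).
light_marker = [0.02, 0.30, 0.03, 0.12, 0.28, 0.01]  # 0.12 is an erased mark
heavy_marker = [0.05, 0.90, 0.40, 0.04, 0.85, 0.06]  # 0.40 is an erased mark

_, light_labels = kmeans_1d(light_marker)
_, heavy_labels = kmeans_1d(heavy_marker)
```

Label 1 is the darker cluster in both runs. The light marker's genuine 0.30 mark is accepted while the heavy marker's 0.40 smudge is rejected, which no single fixed threshold could achieve for both students.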
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20174-2
Thaís Luiza Donega e Souza, Caetano Mazzoni Ranieri, Anand Panangadan, Jó Ueyama, Marislei Nishijima
A film is considered an experience good, as its quality is only revealed after consumption. This situation creates information asymmetry before consumption, prompting producers, who are aware of their film’s quality, to search for methods to signal this. Economic literature specifies that a signal to disclose a product’s quality must be strong, meaning only producers of good-quality films can effectively utilize such a signal. However, a poster represents the most economical signal, and all producers, regardless of film quality, have access to this option. To study whether a poster can signal film quality, we first apply a low-dimensional representation of poster images and cluster them to identify quality-related patterns. We then perform a supervised classification of films into economically successful and unsuccessful categories using a deep neural network. This is based on the hypothesis that higher quality films tend to sell more tickets and that all producers invest in the highest quality poster services. The results demonstrate that a film’s quality can indeed be predicted from its poster, reinforcing its effectiveness as a strong signal. Despite the proliferation of advanced visual media technologies, a simple yet innovative poster remains an effective and appealing tool for signaling film information. Notably, posters can classify a film’s economic success comparably to trailers but with significantly lower processing costs.
Title: Is a poster a strong signal of film quality? evaluating the predictive power of visual elements using deep learning
Journal: Multimedia Tools and Applications
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20113-1
Avinash L. Golande, Pavankumar T.
Automated detection of cardiovascular diseases based on heartbeats is a difficult and demanding task in signal processing because routine analysis of a patient’s cardiac arrhythmia is crucial to reducing the mortality rate. Detecting and preventing these deaths requires long-term monitoring and manual examination of electrocardiogram (ECG) signals, which takes a lot of time. This article uses an optimized Vision Transformer technique to effectively detect heart disease. The four key processes are pre-processing the input data, feature extraction from the pre-processed data, optimal feature selection, and classification to detect heart disease. In the pre-processing phase, single-channel adaptive blind source separation is used for artifact removal and empirical mode decomposition for noise reduction of the ECG signal. After pre-processing, the ECG signal is fed into the Enhanced Pan-Tompkins algorithm (EPTA) and the Hybrid Gabor-Walsh-Hadamard transform (HGWHT) for feature extraction. Features are then selected from the extracted set using a Manifold Grasshopper Optimization algorithm (MGOA). Finally, an Optimized Vision Transformer (OVT) detects heart disease. The experiment is carried out on the PTB Diagnostic ECG and PTB-XL databases, both publicly accessible research datasets. On the PTB Diagnostic ECG database, the experiment obtained the following values: accuracy 99.9%, sensitivity 98%, F1 score 99.9%, specificity 90%, processing time 13.254 s, AUC 99.9% and MCC 91%. On the PTB-XL dataset, the proposed method obtained an accuracy of 99.57%, an F1 score of 99.17% and an AUC of 99%. Thus, the overall findings show that the proposed method outperforms the existing methodology.
Title: Manifold grasshopper optimization based extremely disruptive vision transformer model for automatic heart disease detection in raw ECG signals
Journal: Multimedia Tools and Applications
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20142-w
Baohua Zhang, Chen Hao, Xiaoqi Lv, Yu Gu, Yueming Wang, Xin Liu, Yan Ren, Jianjun Li
To decrease domain shift in cross-domain person re-identification, existing methods generate pseudo labels for training models; however, the inherent distribution among source-domain data and the hard quantization loss are ignored. Therefore, a cross-domain person re-identification method based on distribution consistency and multi-label collaborative learning is proposed. First, a soft binary cross-entropy loss function is constructed to constrain the inter-sample relationship of the cross-domain transformation, which ensures the consistency of appearance features and sample distribution and achieves cross-domain feature alignment. On this basis, to suppress the noise of hard pseudo labels, a multi-label collaborative learning network is constructed. Soft pseudo labels are generated from the collaborative foreground features and global features to guide network training, making the model adapt to the target domain. The experimental results show that the proposed method performs better than recent representative methods.
Title: A cross-domain person re-identification algorithm based on distribution-consistency and multi-label collaborative learning
Journal: Multimedia Tools and Applications
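The soft binary cross-entropy mentioned in the abstract above can be illustrated in its generic form, where the targets are probabilities rather than hard 0/1 labels. How the paper builds its soft targets from inter-sample relationships is not stated in the abstract, so the values below are purely hypothetical.

```python
import math


def soft_bce(probs, soft_targets, eps=1e-12):
    """Mean binary cross-entropy with targets allowed anywhere in [0, 1]."""
    total = 0.0
    for p, t in zip(probs, soft_targets):
        p = min(1.0 - eps, max(eps, p))  # keep log() finite
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical pairwise similarities: soft targets versus model predictions.
targets = [0.9, 0.2, 0.6]
loss_close = soft_bce([0.85, 0.25, 0.55], targets)
loss_far = soft_bce([0.40, 0.70, 0.10], targets)
```

For a fixed soft target, the loss is smallest when the prediction equals the target, so predictions are pulled toward the target distribution instead of being pushed to 0 or 1 as a hard label would do.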
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20103-3
Ensieh AbbasiRad, Mohammad Reza Keyvanpour, Nasim Tohidi
Recommender systems play an essential role in decision-making in the information age by reducing information overload through retrieving the most relevant information in various applications. They also present great opportunities and challenges for business, government, education, and other fields. The cold start problem is a significant issue in these systems. If recommender systems fail to provide satisfactory personalized recommendations for new users, the users' trust can easily be lost. Hence, this paper introduces a solution to this critical issue for movie recommendation, based on co-clustering, user demographic information and users' behavioral history. In addition to dealing with the problem of relative cold start, the proposed method also addresses the problem of absolute cold start. The proposed method was evaluated using two criteria, RMSE and MAE. For cold-start users, depending on the number of comments they had registered, it achieved 0.85 and 0.49 on the MovieLens dataset and 1.05 and 0.6 on the EachMovie dataset, respectively. For cold-start items, depending on the number of registered comments, it achieved 0.9 and 0.55 on the MovieLens dataset and 1.42 and 0.89 on the EachMovie dataset, respectively.
Title: Co-clustering method for cold start issue in collaborative filtering movie recommender system
Journal: Multimedia Tools and Applications
Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20229-4
Sushovan Das, Uttam Kr. Mondal
In an acoustic pervasive wireless sensor network (PWSN), the BASE station plays a vital role in gathering and integrating acoustic sensor data from various nodes, including end and router devices that track time-driven events. The semantic BASE station is crucial in the IoT landscape because it consolidates data from these networks, enabling thorough sentiment analysis of acoustic signals and yielding insights across domains. A semantic processor at the BASE station is essential for an energy-efficient and intelligent PWSN: it manages data collection, integration, signal feature extraction, and publication for model training and sentiment analysis. This paper introduces a novel approach to designing a semantic BASE station, focusing on ontology generation, evaluation, and updates that help pervasive wireless sensors capture and depict events and time through an ontological framework. The study addresses challenges in efficiently gathering, integrating, and processing acoustic data from pervasive nodes, and proposes a semantic processor at the BASE station to enhance feature extraction and metadata publication. The semantic organization of feature-extracted, labeled metadata supports machine learning (ML) applications such as sentiment analysis, type detection, and environment detection, which are analyzed by generating a confusion matrix. The evaluation includes performance metrics (NEEN, LSNS, BDAS) as well as accuracy, precision, sensitivity, and specificity for sentimental data analysis, to validate the proposed technique's efficacy.
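The four classification metrics listed in the abstract can be derived from a binary confusion matrix, as in the following minimal sketch. The function name and the counts are hypothetical and are not results from the paper; the paper-specific metrics (NEEN, LSNS, BDAS) are not defined in the abstract and are therefore not shown.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. A ratio with a zero denominator is returned as None.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical counts for illustration only (not results from the paper).
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(name, value)
```

For a multi-class task such as type or environment detection, the same ratios are usually computed per class by treating that class as positive and all others as negative.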
Acoustic data acquisition and integration for semantic organization of sentimental data and analysis in a PWSN. Sushovan Das, Uttam Kr. Mondal. Multimedia Tools and Applications, published 2024-09-13. https://doi.org/10.1007/s11042-024-20229-4
Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20125-x
Khalid M. Hosny, Sara T. Kamal
People transmit millions of digital images daily over various networks, and securing these images is a major challenge. Image encryption is a widely used and successful approach to securing digital images in transit. Researchers have developed various encryption techniques that focus on securing individual images. Recently, the encryption of multiple images has gained interest as an emerging approach. In this paper, we propose a four-tier technique for multiple image encryption (MIE) to increase transmission speed and improve digital image security. First, we attach the plain images to one another to create an augmented image. Second, we obtain a randomized augmented image by randomly changing the position of each plain image. Third, we scramble the randomized augmented image using a zigzag pattern, rotation, and random permutation between blocks. Finally, we diffuse the scrambled augmented image using an Altered Sine-logistic-based Tent map (ASLT). We provide a flowchart, pseudocode, and an illustrative example to make the proposed method easy to understand. Many experiments were performed to evaluate this four-tier technique, and the results show that it is highly effective and secure enough to withstand various attacks.
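The overall shape of the four tiers can be sketched as a toy pipeline. This is not the authors' algorithm: the abstract does not define the ASLT map or the exact zigzag, rotation, and block-permutation rules, so the sketch below substitutes the standard logistic map for the diffusion keystream and a seeded pixel permutation for the scrambling tier. All function names, parameters, and sample data are hypothetical.

```python
import random


def logistic_keystream(x0, r, n):
    """Key bytes from the standard logistic map x -> r*x*(1-x).

    Stand-in for the paper's ASLT map, which the abstract does not define.
    """
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out


def encrypt(images, seed, x0=0.3141, r=3.99):
    """Encrypt a list of equal-length byte strings (flattened toy images)."""
    rng = random.Random(seed)
    order = list(range(len(images)))
    rng.shuffle(order)                                   # tier 2: randomize image positions
    augmented = [b for i in order for b in images[i]]    # tier 1: attach images
    perm = list(range(len(augmented)))
    rng.shuffle(perm)                                    # tier 3 stand-in: pixel permutation
    scrambled = [augmented[j] for j in perm]
    key = logistic_keystream(x0, r, len(scrambled))
    return bytes(p ^ k for p, k in zip(scrambled, key))  # tier 4: XOR diffusion


def decrypt(cipher, n_images, seed, x0=0.3141, r=3.99):
    """Invert encrypt() by replaying the same seeded choices in reverse."""
    rng = random.Random(seed)
    order = list(range(n_images))
    rng.shuffle(order)
    perm = list(range(len(cipher)))
    rng.shuffle(perm)
    key = logistic_keystream(x0, r, len(cipher))
    scrambled = [c ^ k for c, k in zip(cipher, key)]
    augmented = [0] * len(cipher)
    for dst, src in enumerate(perm):                     # undo the permutation
        augmented[src] = scrambled[dst]
    size = len(cipher) // n_images
    images = [b""] * n_images
    for slot, i in enumerate(order):                     # undo the position shuffle
        images[i] = bytes(augmented[slot * size:(slot + 1) * size])
    return images


# Hypothetical 16-byte "images" for illustration only.
plain = [bytes(range(16)), bytes(range(16, 32)), bytes(range(32, 48))]
cipher = encrypt(plain, seed=42)
print(decrypt(cipher, n_images=3, seed=42) == plain)
```

Encrypting the images as one augmented array means a single scramble-and-diffuse pass covers the whole batch, which is the source of the speed gain the abstract describes. The XOR keystream here is for illustration and should not be treated as a secure cipher.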
A new four-tier technique for efficient multiple images encryption. Khalid M. Hosny, Sara T. Kamal. Multimedia Tools and Applications, published 2024-09-13. https://doi.org/10.1007/s11042-024-20125-x