Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20218-7
Yasin Sancar, Ugur Yavuz, Isil Karabey Aksakalli
To evaluate multiple-choice tests in large-scale exams, optical forms are commonly used and read by OMR (Optical Mark Recognition) scanners. However, OMR scanners often misinterpret marks that have not been fully erased, which can lead to incorrect readings. To overcome this shortcoming and reduce the time and labor lost in the assessment process, we developed a novel system based on the density of each individual's markings, providing a more personalized and accurate approach. Instead of reading according to a fixed optical form template, a dynamic and flexible structure was built in which users can create their own templates and obtain a model that reads according to that template. We also optimized certain aspects of the system for efficiency, such as image memory transfer and QR code reading; these optimizations significantly increase the performance of the OMR scanners. One of the key issues addressed is the inaccurate reading that occurs when a student does not fully erase a marking or when markings are faint. After the scanning process, the proposed approach uses a K-means clustering algorithm to classify markings of different densities. This technique identifies each student's personal marking density, enabling a more accurate interpretation of their responses. According to the experimental results, we achieved a 97.7% improvement over the misreads produced by conventional OMR devices. In tests performed on 265,816 optical forms, we obtained an accuracy rate of 99.98% and a reading time of 0.12 seconds per optical form.
Title: Personal mark density-based high-performance Optical Mark Recognition (OMR) system using K-means clustering algorithm
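As a rough illustration of the clustering step described in the abstract above, the sketch below groups per-bubble darkness values for a single student with scikit-learn's KMeans and treats the darkest cluster as that student's personal marking density. The feature extraction, cluster count, and darkness scale are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: cluster one student's bubble-darkness values with K-means
# to separate intended marks from erased/faint marks, as the abstract describes.
import numpy as np
from sklearn.cluster import KMeans

def classify_marks(bubble_darkness, n_clusters=2):
    """bubble_darkness: mean pixel darkness (0..1) of every bubble on one form.
    Returns a boolean array: True = counted as an intended mark."""
    x = np.asarray(bubble_darkness, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(x)
    # The cluster with the highest mean darkness is treated as this student's
    # personal "marked" density; fainter clusters (erased marks) are ignored.
    marked_cluster = int(np.argmax(km.cluster_centers_.ravel()))
    return km.labels_ == marked_cluster

# Example: 0.9 and 0.85 are deliberate marks, 0.35 is a partially erased answer.
print(classify_marks([0.05, 0.9, 0.1, 0.35, 0.85, 0.08]))
```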
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20153-7
Palwinder Kaur, Amandeep Kaur
COVID-19 emerged as a pandemic in December 2019. The virus targets the human pulmonary system. Therefore, chest radiographic imaging is required to monitor the effect of the virus, prevent its spread and decrease the mortality rate. Imaging-based testing places a heavy burden on radiologists, who must manually screen the images. To make the imaging-based method an efficient diagnostic tool, screening automation with minimal human interference is a necessity. This opens numerous challenges for scientists and researchers developing automatic diagnostic tools for COVID-19 detection. In this paper, we present two speed-enhanced convolutional neural networks (SECNNs) to automatically detect COVID-19 among X-rays of COVID-19, pneumonia and healthy subjects. For 2-class classification (2CC) and 3-class classification (3CC), we named the models SECNN-2CC and SECNN-3CC, respectively. The scope of this work is to highlight the significance and potential of CNN models built from scratch for COVID-19 identification. We conduct six experiments using six different balanced and imbalanced datasets. In these datasets, all X-rays come from different patients, which made it more challenging to design models that extract abstract features from highly variable data. Experimental results show that the proposed models exhibit exemplary performance. The highest accuracy for 2CC (COVID-19 vs pneumonia) is 99.92%. For 3CC (COVID-19 vs normal vs pneumonia), the highest accuracy achieved is 99.51%. We believe that this study will be of great importance in diagnosing COVID-19 and will also provide a deeper analysis for discriminating among pneumonia patients, COVID-19 patients and healthy subjects using X-rays.
Title: Speed-enhanced convolutional neural networks for COVID-19 classification using X-rays
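For readers unfamiliar with from-scratch CNN classifiers of this kind, the following PyTorch sketch shows a minimal 3-class chest X-ray network. The layer sizes, input resolution, and class count are illustrative assumptions and do not reproduce the SECNN-2CC/SECNN-3CC architectures.

```python
# Minimal from-scratch CNN for 3-class chest X-ray classification
# (COVID-19 / pneumonia / normal). Layer sizes are illustrative only.
import torch
import torch.nn as nn

class SmallXrayCNN(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):          # x: (batch, 1, H, W) grayscale X-rays
        return self.classifier(self.features(x).flatten(1))

model = SmallXrayCNN()
logits = model(torch.randn(4, 1, 224, 224))   # 4 dummy images
print(logits.shape)                           # torch.Size([4, 3])
```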
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20113-1
Avinash L. Golande, Pavankumar T.
Automated detection of cardiovascular diseases from heartbeats is a difficult and demanding task in signal processing, because routine analysis of a patient's cardiac arrhythmia is crucial to reducing the mortality rate. Detecting and preventing these deaths requires long-term monitoring and manual examination of electrocardiogram (ECG) signals, which takes a lot of time. This article uses an optimized Vision Transformer technique to detect heart disease effectively. The four key processes are pre-processing the input data, feature extraction from the pre-processed data, optimal feature selection, and classification to detect heart disease. In the pre-processing phase, single-channel adaptive blind source separation is used for artifact removal and empirical mode decomposition for noise reduction of the ECG signal. After pre-processing, the ECG signal is fed into the Enhanced Pan-Tompkins algorithm (EPTA) and the Hybrid Gabor-Walsh-Hadamard transform (HGWHT) for feature extraction. The extracted features are selected using a Manifold Grasshopper Optimization algorithm (MGOA). Finally, an Optimized Vision Transformer (OVT) detects heart disease. The experiments are carried out on the PTB Diagnostic ECG and PTB-XL databases, which are publicly accessible research datasets. On the PTB Diagnostic ECG database, the experiments obtained the following values: accuracy 99.9%, sensitivity 98%, F1 score 99.9%, specificity 90%, processing time 13.254 s, AUC 99.9% and MCC 91%. On the PTB-XL dataset, the proposed method obtained an accuracy of 99.57%, an F1 score of 99.17% and an AUC of 99%. Thus, the overall findings show that the proposed method outperforms existing methods.
Title: Manifold grasshopper optimization based extremely disruptive vision transformer model for automatic heart disease detection in raw ECG signals
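The Enhanced Pan-Tompkins algorithm and the HGWHT features are not specified in the abstract, so the sketch below only implements the classical Pan-Tompkins preprocessing chain (band-pass filtering, differentiation, squaring, moving-window integration) with NumPy/SciPy as a baseline illustration; the sampling rate, filter order, and window length are assumptions.

```python
# Classical Pan-Tompkins-style QRS preprocessing; the paper's "Enhanced"
# variant is not public here, so this is only the textbook baseline.
import numpy as np
from scipy.signal import butter, filtfilt

def pan_tompkins_integrated(ecg, fs=360):
    b, a = butter(2, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)                      # 5-15 Hz band-pass
    derivative = np.diff(filtered, prepend=filtered[0]) # slope emphasis
    squared = derivative ** 2                           # emphasize QRS energy
    window = int(0.150 * fs)                            # 150 ms integration window
    return np.convolve(squared, np.ones(window) / window, mode="same")

fs = 360
t = np.arange(0, 2, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(t.size)  # toy signal
print(pan_tompkins_integrated(ecg, fs).shape)           # (720,)
```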
Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20174-2
Thaís Luiza Donega e Souza, Caetano Mazzoni Ranieri, Anand Panangadan, Jó Ueyama, Marislei Nishijima
A film is considered an experience good, as its quality is only revealed after consumption. This situation creates information asymmetry before consumption, prompting producers, who are aware of their film’s quality, to search for methods to signal this. Economic literature specifies that a signal to disclose a product’s quality must be strong, meaning only producers of good-quality films can effectively utilize such a signal. However, a poster represents the most economical signal, and all producers, regardless of film quality, have access to this option. To study whether a poster can signal film quality, we first apply a low-dimensional representation of poster images and cluster them to identify quality-related patterns. We then perform a supervised classification of films into economically successful and unsuccessful categories using a deep neural network. This is based on the hypothesis that higher quality films tend to sell more tickets and that all producers invest in the highest quality poster services. The results demonstrate that a film’s quality can indeed be predicted from its poster, reinforcing its effectiveness as a strong signal. Despite the proliferation of advanced visual media technologies, a simple yet innovative poster remains an effective and appealing tool for signaling film information. Notably, posters can classify a film’s economic success comparably to trailers but with significantly lower processing costs.
Title: Is a poster a strong signal of film quality? evaluating the predictive power of visual elements using deep learning
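A minimal sketch of the two-stage idea in the abstract above, assuming PCA as the low-dimensional representation, KMeans for clustering, and a small scikit-learn MLP standing in for the deep classifier; the random arrays are placeholders for real poster pixels and box-office labels.

```python
# Sketch: (1) low-dimensional representation of posters + clustering,
# (2) supervised classification into successful / unsuccessful films.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
posters = rng.random((200, 64 * 64 * 3))      # 200 flattened RGB posters (toy)
successful = rng.integers(0, 2, 200)          # 1 = economically successful (toy)

embedding = PCA(n_components=32, random_state=0).fit_transform(posters)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)
print("cluster sizes:", np.bincount(clusters))

# A small MLP stands in for the deep network used in the paper.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0)
clf.fit(embedding[:150], successful[:150])
print("held-out accuracy:", clf.score(embedding[150:], successful[150:]))
```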
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20142-w
Baohua Zhang, Chen Hao, Xiaoqi Lv, Yu Gu, Yueming Wang, Xin Liu, Yan Ren, Jianjun Li
To decrease domain shift in cross-domain person re-identification, existing methods generate pseudo labels for training models; however, the inherent distribution of the source domain data and the hard quantization loss are ignored. Therefore, a cross-domain person re-identification method based on distribution consistency and multi-label collaborative learning is proposed. First, a soft binary cross-entropy loss function is constructed to constrain the inter-sample relationships under cross-domain transformation, which ensures the consistency of appearance features and sample distribution and achieves cross-domain feature alignment. On this basis, a multi-label collaborative learning network is constructed to suppress the noise of hard pseudo labels. Soft pseudo labels are generated from collaborative foreground features and global features to guide network training, adapting the model to the target domain. Experimental results show that the proposed method outperforms recent representative methods.
Title: A cross-domain person re-identification algorithm based on distribution-consistency and multi-label collaborative learning
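As a sketch of what a "soft" binary cross-entropy over inter-sample relationships can look like in PyTorch, the snippet below uses soft similarity targets in [0, 1] instead of hard 0/1 labels; the paper's exact formulation, weighting, and inputs are not reproduced here.

```python
# Generic soft binary cross-entropy: targets are soft similarity scores in
# [0, 1], the kind of relational constraint the abstract describes.
import torch

def soft_binary_cross_entropy(logits, soft_targets, eps=1e-7):
    p = torch.sigmoid(logits).clamp(eps, 1 - eps)
    return -(soft_targets * p.log() + (1 - soft_targets) * (1 - p).log()).mean()

logits = torch.randn(8)          # predicted pairwise similarities (toy)
soft_targets = torch.rand(8)     # soft relational labels from the source domain
print(soft_binary_cross_entropy(logits, soft_targets))
```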
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20103-3
Ensieh AbbasiRad, Mohammad Reza Keyvanpour, Nasim Tohidi
Recommender systems play an essential role in decision-making in the information age by reducing information overload and retrieving the most relevant information in various applications. They also present great opportunities and challenges for business, government, education, and other fields. The cold start problem is a significant issue in these systems: if a recommender system fails to provide satisfactory personalized recommendations for new users, the user's trust can easily be lost. Hence, in this paper, a solution to this critical issue for movie recommendation is introduced, using co-clustering together with user demographic information and users' behavioral history. The proposed method addresses the absolute cold start problem in addition to the relative cold start problem. It was evaluated with the RMSE and MAE criteria, achieving 0.85 and 0.49 on the MovieLens dataset and 1.05 and 0.6 on the EachMovie dataset, respectively, depending on the number of comments registered by cold-start users. Moreover, it achieved 0.9 and 0.55 on the MovieLens dataset and 1.42 and 0.89 on the EachMovie dataset, respectively, depending on the number of comments registered for cold-start items.
Title: Co-clustering method for cold start issue in collaborative filtering movie recommender system
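A minimal sketch of rating prediction from co-clusters, using scikit-learn's SpectralCoclustering as a stand-in for the paper's co-clustering and a simple block-mean rule for prediction; the demographic handling and the paper's specific cold-start fallbacks are omitted.

```python
# Co-cluster a toy user-item rating matrix and predict a rating as the mean of
# the block formed by the user's row cluster and the item's column cluster.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(50, 40)).astype(float)   # 50 users x 40 movies

model = SpectralCoclustering(n_clusters=4, random_state=0).fit(ratings)
user_c, item_c = model.row_labels_, model.column_labels_

def block_mean_prediction(user, item):
    """Mean rating of the (user-cluster, item-cluster) block."""
    block = ratings[np.ix_(user_c == user_c[user], item_c == item_c[item])]
    return block.mean()

print(block_mean_prediction(user=3, item=7))
```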
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20229-4
Sushovan Das, Uttam Kr. Mondal
In an acoustic pervasive wireless sensor network (PWSN), the BASE station plays a vital role in gathering and integrating acoustic sensor data from various nodes, including end and router devices tracking time-driven events. The semantic BASE station is crucial in the IoT landscape because it consolidates data from these networks, enabling thorough sentiment analysis of acoustic signals and yielding insights across domains. A semantic processor at the BASE station is essential for an energy-efficient and intelligent PWSN, managing data collection, integration, signal feature extraction, and publication for model training and sentiment analysis. This paper introduces a novel approach to designing a semantic BASE station, focusing on ontology generation, evaluation, and updates to support pervasive wireless sensors in capturing and depicting events and time through an ontological framework. The study addresses the challenges of efficiently gathering, integrating, and processing acoustic data from pervasive nodes, proposing a semantic processor at the BASE station to enhance feature extraction and metadata publication. The semantic organization of feature-extracted, labeled metadata enables comprehensive machine learning (ML) applications such as sentiment analysis, type detection, and environment detection, evaluated by generating a confusion matrix. The evaluation includes performance metrics (NEEN, LSNS, BDAS) as well as accuracy, precision, sensitivity, and specificity for sentimental data analysis to validate the proposed technique's efficacy.
Title: Acoustic data acquisition and integration for semantic organization of sentimental data and analysis in a PWSN
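A toy sketch of the final analysis step only, assuming generic feature vectors and sentiment labels: a scikit-learn classifier is trained on labeled acoustic features and evaluated with a confusion matrix, as the abstract describes. The ontology and metadata pipeline itself is not modeled here.

```python
# Train a classifier on feature-extracted, labeled acoustic metadata and report
# a confusion matrix; features and sentiment labels below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.random((300, 20))       # e.g. MFCC-style descriptors (toy)
sentiment = rng.integers(0, 3, 300)    # 3 sentiment classes (toy)

X_tr, X_te, y_tr, y_te = train_test_split(features, sentiment, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(confusion_matrix(y_te, clf.predict(X_te)))
```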
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20125-x
Khalid M. Hosny, Sara T. Kamal
People transmit millions of digital images daily over various networks, and securing these images is a big challenge. Image encryption is a successful approach widely used to secure digital images during transmission. Researchers have developed various encryption techniques that focus on securing individual images; recently, encryption of multiple images has gained more interest as an emerging approach. In this paper, we propose a four-tier technique for multiple image encryption (MIE) to increase transmission speed and improve digital image security. First, we attach the plain images to create an augmented image. Second, the randomized augmented image is obtained by randomly changing the position of each plain image. Third, we scramble the randomized augmented image using a zigzag pattern, rotation, and random permutation between blocks. Finally, we diffuse the scrambled augmented image using an Altered Sine-Logistic-based Tent map (ASLT). We provide a flowchart, pseudo-code, and an illustrative example to make the proposed method easy to understand. Extensive experiments were performed to evaluate the four-tier technique, and the results show that it is highly effective and secure against various attacks.
Title: A new four-tier technique for efficient multiple images encryption
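A compact sketch of two of the four tiers (zigzag scrambling and keystream diffusion), with a commonly used logistic-tent hybrid map standing in for the paper's Altered Sine-Logistic-based Tent map, whose exact form is not given in the abstract; the parameters and the toy image are arbitrary.

```python
# Zigzag scramble of pixel positions followed by XOR diffusion driven by a
# logistic-tent keystream (a stand-in for the paper's ASLT map).
import numpy as np

def logistic_tent(x0, r, n):
    """Generate n chaotic values in (0,1) from a logistic-tent hybrid map."""
    seq, x = np.empty(n), x0
    for i in range(n):
        x = (r * x * (1 - x) + (4 - r) * (x if x < 0.5 else 1 - x) / 2) % 1
        seq[i] = x
    return seq

def zigzag_order(rows, cols):
    order = []
    for s in range(rows + cols - 1):
        diag = [(i, s - i) for i in range(rows) if 0 <= s - i < cols]
        order.extend(reversed(diag) if s % 2 else diag)
    return order

img = np.arange(16, dtype=np.uint8).reshape(4, 4)            # toy "augmented image"
scrambled = np.array([img[i, j] for i, j in zigzag_order(4, 4)], dtype=np.uint8)
keystream = (logistic_tent(0.37, 3.99, scrambled.size) * 256).astype(np.uint8)
cipher = scrambled ^ keystream                               # diffusion tier
print(cipher)
```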
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20176-0
Jun Lang, Fan Zhang
In this paper, we propose an optical image encryption scheme based on modified 3D double-phase encoding algorithm (3D-DPEA) in the gyrator transform (GT) domain, in which a plaintext is encrypted into two sparse volumetric ciphertexts under the constraints of chaos-generated binary amplitude masks (BAMs). Then, the two volumetric ciphertexts are multiplexed into the corresponding 2D ciphertexts for convenient storage and transmission. First, due to the synergistic adjustment of the two sparse volumetric ciphertexts during the iterative process, the 3D-DPEA would achieve higher recovery quality of the decrypted image with fewer iterations. In addition, because the BAMs are generated by the logistic-tent (LT) chaotic map which is closely related to the rotation angles of GT, and the LT chaotic map has several advantages such as nonlinear, pseudorandom behavior, and high sensitivity of initial conditions, the sensitivity of the secret key could be significantly improved by several orders of magnitude, reaching up to 10^(-14). As a result, the 3D-DPEA scheme not only eliminates the explicit/linear relationship between the plaintext and the ciphertext but also substantially enhances security. For decryption, the corresponding decrypted image can be achieved by recording an intensity pattern when a coherent beam crosses two sparse volumetric ciphertexts sequentially. Furthermore, BAMs wouldn't impose an additional burden on the storage and transmission of secret keys. A series of numerical simulations are performed to verify the effectiveness and security of the proposed encryption scheme.
Title: Optical image encryption based on 3D double-phase encoding algorithm in the gyrator transform domain
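A small sketch of chaos-generated binary amplitude masks, assuming a standard logistic-tent map form and a median threshold; it also illustrates the reported key sensitivity by perturbing the seed by 1e-14. The gyrator transform and the 3D-DPEA iteration themselves are not reproduced here.

```python
# Generate a binary amplitude mask (BAM) from a logistic-tent sequence and show
# that a 1e-14 change in the seed produces a very different mask.
import numpy as np

def lt_sequence(x0, r, n):
    x, out = x0, np.empty(n)
    for i in range(n):
        x = (r * x * (1 - x) + (4 - r) * (x if x < 0.5 else 1 - x) / 2) % 1
        out[i] = x
    return out

def binary_amplitude_mask(x0, r, shape):
    seq = lt_sequence(x0, r, int(np.prod(shape)))
    return (seq > np.median(seq)).astype(np.uint8).reshape(shape)

bam_a = binary_amplitude_mask(0.412345678901234, 3.7, (64, 64))
bam_b = binary_amplitude_mask(0.412345678901234 + 1e-14, 3.7, (64, 64))
print("fraction of differing mask pixels:", np.mean(bam_a != bam_b))  # close to 0.5
```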
Pub Date: 2024-09-13 | DOI: 10.1007/s11042-024-20212-z
Ritabroto Ganguly, Manjarini Mallik, Chandreyee Chowdhury
The main purpose of indoor localization is to precisely locate users and help them navigate within an indoor area, such as a building or campus, where GPS and other satellite technologies lack precision. Our methodology for achieving indoor localization has been to implement classifiers that use Received Signal Strength Indicator (RSSI) values of WiFi signals collected from smart hand-held devices. However, these RSSI values keep varying, often appreciably, from time to time and device to device. So, to instill more generalizability into the location prediction process, ensemble models have been built that can learn from the pros and cons of all of their member classifiers. In this paper, we present several neural network based ensemble models to compensate for the lack of detailed studies of ensemble models (especially neural network based ones) for indoor localization. Our second contribution lies in designing a knowledge distillation framework for the ensemble models that preserves classification performance while making the system responsive in real time, since the lightweight distilled model can be executed locally on edge devices. The proposed knowledge distillation framework distils the knowledge of a large neural network based ensemble classifier into a much smaller compressed classification model while maintaining performance. We have implemented and demonstrated the proposed knowledge distillation framework on three publicly available benchmark datasets. The proposed model achieves 83.95%, 93.10% and 96.48% accuracy on DataSet1, DataSet2 and DataSet3, respectively.
Title: Design of a knowledge distillation network for wifi-based indoor localization
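A sketch of a standard temperature-scaled distillation loss in PyTorch, of the kind such a framework typically minimizes; the temperature, loss weighting, and how the ensemble teacher's logits are aggregated are assumptions rather than the paper's exact design.

```python
# Hinton-style knowledge distillation loss: the student mimics the softened
# outputs of the (ensemble) teacher while also fitting the true location labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                               # rescale gradients for temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 5)            # 8 samples, 5 location classes (toy)
teacher_logits = torch.randn(8, 5)            # e.g. averaged ensemble outputs
labels = torch.randint(0, 5, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```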