Multimedia Tools and Applications: Latest Articles (Page 4)

Speed-enhanced convolutional neural networks for COVID-19 classification using X-rays

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-14 DOI: 10.1007/s11042-024-20153-7

Palwinder Kaur, Amandeep Kaur

COVID-19 emerged in December 2019 and grew into a pandemic. The virus targets the human pulmonary system, so chest radiographic imaging is needed to monitor its effects, limit its spread, and reduce mortality. Imaging-based testing, however, places a heavy burden on radiologists who screen the images manually. For imaging to become an efficient diagnostic tool, screening must be automated with minimal human intervention, which poses numerous challenges for researchers developing automatic COVID-19 detection tools. In this paper, we present two speed-enhanced convolutional neural networks (SECNNs) that automatically detect COVID-19 in chest X-rays of COVID-19, pneumonia, and healthy subjects. We name the models SECNN-2CC for 2-class classification (2CC) and SECNN-3CC for 3-class classification (3CC). The aim of this work is to highlight the significance and potential of CNN models built from scratch for COVID-19 identification. We conduct six experiments on six different datasets, some balanced and some imbalanced. Every X-ray in the datasets comes from a different patient, which made it more challenging to design models that extract abstract features from such highly variable data. Experimental results show that the proposed models perform well: the highest accuracy is 99.92% for 2CC (COVID-19 vs Pneumonia) and 99.51% for 3CC (COVID-19 vs Normal vs Pneumonia). We believe this study will be valuable for diagnosing COVID-19 and provides a deeper analysis for discriminating among pneumonia patients, COVID-19 patients, and healthy subjects using X-rays.

Citations: 0

Personal mark density-based high-performance Optical Mark Recognition (OMR) system using K-means clustering algorithm

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-14 DOI: 10.1007/s11042-024-20218-7

Yasin Sancar, Ugur Yavuz, Isil Karabey Aksakalli

Optical forms are commonly used to evaluate multiple-choice tests in large-scale exams, and these forms are read by OMR (Optical Mark Recognition) scanners. However, OMR scanners often misinterpret marks that have not been fully erased, which leads to incorrect readings. To overcome this shortcoming and reduce the time and labor lost in the assessment process, we developed a novel system based on the density of each individual’s markings, providing a more personalized and accurate approach. Instead of reading against one fixed optical form template, the system has a dynamic, flexible structure in which users can create their own templates and obtain a model that reads according to each template. We also optimized parts of the system for efficiency, such as image memory transfer and QR code reading; these optimizations significantly increase the performance of the OMR scanners. A key issue addressed is inaccurate reading when a student does not fully erase a marking or when markings are faint. After scanning, the proposed approach uses a K-means clustering algorithm to classify markings of different densities. This identifies each student’s personal marking density, enabling a more accurate interpretation of their responses. According to the experimental results, the approach achieved a 97.7% improvement on optical forms misread by conventional OMR devices. In tests on 265,816 optical forms, we obtained an accuracy of 99.98% and a reading time of 0.12 seconds per form.
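The per-student clustering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each bubble has already been reduced to a fill density between 0 and 1 (for example, the fraction of dark pixels), clusters one sheet's densities into two groups with a plain 1-D K-means, and treats the denser cluster as that student's intentional marks. The function names, the two-cluster choice, and the `min_gap` guard for blank sheets are all hypothetical.

```python
def kmeans_1d(values, k=2, iters=100):
    """Plain Lloyd's K-means on scalar values (k >= 2); returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centroids evenly between min and max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def read_sheet(densities, min_gap=0.15):
    """densities: one list of bubble fill densities per question.

    Returns one letter per question, or None when the question has no mark
    or more than one mark in the student's dense cluster.
    """
    flat = [d for question in densities for d in question]
    centroids, labels = kmeans_1d(flat, k=2)
    # Hypothetical guard: if the two clusters are too close, the sheet is blank.
    if abs(centroids[1] - centroids[0]) < min_gap:
        return [None] * len(densities)
    dense = 0 if centroids[0] > centroids[1] else 1
    answers, offset = [], 0
    for question in densities:
        marked = [j for j in range(len(question)) if labels[offset + j] == dense]
        offset += len(question)
        answers.append(chr(ord("A") + marked[0]) if len(marked) == 1 else None)
    return answers


# A heavy-handed student who left an erased residue (0.30) on question 2.
heavy = [[0.90, 0.03, 0.02, 0.04], [0.03, 0.30, 0.88, 0.02], [0.02, 0.04, 0.03, 0.92]]
# A student whose intentional marks are fainter than that residue.
faint = [[0.02, 0.28, 0.03, 0.02], [0.26, 0.02, 0.03, 0.04], [0.03, 0.02, 0.29, 0.02]]
print(read_sheet(heavy))  # ['A', 'C', 'D']
print(read_sheet(faint))  # ['B', 'A', 'C']
```

No single fixed threshold reads both sheets correctly, because the first student's erased residue (0.30) is darker than the second student's intentional marks (0.26 to 0.29). Clustering each sheet on its own densities is what separates the two cases here.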

Citations: 0

Manifold grasshopper optimization based extremely disruptive vision transformer model for automatic heart disease detection in raw ECG signals

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-14 DOI: 10.1007/s11042-024-20113-1

Avinash L. Golande, Pavankumar T.

Automated detection of cardiovascular diseases from heartbeats is a difficult and demanding signal-processing task, and routine analysis of a patient’s cardiac arrhythmia is crucial to reducing mortality. Detecting and preventing these deaths requires long-term monitoring and manual examination of electrocardiogram (ECG) signals, which is time-consuming. This article uses an optimized Vision Transformer technique to detect heart disease effectively. The four key stages are pre-processing of the input data, feature extraction from the pre-processed data, optimal feature selection, and classification to detect heart disease. In the pre-processing phase, single-channel adaptive blind source separation is used for artifact removal and empirical mode decomposition for noise reduction of the ECG signal. After pre-processing, the ECG signal is fed into the Enhanced Pan-Tompkins algorithm (EPTA) and the Hybrid Gabor-Walsh-Hadamard transform (HGWHT) for feature extraction. Features are then selected from the extracted set using a Manifold Grasshopper Optimization algorithm (MGOA). Finally, an Optimized Vision Transformer (OVT) detects heart disease. Experiments are carried out on the PTB Diagnostic ECG and PTB-XL databases, both publicly accessible research datasets. On PTB Diagnostic ECG, the method obtained an accuracy of 99.9%, sensitivity of 98%, F1 score of 99.9%, specificity of 90%, processing time of 13.254 s, AUC of 99.9%, and MCC of 91%. On PTB-XL, it obtained an accuracy of 99.57%, F1 score of 99.17%, and AUC of 99%. Overall, the findings indicate that the proposed method outperforms existing methods.

Citations: 0

Is a poster a strong signal of film quality? Evaluating the predictive power of visual elements using deep learning

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-14 DOI: 10.1007/s11042-024-20174-2

Thaís Luiza Donega e Souza, Caetano Mazzoni Ranieri, Anand Panangadan, Jó Ueyama, Marislei Nishijima

A film is considered an experience good, as its quality is only revealed after consumption. This situation creates information asymmetry before consumption, prompting producers, who are aware of their film’s quality, to search for methods to signal this. Economic literature specifies that a signal to disclose a product’s quality must be strong, meaning only producers of good-quality films can effectively utilize such a signal. However, a poster represents the most economical signal, and all producers, regardless of film quality, have access to this option. To study whether a poster can signal film quality, we first apply a low-dimensional representation of poster images and cluster them to identify quality-related patterns. We then perform a supervised classification of films into economically successful and unsuccessful categories using a deep neural network. This is based on the hypothesis that higher quality films tend to sell more tickets and that all producers invest in the highest quality poster services. The results demonstrate that a film’s quality can indeed be predicted from its poster, reinforcing its effectiveness as a strong signal. Despite the proliferation of advanced visual media technologies, a simple yet innovative poster remains an effective and appealing tool for signaling film information. Notably, posters can classify a film’s economic success comparably to trailers but with significantly lower processing costs.

Citations: 0

A cross-domain person re-identification algorithm based on distribution-consistency and multi-label collaborative learning

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20142-w

Baohua Zhang, Chen Hao, Xiaoqi Lv, Yu Gu, Yueming Wang, Xin Liu, Yan Ren, Jianjun Li

To reduce domain shift in cross-domain person re-identification, existing methods generate pseudo labels for training models. However, they ignore the inherent distribution of the source domain data and the hard quantization loss. Therefore, a cross-domain person re-identification method based on distribution consistency and multi-label collaborative learning is proposed. First, a soft binary cross-entropy loss function is constructed to constrain the inter-sample relationships under cross-domain transformation, which ensures the consistency of appearance features and sample distribution and achieves cross-domain feature alignment. On this basis, a multi-label collaborative learning network is constructed to suppress the noise of hard pseudo labels. Soft pseudo labels are generated from the collaborative foreground features and global features and used to guide network training, so that the model adapts to the target domain. The experimental results show that the proposed method outperforms recent representative methods.
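The abstract does not define its soft binary cross-entropy loss, so the sketch below shows only the generic idea under stated assumptions: ordinary binary cross-entropy in which the target is a probability in [0, 1] (a soft label) rather than a hard 0 or 1. The function name `soft_bce` and the clamping constant `eps` are hypothetical, and the paper's actual loss over inter-sample relationships may differ.

```python
import math


def soft_bce(targets, probs, eps=1e-12):
    """Mean binary cross-entropy with soft targets.

    targets: soft labels q in [0, 1]; probs: predicted probabilities p in [0, 1].
    Per-sample loss: -(q * log(p) + (1 - q) * log(1 - p)).
    """
    total = 0.0
    for q, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= q * math.log(p) + (1.0 - q) * math.log(1.0 - p)
    return total / len(targets)


# With a soft target of 0.8, the loss is lowest when the prediction is also 0.8;
# an over-confident prediction of 0.99 is penalised more than a matching one.
matching = soft_bce([0.8], [0.8])
overconfident = soft_bce([0.8], [0.99])
```

With a hard target the same formula reduces to the usual binary cross-entropy, which always pushes predictions toward 0 or 1. A soft target instead rewards matching the stated confidence, which is one common motivation for soft pseudo labels when the labels are noisy.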

Citations: 0

Co-clustering method for cold start issue in collaborative filtering movie recommender system

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20103-3

Ensieh AbbasiRad, Mohammad Reza Keyvanpour, Nasim Tohidi

Recommender systems play an essential role in decision-making in the information age: by retrieving the most relevant information, they reduce information overload in a wide range of applications. They also present great opportunities and challenges for business, government, education, and other fields. The cold start problem is a significant issue in these systems. If a recommender system fails to provide satisfactory personalized recommendations for new users, their trust can easily be lost. This paper therefore introduces a solution to this issue for movie recommendation, based on co-clustering together with user demographic information and users’ behavioral history. The proposed method addresses both the relative cold start problem and the absolute cold start problem. It was evaluated using two criteria, RMSE and MAE. For cold start users, evaluated according to the number of comments they have registered, it achieved an RMSE and MAE of 0.85 and 0.49 on the MovieLens dataset and 1.05 and 0.6 on the EachMovie dataset. For cold start items, evaluated according to the number of comments registered for them, it achieved 0.9 and 0.55 on the MovieLens dataset and 1.42 and 0.89 on the EachMovie dataset.
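For reference, the two evaluation criteria can be computed as in the short sketch below. The ratings are made-up illustrative values, not data from the paper, and the function names are arbitrary.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical ratings on a 1-5 scale and a model's predictions for them.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 3]
print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```

RMSE is never smaller than MAE on the same data because squaring weights large errors more heavily, which is consistent with every RMSE/MAE pair reported above.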

Citations: 0

Acoustic data acquisition and integration for semantic organization of sentimental data and analysis in a PWSN

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20229-4

Sushovan Das, Uttam Kr. Mondal

In an acoustic pervasive wireless sensor network (PWSN), the BASE station plays a vital role in gathering and integrating acoustic sensor data from various nodes, including end and router devices that track time-driven events. The semantic BASE station is crucial in the IoT landscape because it consolidates data from these networks, enabling thorough sentiment analysis of acoustic signals and yielding insights across domains. A semantic processor at the BASE station is essential for an energy-efficient and intelligent PWSN: it manages data collection, integration, signal feature extraction, and publication for model training and sentiment analysis. This paper introduces a novel approach to designing a semantic BASE station, focusing on ontology generation, evaluation, and updates that help pervasive wireless sensors capture and depict events and time through an ontological framework. The study addresses the challenges of efficiently gathering, integrating, and processing acoustic data from pervasive nodes, and proposes a semantic processor at the BASE station to enhance feature extraction and metadata publication. The semantic organization of feature-extracted, labeled metadata supports machine learning (ML) applications such as sentiment analysis, type detection, and environment detection, which are analyzed by generating a confusion matrix. To validate the proposed technique’s efficacy, the evaluation covers performance metrics (NEEN, LSNS, BDAS) as well as accuracy, precision, sensitivity, and specificity for sentimental data analysis.
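The four classification measures named above all derive from confusion-matrix counts. The sketch below shows that derivation for a binary case; it is an illustration with made-up labels, and it assumes a two-class setting, whereas the paper's sentiment classes may be multi-class. The function name and the convention of returning 0.0 for an empty denominator are arbitrary choices.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, sensitivity and specificity from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Arbitrary convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical ground truth and predictions for ten clips (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics["accuracy"])   # 0.8
print(metrics["precision"])  # 0.75
```

For more than two classes, the same counts are usually computed one class at a time (one-vs-rest) and then averaged.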

Citations: 0

A new four-tier technique for efficient multiple images encryption

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20125-x

Khalid M. Hosny, Sara T. Kamal

People transmit millions of digital images over networks every day, and securing them is a major challenge. Image encryption is widely used to protect digital images in transit. Researchers have developed many encryption techniques that secure individual images; recently, multiple-image encryption (MIE) has attracted growing interest as an emerging approach. In this paper, we propose a four-tier MIE technique that increases transmission speed and improves digital image security. First, we join the plain images to create an augmented image. Second, we obtain a randomized augmented image by randomly changing the position of each plain image. Third, we scramble the randomized augmented image using a zigzag pattern, rotation, and random permutation between blocks. Finally, we diffuse the scrambled augmented image using an Altered Sine-logistic-based Tent map (ASLT). We provide a flowchart, pseudocode, and an illustrative example to make the proposed method easy to understand. Extensive experiments show that the four-tier technique is effective and withstands various attacks.
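
The four tiers described in this abstract can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the authors' algorithm: the function names are hypothetical, tier 3 implements only the zigzag scan (rotation and block permutation are omitted for brevity), and tier 4 uses the plain logistic map as a stand-in keystream because the abstract does not give the ASLT equations.

```python
import random


def zigzag_indices(h, w):
    """Visit every cell of an h x w grid along alternating anti-diagonals."""
    order = []
    for s in range(h + w - 1):
        diag = [(i, s - i) for i in range(h) if 0 <= s - i < w]
        order.extend(diag if s % 2 else reversed(diag))
    return order


def logistic_keystream(x0, n, r=3.99):
    """Stand-in keystream from the plain logistic map (NOT the paper's ASLT)."""
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1.0 - x)
        out.append(int(x * 256) % 256)
    return out


def encrypt(images, seed, x0):
    h, w = len(images[0]), len(images[0][0])
    # Tiers 1-2: choose a key-dependent order, then join images side by side.
    order = list(range(len(images)))
    random.Random(seed).shuffle(order)
    aug = [[p for k in order for p in images[k][i]] for i in range(h)]
    # Tier 3 (partial): scramble by reading pixels in zigzag order.
    seq = [aug[i][j] for i, j in zigzag_indices(h, w * len(images))]
    # Tier 4: diffuse with XOR, chaining each byte to the previous cipher byte.
    cipher, prev = [], 0
    for p, k in zip(seq, logistic_keystream(x0, len(seq))):
        prev = p ^ k ^ prev
        cipher.append(prev)
    return cipher


def decrypt(cipher, n_images, h, w, seed, x0):
    # Undo tier 4.
    seq, prev = [], 0
    for c, k in zip(cipher, logistic_keystream(x0, len(cipher))):
        seq.append(c ^ k ^ prev)
        prev = c
    # Undo tier 3.
    aug = [[0] * (w * n_images) for _ in range(h)]
    for (i, j), p in zip(zigzag_indices(h, w * n_images), seq):
        aug[i][j] = p
    # Undo tiers 1-2: cut the tiles out and restore the original order.
    order = list(range(n_images))
    random.Random(seed).shuffle(order)
    images = [None] * n_images
    for slot, k in enumerate(order):
        images[k] = [row[slot * w:(slot + 1) * w] for row in aug]
    return images


# Three hypothetical 3x4 grayscale "images".
imgs = [[[(k * 50 + i * 4 + j) % 256 for j in range(4)] for i in range(3)]
        for k in range(3)]
cipher = encrypt(imgs, seed=7, x0=0.37)
assert decrypt(cipher, 3, 3, 4, seed=7, x0=0.37) == imgs
```

Here `seed` and `x0` together play the role of the secret key. Python's `random` module is not cryptographically secure, so it serves only to show the data flow; a real scheme would derive every permutation from the chaotic map or another keyed generator.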

Citations: 0

Optical image encryption based on 3D double-phase encoding algorithm in the gyrator transform domain

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20176-0

Jun Lang, Fan Zhang

In this paper, we propose an optical image encryption scheme based on a modified 3D double-phase encoding algorithm (3D-DPEA) in the gyrator transform (GT) domain, in which a plaintext is encrypted into two sparse volumetric ciphertexts under the constraints of chaos-generated binary amplitude masks (BAMs). The two volumetric ciphertexts are then multiplexed into corresponding 2D ciphertexts for convenient storage and transmission. First, because the two sparse volumetric ciphertexts are adjusted synergistically during the iterative process, the 3D-DPEA achieves higher recovery quality of the decrypted image with fewer iterations. Second, the BAMs are generated by the logistic-tent (LT) chaotic map, which is closely tied to the rotation angles of the GT and offers nonlinearity, pseudorandom behavior, and high sensitivity to initial conditions; as a result, the sensitivity of the secret key improves by several orders of magnitude, reaching 10⁻¹⁴. The 3D-DPEA scheme therefore not only eliminates the explicit/linear relationship between the plaintext and the ciphertext but also substantially enhances security. For decryption, the decrypted image is obtained by recording the intensity pattern produced when a coherent beam passes through the two sparse volumetric ciphertexts in sequence. Furthermore, the BAMs impose no additional burden on the storage and transmission of secret keys. A series of numerical simulations verifies the effectiveness and security of the proposed encryption scheme.
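
To make the idea of a chaos-generated binary amplitude mask concrete, here is a minimal sketch, not the scheme from the paper. The abstract does not state the map's equations, so the code assumes a logistic-tent form that appears in the chaos-based encryption literature; the thresholding rule, the burn-in length, and the names `logistic_tent` and `binary_amplitude_mask` are likewise assumptions. The link between the map and the GT rotation angles is not modeled.

```python
def logistic_tent(x, r):
    """One step of an assumed logistic-tent map on [0, 1), with 0 < r <= 4."""
    logistic = r * x * (1.0 - x)
    tent = (4.0 - r) * (x if x < 0.5 else 1.0 - x) / 2.0
    return (logistic + tent) % 1.0


def binary_amplitude_mask(x0, r, rows, cols, burn_in=1000):
    """Build a rows x cols mask of 0/1 values by thresholding the chaotic orbit."""
    x = x0
    for _ in range(burn_in):          # discard the transient part of the orbit
        x = logistic_tent(x, r)
    mask = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            x = logistic_tent(x, r)
            row.append(1 if x >= 0.5 else 0)
        mask.append(row)
    return mask


# Hypothetical key values: x0 is the initial condition, r the control parameter.
mask_a = binary_amplitude_mask(0.3, 3.7, 32, 32)
mask_b = binary_amplitude_mask(0.3 + 1e-14, 3.7, 32, 32)

# Count positions where a 1e-14 change in the key flips the mask value.
flipped = sum(a != b for ra, rb in zip(mask_a, mask_b) for a, b in zip(ra, rb))
print(f"{flipped} of {32 * 32} mask entries differ")
```

Because the mask can be regenerated from `x0` and `r` alone, only those two numbers need to be stored or sent, which is consistent with the abstract's remark that the BAMs add no burden to key storage and transmission.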

Citations: 0

Design of a knowledge distillation network for wifi-based indoor localization

IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Multimedia Tools and Applications

Pub Date : 2024-09-13 DOI: 10.1007/s11042-024-20212-z

Ritabroto Ganguly, Manjarini Mallik, Chandreyee Chowdhury

The main purpose of indoor localization is to locate users precisely and help them navigate within an indoor area, such as a building or campus, where GPS and other satellite technologies lack precision. Our approach to indoor localization is to build classifiers that use Received Signal Strength Indicator (RSSI) values of WiFi signals collected from smart hand-held devices. However, these RSSI values vary, often appreciably, over time and from device to device. To make location prediction more generalizable, ensemble models have been built that can learn from the strengths and weaknesses of all their member classifiers. In this paper, we present several neural-network-based ensemble models to address the lack of detailed studies of ensemble models (especially neural-network-based ones) for indoor localization. Our second contribution is a knowledge distillation framework for the ensemble models that preserves classification performance while making the system responsive in real time, because the lightweight distilled model can be executed locally on edge devices. The framework distills the knowledge of a large neural-network-based ensemble classifier into a much smaller, compressed classification model while maintaining performance. We implement and demonstrate the proposed knowledge distillation framework on three publicly available benchmark datasets. The proposed model achieves 83.95%, 93.10% and 96.48% accuracy on DataSet1, DataSet2 and DataSet3, respectively.
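
The abstract does not spell out the distillation objective, so the sketch below assumes the widely used temperature-based formulation: the student is trained to match the ensemble's softened class probabilities while also fitting the true label. The function names, the averaging of member outputs, and the `temperature` and `alpha` values are assumptions for illustration, not details taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(member_logits, temperature):
    """Average the softened predictions of all ensemble members (the teacher)."""
    probs = [softmax(logits, temperature) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]


def distillation_loss(student_logits, member_logits, true_label,
                      temperature=4.0, alpha=0.7):
    """Weighted sum of KL(teacher || student) on soft targets and hard-label CE."""
    teacher = ensemble_soft_targets(member_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher, student_soft) if t > 0)
    cross_entropy = -math.log(softmax(student_logits)[true_label])
    # The T^2 factor keeps soft-target gradients on a scale comparable to CE.
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


# Hypothetical logits over three candidate locations for one RSSI sample.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
print(distillation_loss([1.8, 0.7, -0.8], teachers, true_label=0))
```

In training, this loss would be minimized over the small student network's parameters, after which only the student needs to run on the edge device.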

Citations: 0