Multimedia Tools and Applications: Latest Articles

Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-18; DOI: 10.1007/s11042-024-20227-6
José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana

In the Human-Machine Interaction (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that add different architectures to the main network, such as autoencoders (AE) or an Embracement module, to combine the information of multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested on three public datasets: RAVDESS, SAVEE, and CREMA-D. It achieved state-of-the-art accuracies of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.

Citations: 0
Improvised method for analysis and synthesis of NUFB for Speech and ECG signal applications
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-18; DOI: 10.1007/s11042-024-20211-0
B. Keerthana, N. Raju

This article presents a rapidly converging, single-parameter optimization technique for designing non-uniform cosine modulated filter banks (CMFBs). The non-uniform cosine modulated filter banks are derived from closed-form uniform cosine modulated filter banks by merging the relevant bandpass filters according to the given decimation factors. In the proposed method, the cut-off frequency of the prototype filter is varied with an analytically calculated step size, using control parameters, so that the filter coefficients at the quadrature frequency are approximately equal to 0.707 and the formulated objective function is satisfied within the prescribed tolerance. Simulation results demonstrate that the proposed algorithm achieves superior performance, with amplitude distortion levels significantly lower than those of existing methods in the literature, reaching as low as 2.4483 × 10⁻⁴. For the prototype filter design, a constrained equiripple finite impulse response (FIR) digital filter is employed, with the roll-off factor and error ratio chosen based on the stopband attenuation, the passband attenuation, and the filter order. The results highlight the proposed algorithm's effectiveness for high-quality reconstruction of speech signals, particularly in speech coding and enhancement, as well as of ECG signals. This makes the method highly versatile and suitable for various practical applications, including sub-band coding of real-time and near real-time signals.
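
The 0.707 condition can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the condition refers to the prototype filter's magnitude response at the quadrature frequency π/(2M) of an M-band bank, it swaps the constrained equiripple design for a Hamming-windowed sinc, and it swaps the analytically calculated step size for plain bisection. All function names are hypothetical.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps: int, cutoff: float) -> list[float]:
    """Hamming-windowed sinc lowpass; cutoff is in radians/sample (0 < cutoff < pi)."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - centre
        ideal = cutoff / math.pi if k == 0 else math.sin(cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)  # normalise so that the DC gain is exactly 1
    return [t / gain for t in taps]


def magnitude(taps: list[float], omega: float) -> float:
    """Magnitude of the filter's frequency response at angular frequency omega."""
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


def tune_cutoff(num_bands: int, num_taps: int, tol: float = 1e-6) -> float:
    """Bisect on the cutoff until |H(pi / (2 * num_bands))| is about 1/sqrt(2).

    Assumption: bisection replaces the paper's analytically calculated step size.
    """
    target = 1 / math.sqrt(2)
    omega_q = math.pi / (2 * num_bands)   # quadrature frequency of the bank
    low, high = omega_q, 2 * omega_q      # the response is too low at `low`, too high at `high`
    mid = (low + high) / 2
    for _ in range(100):
        mid = (low + high) / 2
        error = magnitude(windowed_sinc_lowpass(num_taps, mid), omega_q) - target
        if abs(error) < tol:
            break
        if error < 0:
            low = mid
        else:
            high = mid
    return mid
```

Calling `tune_cutoff(8, 129)` returns a cutoff slightly above π/16; only one scalar is searched, which mirrors the single-parameter nature of the design described in the abstract.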

Citations: 0
Template-based text field segmentation for ID documents using dynamic squeezeboxes packing
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-18; DOI: 10.1007/s11042-024-20162-6
Michael Zingerenko, Elena Limonova, Vladimir V. Arlazarov

In this paper, we focus on the problem of text field segmentation in identity documents. These documents, characterized by their fixed layouts, present an opportunity to apply computationally efficient template-based algorithms. We consider the Dynamic Squeezeboxes Packing method and demonstrate its integration into document recognition systems, utilizing a single sample per document type. We benchmark text field segmentation on the MIDV-2019 public dataset using the standard intersection-over-union metric and our custom intersection-over-template metric, while also measuring processing time. We demonstrate that Dynamic Squeezeboxes Packing maintains competitive quality compared to text-in-the-wild methods (EAST, CRAFT) and a named-entity recognition method (LayoutLMv2). A significant advantage of this method is its processing speed, averaging 9 ms per image on the x86_64 platform, which is substantially faster than EAST (980 ms), CRAFT (2030 ms), and LayoutLMv2 (2210 ms). The obtained results suggest that the considered method has strong potential in document image analysis, particularly for processing identity documents.
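
As a reminder of how the standard intersection-over-union metric scores a predicted text field against its ground truth, here is a minimal sketch for axis-aligned boxes. The `Box` type and function names are illustrative assumptions. The paper's custom intersection-over-template metric is not defined in the abstract and is deliberately not reproduced here.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates, with x1 <= x2 and y1 <= y2."""
    x1: float
    y1: float
    x2: float
    y2: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as empty."""
    return max(0.0, box.x2 - box.x1) * max(0.0, box.y2 - box.y1)


def intersection_over_union(a: Box, b: Box) -> float:
    """Ratio of the overlap area to the area covered by either box."""
    overlap = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    overlap_area = area(overlap)
    union_area = area(a) + area(b) - overlap_area
    return overlap_area / union_area if union_area > 0 else 0.0
```

Two 10 × 10 boxes shifted by half their width overlap in a 5 × 10 strip, so their score is 50 / 150 = 1/3; identical boxes score 1 and disjoint boxes score 0.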

Citations: 0
Enhancement of single foggy image using feature based fusion technique
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-18; DOI: 10.1007/s11042-024-20181-3
Pooja Pandey, Rashmi Gupta, Nidhi Goel

Foggy and hazy weather conditions are very common natural phenomena that reduce the visibility of outdoor pictures. Poor visibility creates innumerable problems in various facets of life, viz. in tracking, surveillance, and many other fields. In this paper, an efficient feature-based fusion technique is used to enhance a single foggy image at the transmission level. Fusion at this level retains the most significant features of the foggy image, and the defogged output image is calculated from this fused single input at the transmission level. The proposed methodology overcomes the shortcomings of the existing Dark Channel Prior and Bright Channel Prior methods. The output of the proposed method shows promising results for all types of datasets, varying in fog density as well as in size. The major advantage of this method is that it does not require any pre-processing or post-processing and is thus very simple to implement.
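
For context, prior-based defogging methods such as the Dark Channel Prior are usually formulated on the atmospheric scattering model. The abstract does not give the authors' equations, so the notation below is the standard textbook form and an assumption here, not the paper's own derivation: $I$ is the observed foggy image, $J$ the fog-free scene radiance, $A$ the atmospheric light, $t$ the transmission map, $\Omega(x)$ a local patch around pixel $x$, and $t_0$ a small lower bound that avoids division by near-zero transmission.

```latex
\begin{align}
I(x) &= J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \\
J^{\mathrm{dark}}(x) &= \min_{y \in \Omega(x)} \; \min_{c \in \{r,g,b\}} J^{c}(y), \\
\hat{J}(x) &= \frac{I(x) - A}{\max\bigl(t(x),\, t_0\bigr)} + A .
\end{align}
```

Under this reading, fusing "at the transmission level" would mean combining estimates of $t(x)$ before the last equation is applied once to recover the defogged image.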

Citations: 0
Integration of Blockchain and IPFS: healthcare data management & sharing for IoT Environment
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-17; DOI: 10.1007/s11042-024-20092-3
Rajiv Kumar Mishra, Rajesh Kumar Yadav, Prem Nath

The immense volume of data generated and collected by smart devices has significantly enhanced various aspects of our daily lives. However, safeguarding the sensitive information shared among these devices is crucial. Ensuring the security of the Internet of Things (IoT) ecosystem from unauthorized access is imperative. Blockchain technology emerges as a promising solution to address these security concerns. Nevertheless, the effectiveness of Blockchain in handling the extensive data generated by smart devices is challenged by the rapid pace of IoT data generation and the slower transaction validation speed within Blockchain networks. This research aims to resolve these issues by integrating Blockchain with the Inter-Planetary File System (IPFS), creating a robust framework for secure data recording on a distributed storage network while enabling authorized access to the stored data. The proposed mechanism involves defining and recording access policies and cryptographic hash content on the Blockchain network, while storing the actual IoT-generated data on IPFS to enhance the confidentiality, integrity, and availability (CIA) triad. Performance assessments of the proposed scheme demonstrate its security and practicality, validating its potential for real-world application.
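
The split between on-chain metadata and off-chain data can be sketched as follows. This is a toy model under stated assumptions, not the authors' implementation: `ContentStore` is an in-memory stand-in for IPFS that addresses data by its SHA-256 digest (real IPFS uses its own content identifiers), `Ledger` is a plain hash-chained list rather than a Blockchain network, and the access policy is reduced to a set of reader names.

```python
import hashlib
import json


class ContentStore:
    """In-memory stand-in for IPFS: each blob is addressed by its SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class Ledger:
    """Toy hash-chained ledger that records only content hashes and access policies."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, content_hash: str, readers: set[str]) -> None:
        prev = self.blocks[-1]["block_hash"] if self.blocks else "0" * 64
        body = {"prev": prev, "content_hash": content_hash, "readers": sorted(readers)}
        self.blocks.append({**body, "block_hash": self._digest(body)})

    def is_valid(self) -> bool:
        """Recompute every block hash and check the links between blocks."""
        prev = "0" * 64
        for block in self.blocks:
            body = {key: block[key] for key in ("prev", "content_hash", "readers")}
            if block["prev"] != prev or self._digest(body) != block["block_hash"]:
                return False
            prev = block["block_hash"]
        return True


def read_record(ledger: Ledger, store: ContentStore, index: int, user: str) -> bytes:
    """Return the off-chain data only if the on-chain policy allows it and the hash matches."""
    block = ledger.blocks[index]
    if user not in block["readers"]:
        raise PermissionError(f"{user} is not authorised for record {index}")
    data = store.get(block["content_hash"])
    if hashlib.sha256(data).hexdigest() != block["content_hash"]:
        raise ValueError("stored data does not match the on-chain hash")
    return data
```

The ledger stays small because each block carries a fixed-size digest instead of the sensor payload, which is the motivation the abstract gives for pairing the two systems.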

Citations: 0
Improving agility in projects using machine learning algorithm
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-17; DOI: 10.1007/s11042-024-19909-y
Janani Varun, R A Karthika

All software products need testing to ensure their quality and accuracy. In the Agile era, testers' work becomes much easier when they can optimize the effort spent and predict defects in upcoming modules. This paper discusses defect prediction using the Random Forest algorithm. Predictive analytics draws on information from the past to forecast the outcomes of future events. Product teams often have difficulty delivering the product on schedule: in Agile development, requirements keep changing and the team is unsure about upcoming releases. Prediction helps the team focus on the complex and error-prone modules in upcoming releases. The predictive analytics model designed here can predict defects with an accuracy of 88% using historical data. With these predictions, testers can focus on the modules for which the model predicts the most defects and shift delivery left.

Citations: 0
Machine learning-driven IoT device for women’s safety: a real-time sexual harassment prevention system
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-17; DOI: 10.1007/s11042-024-20228-5
Md Reazul Islam, Khondokar Oliullah, Mohsin Kabir, Ashifur Rahman, M. F. Mridha, Muhammed Fayyaz Khan, Nilanjan Dey

Sexual harassment is an all-encompassing problem that affects individuals in diverse environments including educational institutions, workplaces, and public areas. Despite increased awareness and advocacy efforts, many women continue to face harassment daily, especially on the Indian sub-continent, with underreporting and impunity exacerbating the problem. As technology advances, there is a growing opportunity to use innovative solutions to address this problem. In recent years, the Internet of Things (IoT) and machine learning have emerged as promising technologies for developing systems that can detect and prevent sexual harassment in real time. This study presents a novel approach for real-time sexual harassment monitoring using a machine learning-based IoT system. The system incorporates nine force-sensitive resistors strategically embedded in women’s dresses to capture relevant data. It is portable and can be affixed to any type of garment. If the user wishes to change their attire, the system can be easily removed from the current dress and attached to another dress of choice. This flexibility allows users to adapt the system to suit various clothing preferences and styles. The sensor data are transmitted to the cloud via the NodeMCU, enabling continuous monitoring. In the cloud, a pre-trained machine learning model, specifically the AdaBoost classifier, is employed to classify incoming data in real time. We applied four ML methods: RF with GridSearchCV, Bagging Classifier, XGBoost, and AdaBoost Classifier. The AdaBoost classifier performed best, with an accuracy of 99.3% on a dataset prepared by our lab, which consists of 1048 instances collected from 50 students. If a sexual harassment event is detected, an alert is generated through a mobile application and promptly sent to the appropriate authorities for immediate action to save the victim. By integrating wearable sensors, IoT technology, and machine learning, this system offers a proactive and efficient approach, especially in uncertain situations, to detect and address sexual harassment incidents and enhance safety and security in various settings.

Citations: 0
Enhancing multi-target tracking stability using knowledge graph integration within the Gaussian Mixture Probability Hypothesis Density Filter
IF 3.6; Zone 4 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2024-09-17; DOI: 10.1007/s11042-024-20180-4
Ali Mehrizi, Hadi Sadoghi Yazdi

This paper proposes a novel approach to enhancing multi-target tracking of vehicles in videos with frequent camera occlusions. Our method integrates prior knowledge about vehicle behavior into a Gaussian Mixture Probability Hypothesis Density (GMPHD) filter framework. This knowledge, extracted as a knowledge graph from historical vehicle trajectories, allows the tracker to maintain persistence even during significant interruptions. The knowledge graph models expected movement patterns and generates pseudo-observations during occlusions, similar to how time series analysis leverages historical data for forecasting. We evaluate the proposed method on both simulated and real-world video datasets using the Optimal Sub Pattern Assignment (OSPA) metric, which assesses tracking accuracy. The results show a 19.5% improvement for simulated data and a 16.5% improvement for real-world video data under fully occluded conditions, demonstrating a significant enhancement in performance.
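
To make the OSPA score concrete, the sketch below computes it for two small sets of 2-D target positions. It follows the usual definition with a cutoff `c` and order `p`; the default values are assumptions, since the abstract does not state the settings used. The optimal assignment is found by brute force over permutations, which is only practical for a handful of targets.

```python
import itertools
import math

Point = tuple[float, float]


def ospa(x: list[Point], y: list[Point], c: float = 10.0, p: float = 2.0) -> float:
    """Optimal Sub-Pattern Assignment distance between two finite point sets.

    c caps the cost of any single localisation error and is also the penalty
    for each unmatched target; p is the order of the metric.
    """
    if len(x) > len(y):
        x, y = y, x  # make x the smaller set
    m, n = len(x), len(y)
    if n == 0:
        return 0.0  # both sets are empty
    # Try every way of assigning the m points of x to distinct points of y.
    best = min(
        sum(min(math.dist(xi, y[j]), c) ** p for xi, j in zip(x, assignment))
        for assignment in itertools.permutations(range(n), m)
    )
    # Each of the n - m unmatched points contributes the full cutoff penalty.
    return ((best + (c ** p) * (n - m)) / n) ** (1 / p)
```

A tracker that loses one of two targets during an occlusion is penalised through the cardinality term even if the remaining target is located perfectly, which is why OSPA suits occlusion experiments like those described above.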

Citations: 0
Effective video deblurring based on feature-enhanced deep learning network for daytime and nighttime images
IF 3.6, Zone 4 (Computer Science), Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS), Pub Date: 2024-09-16, DOI: 10.1007/s11042-024-20222-x
Deng-Yuan Huang, Chao-Ho Chen, Tsong-Yi Chen, Jia-En Li, Hsueh-Liang Hsiao, Da-Jinn Wang, Cheng-Kang Wen

Motion blur commonly arises in footage captured with a handheld or wearable video camera, owing to rapid movement of the camera or of the foreground (i.e., the moving object being captured). Most traditional algorithm-based approaches cannot effectively restore nonlinear motion-blurred images. Computationally intensive deep learning approaches have recently been developed for blind motion deblurring. However, they still restore image details only to a limited extent, especially in blurred nighttime images. To deblur both daytime and nighttime images effectively, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), an adjacent-frame alignment module (performing optimal feature point selection and computing a perspective transformation matrix), and a video-deblurring neural network module (containing two sub-networks, one for single-image deblurring and one for adjacent-frame fusion deblurring). The main strategy of the proposed approach is a blurred attention block that extracts more effective features (especially for nighttime images) to restore the edges and details of objects. Additionally, skip connections are introduced into both sub-networks to improve the model's ability to fuse contextual features across layers and further enhance the deblurring effect. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. These improvements demonstrate the effectiveness of the proposed approach across both daytime and nighttime scenarios, especially in making alphanumeric characters legible in severely blurred nighttime images.
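The PSNR figures quoted in the deblurring abstract above can be made concrete with a short sketch. The function below uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), on flat sequences of pixel values; the name `psnr` and the 8-bit peak value of 255 are assumptions for illustration, not details taken from the paper. The last lines simply subtract the reported margins from the reported scores to recover the next-best method's implied figures.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the sharp reference and the restored image.
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Scores reported in the abstract, and the next-best scores implied by the margins.
ours_psnr, ours_ssim = 32.401, 0.9107
next_best_psnr = round(ours_psnr - 1.635, 3)   # 30.766 dB
next_best_ssim = round(ours_ssim - 0.0381, 4)  # 0.8726
```

Because PSNR is logarithmic in the mean squared error, a margin of 1.635 dB corresponds to roughly a 31% lower MSE (10^(-0.1635) ≈ 0.69), assuming both methods are scored against the same references.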

Citations: 0
DMR$^2$G: diffusion model for radiology report generation
IF 3.6, Zone 4 (Computer Science), Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS), Pub Date: 2024-09-16, DOI: 10.1007/s11042-024-20206-x
Huan Ouyang, Zheng Chang, Binghao Tang, Si Li

Radiology report generation aims to accurately generate pathological assessments from given radiographic images. Prior methods largely rely on autoregressive models, whose sequential token-by-token generation leads to long inference times and suffers from sequential error accumulation. To make report generation more efficient without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss that uses pseudo labels as supervision to align image features with textual disease keyword representations. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces inference time. In a comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advance in automating radiology report generation.
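The abstract above does not say which accelerated sampler is used. One common way diffusion models cut inference time is to denoise over a small, evenly spaced subset of the training timesteps, as DDIM-style samplers do. The sketch below builds such a schedule; the function name `strided_timesteps` and the step counts (1000 training steps, 50 sampling steps) are hypothetical values chosen for illustration, not figures from the paper.

```python
def strided_timesteps(num_train_steps, num_sample_steps):
    """Return a descending, evenly spaced subset of diffusion timesteps."""
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    # Evenly spaced indices in [0, num_train_steps); distinct because the
    # spacing num_train_steps / num_sample_steps is at least 1.
    steps = [(i * num_train_steps) // num_sample_steps
             for i in range(num_sample_steps)]
    # Sampling runs from the noisiest retained step down to step 0.
    return steps[::-1]


schedule = strided_timesteps(1000, 50)
print(len(schedule), schedule[:3], schedule[-1])
```

With a schedule like this, the number of network passes is fixed by the sampling step count rather than by the report length, whereas an autoregressive decoder needs one sequential pass per generated token.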

Citations: 0