Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20227-6
José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana
In the Human-Machine Interactions (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that involve the addition to the main network of different architectures, such as autoencoders (AE) or an Embracement module, to combine the information of multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested across three public datasets: RAVDESS, SAVEE, and CREMA-D. It showcased state-of-the-art accuracy of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.
Title : Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics. Journal : Multimedia Tools and Applications
Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20211-0
B. Keerthana, N. Raju
This article presents a rapidly converging optimization technique using a single parameter for designing non-uniform cosine modulated filter banks (CMFBs). The non-uniform cosine modulated filter banks are derived from closed-form uniform cosine modulated filter banks by merging the relevant bandpass filters based on given decimation factors. In the proposed method, the cut-off frequency of the prototype filter is varied through an analytically calculated step size using control parameters, so that the filter coefficients at the quadrature frequency are approximately equal to 0.707 and the formulated objective function is satisfied within the prescribed tolerance. Simulation results demonstrate that the proposed algorithm achieves lower amplitude distortion than existing methods in the literature, reaching as low as 2.4483 × 10⁻⁴. For the prototype filter design, a constrained equiripple finite impulse response (FIR) digital filter is employed, with the roll-off factor and error ratio chosen based on a stopband attenuation, a passband attenuation and a filter order. The results highlight the proposed algorithm's effectiveness for high-quality reconstruction of speech signals, particularly in speech coding and enhancement, as well as ECG signals. This makes the method versatile and suitable for various practical applications, including sub-band coding of real-time and near real-time signals.
Title : Improvised method for analysis and synthesis of NUFB for Speech and ECG signal applications. Journal : Multimedia Tools and Applications
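The cut-off tuning described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it swaps the constrained equiripple prototype for a Hamming-windowed sinc filter and swaps the analytically calculated step size for plain bisection. It also assumes that the quadrature frequency is π/(2M) for an M-band bank, as is usual for cosine-modulated designs. The names `lowpass_taps`, `magnitude_at` and `tune_cutoff`, and the values 97 taps and 8 bands, are hypothetical.

```python
import cmath
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass FIR taps (cutoff in radians/sample).

    Stand-in for the constrained equiripple prototype named in the abstract.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = cutoff / math.pi if k == 0 else math.sin(cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]  # normalise to unity gain at DC


def magnitude_at(taps, omega):
    """Magnitude of the filter's frequency response at omega (radians/sample)."""
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps)))


def tune_cutoff(num_taps, num_bands, target=1 / math.sqrt(2), iterations=60):
    """Bisect on the cut-off until |H| at pi/(2M) is close to the target (0.707)."""
    omega_q = math.pi / (2 * num_bands)  # assumed quadrature frequency
    lo, hi = 1e-3, math.pi  # response at omega_q is below target at lo, above at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if magnitude_at(lowpass_taps(num_taps, mid), omega_q) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


cutoff = tune_cutoff(num_taps=97, num_bands=8)
prototype = lowpass_taps(97, cutoff)
print(cutoff, magnitude_at(prototype, math.pi / 16))
```

Bisection is used here only because it needs no derivative information; the abstract's single-parameter update with an analytic step size is presumably what gives the reported rapid convergence.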
Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20162-6
Michael Zingerenko, Elena Limonova, Vladimir V. Arlazarov
In this paper, we focus on the problem of text field segmentation in identity documents. These documents, characterized by their fixed layouts, present an opportunity to apply computationally efficient template-based algorithms. We consider the Dynamic Squeezeboxes Packing method and demonstrate its integration into document recognition systems, utilizing a single sample per document type. We benchmark text field segmentation on the MIDV-2019 public dataset using standard intersection-over-union and our custom intersection-over-template metrics, while also measuring processing time. We demonstrate that Dynamic Squeezeboxes Packing maintains competitive quality compared to text-in-the-wild methods (EAST, CRAFT) and a named-entity recognition method (LayoutLMv2). A significant advantage of this method is its processing speed, averaging 9 ms per image on the x86_64 platform, which is substantially faster than EAST (980 ms), CRAFT (2030 ms), and LayoutLMv2 (2210 ms). The obtained results suggest that the considered method has strong potential in document image analysis, particularly for processing identity documents.
Title : Template-based text field segmentation for ID documents using dynamic squeezeboxes packing. Journal : Multimedia Tools and Applications
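For readers unfamiliar with the intersection-over-union metric mentioned in the abstract above, the following is a minimal sketch for axis-aligned boxes. The box format `(x1, y1, x2, y2)` and the function name `iou` are assumptions made for illustration; the paper's fields may be arbitrary quadrilaterals, and its custom intersection-over-template metric is not defined in the abstract, so it is not reproduced here. The timing figures are the ones quoted in the abstract.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Per-image times in milliseconds, as quoted in the abstract.
baseline_ms = {"EAST": 980, "CRAFT": 2030, "LayoutLMv2": 2210}
template_ms = 9
speedup = {name: ms / template_ms for name, ms in baseline_ms.items()}

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print({name: round(ratio, 1) for name, ratio in speedup.items()})
```

On the quoted timings, the template-based method is roughly two orders of magnitude faster than each baseline.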
Pub Date : 2024-09-18 DOI: 10.1007/s11042-024-20181-3
Pooja Pandey, Rashmi Gupta, Nidhi Goel
Foggy and hazy weather is a very common natural phenomenon that reduces the visibility of outdoor pictures. Poor visibility creates problems in many applications, such as tracking and surveillance. In this paper, an efficient feature-based fusion technique is used to enhance a single foggy image at the transmission level. Fusion at this level retains the most significant features of the foggy image, and the defogged output image is computed from this fused single input. The proposed methodology overcomes the shortcomings of the existing Dark Channel Prior and Bright Channel Prior methods. Its output shows promising results for all types of datasets, varying in fog density as well as in size. A major advantage of this method is that it requires no pre-processing or post-processing and is thus very simple to implement.
Title : Enhancement of single foggy image using feature based fusion technique. Journal : Multimedia Tools and Applications
Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20092-3
Rajiv Kumar Mishra, Rajesh Kumar Yadav, Prem Nath
The immense volume of data generated and collected by smart devices has significantly enhanced various aspects of our daily lives. However, safeguarding the sensitive information shared among these devices is crucial. Ensuring the security of the Internet of Things (IoT) ecosystem from unauthorized access is imperative. Blockchain technology emerges as a promising solution to address these security concerns. Nevertheless, the effectiveness of Blockchain in handling the extensive data generated by smart devices is challenged by the rapid pace of IoT data generation and the slower transaction validation speed within Blockchain networks. This research aims to resolve these issues by integrating Blockchain with the Inter-Planetary File System (IPFS), creating a robust framework for secure data recording on a distributed storage network while enabling authorized access to the stored data. The proposed mechanism involves defining and recording access policies and cryptographic hash content on the Blockchain network, while storing the actual IoT-generated data on IPFS to enhance the confidentiality, integrity, and availability (CIA) triad. Performance assessments of the proposed scheme demonstrate its security and practicality, validating its potential for real-world application.
Title : Integration of Blockchain and IPFS: healthcare data management & sharing for IoT Environment. Journal : Multimedia Tools and Applications
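The split described in the abstract above, with access policies and content hashes on the chain and the actual data in distributed storage, can be sketched as follows. This is a toy model under stated assumptions, not the authors' framework: `OffChainStore` is a hypothetical in-memory stand-in for IPFS, `Ledger` is a hypothetical append-only list standing in for the Blockchain, and a plain SHA-256 hex digest stands in for a real IPFS content identifier.

```python
import hashlib


class OffChainStore:
    """Hypothetical stand-in for IPFS: content-addressed storage keyed by SHA-256."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class Ledger:
    """Hypothetical stand-in for the Blockchain: append-only hash and policy records."""

    def __init__(self):
        self._records = []

    def record(self, digest, allowed_users):
        self._records.append({"digest": digest, "allowed": frozenset(allowed_users)})
        return len(self._records) - 1  # record id

    def lookup(self, record_id):
        return self._records[record_id]


def store_reading(store, ledger, data, allowed_users):
    """Put the payload off-chain; keep only its hash and access policy on-chain."""
    return ledger.record(store.put(data), allowed_users)


def fetch_reading(store, ledger, record_id, user):
    """Check the on-chain policy, fetch the payload, then verify its integrity."""
    entry = ledger.lookup(record_id)
    if user not in entry["allowed"]:
        raise PermissionError(f"{user} is not authorized for record {record_id}")
    data = store.get(entry["digest"])
    if hashlib.sha256(data).hexdigest() != entry["digest"]:
        raise ValueError("stored data does not match the recorded hash")
    return data


store, ledger = OffChainStore(), Ledger()
rid = store_reading(store, ledger, b"heart_rate=72", {"dr_a"})
print(fetch_reading(store, ledger, rid, "dr_a"))
```

Keeping only a fixed-size hash and a small policy record on the ledger is what lets bulky IoT data bypass the slower transaction validation path, while the hash check on retrieval preserves integrity.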
Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-19909-y
Janani Varun, R A Karthika
All software products need testing to ensure their quality and accuracy. Testers' work becomes much easier when they can optimize the effort spent and predict defects for upcoming modules in the Agile era. This paper discusses predicting defects using the Random Forest algorithm. Predictive analytics draws on information from the past to forecast the outcomes of future events. Product teams often have difficulty delivering the product on schedule. In the Agile era, requirements keep changing and teams are unsure about upcoming releases. Prediction helps the team focus on the complex and error-prone modules in upcoming releases. The predictive analytics model designed here predicts defects with an accuracy of 88% using historical data. With these predictions, testers can focus on the modules for which the model predicts more defects and shift delivery left.
Title : Improving agility in projects using machine learning algorithm. Journal : Multimedia Tools and Applications
Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20228-5
Md Reazul Islam, Khondokar Oliullah, Mohsin Kabir, Ashifur Rahman, M. F. Mridha, Muhammed Fayyaz Khan, Nilanjan Dey
Sexual harassment is a pervasive problem that affects individuals in diverse environments including educational institutions, workplaces, and public areas. Despite increased awareness and advocacy efforts, many women continue to face harassment daily, especially on the Indian sub-continent, with underreporting and impunity exacerbating the problem. As technology advances, there is a growing opportunity to use innovative solutions to address this problem. In recent years, the Internet of Things (IoT) and machine learning have emerged as promising technologies for developing systems that can detect and prevent sexual harassment in real time. This study presents a novel approach for real-time sexual harassment monitoring using a machine learning-based IoT system. The system incorporates nine force-sensitive resistors strategically embedded in women's dresses to capture relevant data. It is portable and can be affixed to any type of garment. If the user wishes to change their attire, the system can be easily removed from the current dress and attached to another dress of choice. This flexibility allows users to adapt the system to suit various clothing preferences and styles. The sensor data are transmitted to the cloud via the NodeMCU, enabling continuous monitoring. In the cloud, a pre-trained machine learning model, specifically the AdaBoost classifier, was employed to classify incoming data in real time. We applied four ML methods: RF with GridSearchCV, Bagging Classifier, XGBoost, and AdaBoost Classifier. The AdaBoost classifier performed best with an accuracy of 99.3% using a dataset prepared by our lab, which consists of 1048 instances and was collected from 50 students. If a sexual harassment event is detected, an alert is generated through a mobile application and promptly sent to appropriate authorities for immediate action to save the victim. By integrating wearable sensors, IoT technology, and machine learning, this system offers a proactive and efficient approach, especially in uncertain situations, to detect and address sexual harassment incidents and enhance safety and security in various settings.
Title : Machine learning-driven IoT device for women's safety: a real-time sexual harassment prevention system. Journal : Multimedia Tools and Applications
Pub Date : 2024-09-17 DOI: 10.1007/s11042-024-20180-4
Ali Mehrizi, Hadi Sadoghi Yazdi
This paper proposes a novel approach to enhancing multi-target tracking of vehicles in videos with frequent camera occlusions. Our method integrates prior knowledge about vehicle behavior into a Gaussian Mixture Probability Hypothesis Density (GMPHD) filter framework. This knowledge, extracted as a knowledge graph from historical vehicle trajectories, allows the tracker to maintain persistence even during significant interruptions. The knowledge graph models expected movement patterns and generates pseudo-observations during occlusions, similar to how time series analysis leverages historical data for forecasting. We evaluate the proposed method on both simulated and real-world video datasets using the Optimal Sub Pattern Assignment (OSPA) metric, which assesses tracking accuracy. The results show a 19.5% improvement for simulated data and a 16.5% improvement for real-world video data under fully occluded conditions, demonstrating a significant enhancement in performance.
Title : Enhancing multi-target tracking stability using knowledge graph integration within the Gaussian Mixture Probability Hypothesis Density Filter. Journal : Multimedia Tools and Applications
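The OSPA metric used for evaluation in the abstract above can be sketched in a few lines. This follows the standard definition of the metric (a cut-off distance c and an order p, with a penalty of c for each unmatched target) and is not taken from the paper; the defaults `c=10.0` and `p=2` are arbitrary assumptions, and the brute-force search over assignments is only practical for a handful of targets.

```python
import itertools
import math


def ospa(estimates, truths, c=10.0, p=2):
    """Optimal Sub-Pattern Assignment distance between two sets of points.

    Tries every assignment of the smaller set to the larger one, so it is only
    suitable for small sets; a Hungarian solver would be used in practice.
    """
    small, large = sorted((list(estimates), list(truths)), key=len)
    m, n = len(small), len(large)
    if n == 0:
        return 0.0  # both sets empty
    # Cheapest assignment of the m points to m distinct points of the larger set,
    # with each pairwise distance capped at the cut-off c.
    best = min(
        sum(min(c, math.dist(x, large[j])) ** p for x, j in zip(small, perm))
        for perm in itertools.permutations(range(n), m)
    )
    # Every unmatched point in the larger set costs the full cut-off c.
    return ((best + (c ** p) * (n - m)) / n) ** (1 / p)


truth = [(0.0, 0.0), (5.0, 0.0)]
print(ospa([(0.0, 0.0), (5.0, 0.0)], truth))  # perfect tracking
print(ospa([(0.0, 0.0)], truth))              # one target lost
```

A lower value means better tracking. A target lost during occlusion contributes the full cut-off penalty, which is why generating pseudo-observations to keep tracks alive can reduce the score.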
Motion-blurred images are usually generated when captured with a handheld or wearable video camera, owing to rapid movement of the camera or foreground (i.e., the moving object captured). Most traditional algorithm-based approaches cannot effectively restore nonlinear motion-blurred images. Deep learning network-based approaches with intensive computations have recently been developed for blind deblurring of motion-blurred images. However, they still achieve limited effect in restoring image details, especially for blurred nighttime images. To effectively deblur blurred daytime and nighttime images, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), an adjacent frames alignment module (performing optimal feature point selection and computing the perspective transformation matrix), and a video-deblurring neural network module (containing two sub-networks, for single image deblurring and adjacent frames fusion deblurring). The proposed approach's main strategy is to design a blurred attention block to extract more effective features (especially for nighttime images) to restore the edges or details of objects. Additionally, skip connections are introduced into both sub-networks to improve the model's ability to fuse contextual features across different layers and further enhance the deblurring effect. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. Such improvements reveal the effectiveness of the proposed approach in addressing deblurring challenges across both daytime and nighttime scenarios, especially for making the alphanumeric characters in heavily blurred nighttime images legible.
Title : Effective video deblurring based on feature-enhanced deep learning network for daytime and nighttime images. Authors : Deng-Yuan Huang, Chao-Ho Chen, Tsong-Yi Chen, Jia-En Li, Hsueh-Liang Hsiao, Da-Jinn Wang, Cheng-Kang Wen. Pub Date : 2024-09-16 DOI: 10.1007/s11042-024-20222-x. Journal : Multimedia Tools and Applications
Pub Date : 2024-09-16 DOI: 10.1007/s11042-024-20206-x
Huan Ouyang, Zheng Chang, Binghao Tang, Si Li
Radiology report generation aims to accurately generate pathological assessments from given radiographic images. Prior methods largely rely on autoregressive models, whose sequential token-by-token generation lengthens inference time and accumulates errors along the sequence. To make report generation more efficient without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss that uses pseudo labels as supervision to align image features with textual disease keyword representations. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces inference time. In a comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advance in automating the radiology report generation task.
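The abstract does not say which accelerated sampler is used, so the following is only a sketch of one common idea behind such strategies: running the reverse process on an evenly strided subset of the training timesteps instead of all of them. The function name `strided_timesteps` and the step counts in the usage note are hypothetical and do not come from the paper.

```python
def strided_timesteps(num_train_steps, num_sample_steps):
    """Pick an evenly strided, descending subset of diffusion timesteps.

    Illustrative only: the paper's actual sampling schedule is not given
    in the abstract.
    """
    if not 1 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 1 <= num_sample_steps <= num_train_steps")
    # Integer arithmetic keeps the selected steps exact and strictly increasing.
    ascending = [
        i * num_train_steps // num_sample_steps for i in range(num_sample_steps)
    ]
    # Sampling runs from the noisiest step down to step 0.
    return ascending[::-1]
```

For instance, `strided_timesteps(1000, 10)` visits 10 timesteps rather than 1000. In a non-autoregressive diffusion generator, where all tokens are denoised together at each step, the number of network calls is then governed by the chosen number of sampling steps rather than by report length, which is the kind of speed-up over token-by-token decoding that the abstract describes.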
{"title":"DMR $$^2$$ G: diffusion model for radiology report generation","authors":"Huan Ouyang, Zheng Chang, Binghao Tang, Si Li","doi":"10.1007/s11042-024-20206-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20206-x","url":null,"abstract":"<p>Radiology report generation aims to generate pathological assessments from given radiographic images accurately. Prior methods largely rely on autoregressive models, where the sequential token-by-token generation process always results in longer inference time and suffers from the sequential error accumulation. In order to enhance the efficiency of report generation without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model adeptly identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss mechanism using pseudo labels as supervision to align image features and textual disease keyword representations accurately. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces the inference time. 
Through comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advancement in automating radiology report generation task.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}