
Latest publications in Multimedia Tools and Applications

Negotiation strategies in ubiquitous human-computer interaction: a novel storyboards scale & field study
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20240-9
Sofia Yfantidou, Georgia Yfantidou, Panagiota Balaska, Athena Vakali

In today’s connected society, self-tracking technologies (STTs), such as wearables and mobile fitness apps, empower humans to improve their health and well-being through ubiquitous physical activity monitoring, with several personal and societal benefits. Despite advances in such technologies’ hardware, the limitations of low user engagement and decreased effectiveness demand more informed and theoretically founded Human-Computer Interaction designs. To address these challenges, we build upon the previously unexplored Leisure Constraints Negotiation Model and the Transtheoretical Model to systematically define and assess the effectiveness of STT features that acknowledge users’ contextual constraints and establish human-negotiated STT narratives. Specifically, we introduce and validate a human-centric scale, StoryWear, which explores eleven dimensions of negotiation strategies that humans use to overcome constraints on exercise participation, captured through an inclusive storyboard format. Based on our preliminary studies, StoryWear shows high reliability, rendering it suitable for future work in ubiquitous computing. Our results indicate that negotiation strategies vary in perceived effectiveness and appeal more to existing STT users, with self-motivation, commitment, and understanding of the negative impact of not exercising ranked at the top. Finally, we give actionable guidelines for real-world implementation and a commentary on the future of personalized training.
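
The abstract reports that StoryWear "shows high reliability" without naming the statistic. Below is a minimal sketch of one common internal-consistency measure, Cronbach's alpha; the item ratings are fabricated for illustration, and alpha is only an assumed stand-in for whatever reliability index the authors actually used.

```python
# Hypothetical sketch: internal-consistency reliability (Cronbach's alpha) for a
# Likert-style scale such as StoryWear. The ratings matrix is fabricated; the paper
# does not publish raw item scores or name the exact reliability statistic it used.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: (n_respondents, n_items) matrix of item ratings."""
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (n_items / (n_items - 1)) * (1.0 - item_vars.sum() / total_var)

# Toy example: 6 respondents rating 4 items of one negotiation-strategy dimension (1-5).
ratings = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
], dtype=float)
print(f"Cronbach's alpha = {cronbach_alpha(ratings):.2f}")
```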

Citations: 0
Unified pre-training with pseudo infrared images for visible-infrared person re-identification
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20217-8
ZhiGang Liu, Yan Hu

In the pre-training task of visible-infrared person re-identification (VI-ReID), two main challenges arise: i) Domain disparities. A significant domain gap exists between ImageNet, which is used to build public pre-trained models, and the person-specific data in the VI-ReID task. ii) Insufficient samples. Due to the difficulty of gathering cross-modal paired samples, there is currently a scarcity of large-scale datasets suitable for pre-training. To address these issues, we propose a new unified pre-training framework (UPPI). First, we established a large-scale visible/pseudo-infrared paired sample repository (UnitCP) based on an existing visible person dataset, encompassing nearly 170,000 sample pairs. Benefiting from this repository, not only are the training samples significantly expanded, but pre-training on this foundation also effectively bridges the domain disparities. Simultaneously, to fully harness the potential of the repository, we devised an innovative feature fusion mechanism (CF²) during pre-training, which leverages redundant features present in the paired images to steer the model towards cross-modal feature fusion. In addition, during fine-tuning, to adapt the model to datasets lacking paired images, we introduced a center contrast loss (C²) that guides the model to prioritize cross-modal features with consistent identities. Extensive experimental results on two standard benchmarks (SYSU-MM01 and RegDB) demonstrate that the proposed UPPI performs favorably against state-of-the-art methods.
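
The abstract does not spell out how the pseudo-infrared counterparts in UnitCP are generated. The sketch below shows one simple, assumed way to fabricate a pseudo-infrared image from a visible RGB crop (a luminance-style channel mix replicated to three channels); the actual UnitCP procedure may differ.

```python
# Illustrative sketch: fabricate a pseudo-infrared counterpart for a visible RGB
# person crop so that visible/pseudo-IR pairs can be assembled for pre-training.
# The luminance-style channel weighting is an assumption, not the paper's method.
import numpy as np

def pseudo_infrared(rgb: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) array in [0, 1]; returns a (H, W, 3) single-tone pseudo-IR image."""
    weights = np.array([0.299, 0.587, 0.114])        # assumed channel mix
    gray = rgb @ weights                             # collapse colour information
    return np.repeat(gray[..., None], 3, axis=-1)    # replicate so 3-channel backbones still work

visible = np.random.rand(256, 128, 3)                # stand-in for a person crop
pair = (visible, pseudo_infrared(visible))           # one visible/pseudo-IR training pair
print(pair[1].shape)
```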

Citations: 0
Hybrid golden jackal fusion based recommendation system for spatio-temporal transportation's optimal traffic congestion and road condition classification
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20133-x
Tukaram K. Gawali, Shailesh S. Deore

Traffic congestion, influenced by varying traffic density levels, remains a critical challenge in transportation management, significantly impacting efficiency and safety. This research addresses these challenges by proposing an Enhanced Hybrid Golden Jackal (EGJ) fusion-based recommendation system for optimal traffic congestion and road condition categorization. In the first phase, road vehicle images are processed using Enhanced Geodesic Filtering (EGF), and traffic density is classified as heterogeneous or homogeneous across heavy, medium, and light flows using an Enhanced Consolidated Convolutional Neural Network (ECNN). Simultaneously, text data from road safety datasets undergo preprocessing through crisp data conversion, splitting, and normalization techniques. These data are then categorized by weather conditions, speed, highway conditions, rural/urban settings, and light conditions using Adaptive Drop Block Enhanced Generative Adversarial Networks (ADGAN). In the third phase, the EGJ fusion method integrates the outputs of the ECNN and ADGAN classifiers to enhance classification accuracy and robustness. The proposed approach addresses challenges such as accurately assessing traffic density variations and optimizing traffic flow in historical pattern scenarios. The simulation outcomes establish the efficiency of the EGJ fusion-based system, which achieves strong performance metrics. Specifically, the system achieves 98% accuracy, 99.1% precision, and a 98.2% F1-score in traffic density and road condition classification tasks. Additionally, error metrics such as a mean absolute error of 0.043, a root mean square error of 0.05, and a mean absolute percentage error of 0.148 further validate the robustness and accuracy of the introduced approach.
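
A minimal NumPy sketch of the error metrics quoted above (MAE, RMSE, MAPE) follows; the true and predicted values are fabricated placeholders, not data from the paper.

```python
# Sketch of the quoted error metrics (MAE, RMSE, MAPE) computed with NumPy.
# The true/predicted values below are fabricated and only illustrate the formulas.
import numpy as np

y_true = np.array([0.82, 0.40, 0.65, 0.91, 0.30])   # hypothetical congestion levels
y_pred = np.array([0.80, 0.44, 0.60, 0.95, 0.33])

mae  = np.mean(np.abs(y_pred - y_true))
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
mape = np.mean(np.abs((y_pred - y_true) / y_true))  # expressed as a fraction

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  MAPE={mape:.3f}")
```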

Citations: 0
Identification and location monitoring through Live video Streaming by using blockchain
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20197-9
Sana Zeba, Mohammad Amjad

Video surveillance is the basis for meeting the increasing demand for security. Capable users can digitally manipulate video images, timestamps, and camera settings; they can also physically manipulate camera locations, orientation, and mechanical settings. Advanced video manipulation techniques can easily alter cameras and videos, which are essential for criminal investigations. To ensure security, it is necessary to raise the level of protection for camera and video data. Blockchain technology has gained considerable attention in the last decade due to its ability to create trust between users without third-party intermediaries, which enables many applications. Our goal is to create a CCTV camera system that utilizes blockchain technology to guarantee the reliability of video and image data. The truthfulness of stored data can be confirmed by authorities using blockchain technology, which enables data creation and storage in a distributed manner. The workflow of tracking and blockchain storage for securing data is discussed. We also develop an algorithm that synchronizes all updated criminal records of all users with IoT devices. Our final step involved calculating the accuracy of tracking the recognized face in diverse datasets with different resolutions and assessing the efficiency of the location tracking. Recognition accuracy changes depending on the resolution: low-resolution datasets yield higher accuracy than high-resolution datasets. According to the analysis, the system's average accuracy is 98.5%, and its tracking efficiency is 99%. In addition, smart devices in various locations can take actions on specific individuals according to the distributed blockchain server storage.
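
As a rough illustration of how chaining frame hashes makes tampering detectable, the sketch below builds a generic hash chain over video-frame digests; it is not the paper's blockchain implementation, and the camera identifier and frame bytes are hypothetical.

```python
# Generic hash-chain sketch: any later change to a stored frame digest, timestamp,
# or ordering breaks verification. Illustrative only, not the paper's system.
import hashlib
import json
import time

def add_block(chain: list, frame_bytes: bytes, camera_id: str) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = {
        "camera_id": camera_id,
        "timestamp": time.time(),
        "frame_digest": hashlib.sha256(frame_bytes).hexdigest(),
        "prev_hash": prev_hash,
    }
    payload["hash"] = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    chain.append(payload)

def verify(chain: list) -> bool:
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else "0" * 64
        body = {k: v for k, v in block.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != expected_prev or recomputed != block["hash"]:
            return False
    return True

chain = []
for frame in (b"frame-0001", b"frame-0002", b"frame-0003"):
    add_block(chain, frame, camera_id="cam-entrance-01")   # hypothetical camera id
print(verify(chain))                      # True: chain is intact
chain[1]["frame_digest"] = "0" * 64       # simulate tampering with a stored frame
print(verify(chain))                      # False: tampering detected
```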

Citations: 0
Deep-Dixon: Deep-Learning frameworks for fusion of MR T1 images for fat and water extraction
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20255-2
Snehal V. Laddha, Rohini S. Ochawar, Krushna Gandhi, Yu-Dong Zhang

Medical image fusion plays a crucial role in understanding the necessity of medical procedures, and it also assists radiologists in decision-making for surgical operations. Dixon mathematically described a fat suppression technique that differentiates between fat and water signals by utilizing in-phase and out-of-phase MR imaging. The fusion of MR T1 images can be performed by adding or subtracting the in-phase and out-of-phase images, respectively. The dataset used in this study was collected from the CHAOS grand challenge, comprising DICOM data sets from two different MRI sequences (T1 in-phase and out-of-phase). Our methodology involved training deep learning models, VGG19 and ResNet18, to extract features from this dataset and implement the Dixon technique, effectively separating the water and fat components. For water-only images, we achieved EN as high as 5.70 and 4.72; MI of 2.26 and 2.21; SSIM of 0.97 and 0.81; Qabf of 0.73 and 0.72; and Nabf as low as 0.18 and 0.19 using the VGG19 and ResNet18 models, respectively. For fat-only images, we achieved EN of 4.17 and 4.06; MI of 0.80 and 0.77; SSIM of 0.45 and 0.39; Qabf of 0.53 and 0.48; and Nabf as low as 0.22 and 0.27. The experimental findings demonstrated the superior performance of our proposed method in terms of the enhanced accuracy and visual quality of water-only and fat-only images, measured with several quantitative assessment parameters, over other models reported by various researchers. Our models are stand-alone models for implementing the Dixon methodology with deep learning techniques. The model improves EN by 0.62 and Qabf by 0.29 compared with existing fusion models for different image modalities. It can also better assist radiologists in identifying protein-rich tissues and blood vessels of abdominal organs and in understanding the fat content of lesions.
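
The two-point Dixon arithmetic described above reduces to per-pixel addition and subtraction: water is roughly (in-phase + out-of-phase) / 2 and fat is roughly (in-phase - out-of-phase) / 2, where the 1/2 factor is the conventional normalization. A minimal NumPy sketch with synthetic arrays standing in for real T1 slices:

```python
# Sketch of the two-point Dixon arithmetic described in the abstract; the synthetic
# arrays stand in for co-registered T1 in-phase / out-of-phase slices.
import numpy as np

in_phase  = np.random.rand(256, 256)    # stand-in for a T1 in-phase slice
out_phase = np.random.rand(256, 256)    # stand-in for the matching out-of-phase slice

water_only = 0.5 * (in_phase + out_phase)   # water signal reinforces in both images
fat_only   = 0.5 * (in_phase - out_phase)   # fat signal cancels in the out-of-phase image

print(water_only.shape, fat_only.shape)
```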

Citations: 0
MeVs-deep CNN: optimized deep learning model for efficient lung cancer classification
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20230-x
Ranjana M. Sewatkar, Asnath Victy Phamila Y

Lung cancer is a dangerous condition that impacts many people. The type and location of the cancer are critical factors in determining the appropriate medical treatment. Early identification of cancer cells can save numerous lives, making the development of automated detection techniques essential. Although many methods have been proposed by researchers over the years, achieving high prediction accuracy remains a persistent challenge. Addressing this issue, this research employs Memory-Enabled Vulture Search Optimization based on Deep Convolutional Neural Networks (MeVs-deep CNN) to develop an autonomous, accurate lung cancer categorization system. The data is initially gathered from the PET/CT dataset and preprocessed using the Non-Local Means (NL-Means) approach. The proposed MeVs optimization approach is then used to segment the data. The feature extraction process incorporates statistical, texture, intensity-based, and ResNet-101-based features, producing the final feature vector for cancer classification and the multi-level standardized convolutional fusion model. Subsequently, the MeVs-deep CNN leverages the MeVs optimization technique to automatically classify lung cancer. The key contribution of the research is the MeVs optimization, which effectively adjusts the classifier's parameters using the fitness function. The output is evaluated using metrics such as accuracy, sensitivity, specificity, AUC, and the loss function. The efficiency of the MeVs-deep CNN is demonstrated through these metrics, achieving values of 97.08%, 97.93%, 96.42%, 95.88%, and 2.92% for the training phase; 95.78%, 95.34%, 96.42%, 93.48%, and 4.22% for the testing phase; 96.33%, 95.20%, 97.65%, 94.83%, and 3.67% for the k-fold training data; and 94.16%, 95.20%, 93.30%, 91.66%, and 5.84% for the k-fold test data. These results demonstrate the effectiveness of the research.
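
A short sketch of how the reported accuracy, sensitivity, and specificity follow from a binary confusion matrix; the label vectors are fabricated and do not reproduce the paper's figures.

```python
# Sketch of the reported evaluation metrics (accuracy, sensitivity, specificity)
# computed from a binary confusion matrix on fabricated labels.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0])

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

accuracy    = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)           # true-positive rate
specificity = tn / (tn + fp)           # true-negative rate
print(f"acc={accuracy:.2f}  sens={sensitivity:.2f}  spec={specificity:.2f}")
```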

Citations: 0
Text-driven clothed human image synthesis with 3D human model estimation for assistance in shopping
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-19 | DOI: 10.1007/s11042-024-20187-x
S. Karkuzhali, A. Syed Aasim, A. StalinRaj
Online shopping has become an integral part of modern consumer culture. Yet, it is plagued by challenges in visualizing clothing items based on textual descriptions and estimating their fit on individual body types. In this work, we present an innovative solution to address these challenges through text-driven clothed human image synthesis with 3D human model estimation, leveraging the power of the Vector Quantized Variational AutoEncoder (VQ-VAE). Creating diverse and high-quality human images is a crucial yet difficult undertaking in vision and graphics. With the wide variety of clothing designs and textures, existing generative models are often not sufficient for the end user. In this work, we introduce a solution in which various datasets are passed through several models so that an optimized result can be provided, along with high-quality images covering a range of postures. We use two distinct procedures to create full-body 2D human photographs starting from a predetermined human posture. 1) The provided human pose is first converted into a human parsing map, together with sentences that describe the shapes of the clothing. 2) The model is then given further information about the clothing textures as input to produce the final human image. The model is split into two sections: a coarse-level codebook that deals with the overall structure and a fine-level codebook that deals with minute detail. As mentioned previously, the fine-level codebook concentrates on the minutiae of textures, whereas the coarse-level codebook covers how textures are depicted within structures. The decoder, trained together with the hierarchical codebooks, converts the predicted indices at the various levels into human images. The created image can be conditioned on the fine-grained text input thanks to the use of a mixture of experts. The quality of the clothing textures is refined by the prediction of finer-level indices. According to numerous quantitative and qualitative evaluations, implementing these strategies can result in more diversified and higher-quality human images than state-of-the-art procedures. The generated photographs are then converted into a 3D model, yielding several postures and outcomes; alternatively, a 3D model can be built directly from a dataset that produces a variety of stances. The PIFu method uses the Marching Cubes algorithm and the Stacked Hourglass method to produce 3D models and realistic images, respectively. This results in the generation of high-resolution images based on textual descriptions and the reconstruction of the generated images as 3D models. The Inception Score, Fréchet Inception Distance, SSIM, and PSNR achieved were 1.64 ± 0.20, 24.64527782349843, 0.642919520, and 32.87157744102002, respectively. The implemented method scores well in comparison with other techniques. This technology holds immense promise for reshaping the e-commerce landscape, offering a more immersive and informative means of exploring clothing options.
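
Of the quoted metrics, PSNR is the simplest to reproduce; a minimal sketch is shown below with synthetic reference and generated images (the arrays and noise level are illustrative, not the paper's data).

```python
# Sketch of the PSNR figure quoted in the abstract, computed between a reference
# image and a generated image; both arrays here are synthetic placeholders.
import numpy as np

def psnr(reference: np.ndarray, generated: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((reference - generated) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

reference = np.random.rand(512, 512, 3)
generated = np.clip(reference + np.random.normal(0, 0.02, reference.shape), 0, 1)
print(f"PSNR = {psnr(reference, generated):.2f} dB")
```
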
Citations: 0
Laplacian nonlinear logistic stepwise and gravitational deep neural classification for facial expression recognition
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-18 | DOI: 10.1007/s11042-024-20079-0
Binthu Kumari M, Sivagami B

Facial expression recognition is a paramount component of non-verbal communication and a frequent element of human interaction. However, handling diverse facial expressions while attaining high accuracy remains a major challenge. Laplacian Non-linear Logistic Regression and Gravitational Deep Learning (LNLR-GDL) for facial expression recognition is proposed to select the most relevant features from face image data via feature selection, achieving high performance in minimal time. The proposed method is split into three sections, namely preprocessing, feature selection, and classification. In the first section, preprocessing is conducted on the face recognition dataset, where noise-reduced face images are obtained by employing the Unsharp Masking Laplacian Non-linear Filter model. Second, from the preprocessed face images, computationally efficient and relevant features are selected using a Logistic Stepwise Regression-based feature selection model. Finally, the Gravitational Deep Neural Classification model is applied to the selected features for robust recognition of facial expressions. The proposed method is compared with existing methods using three evaluation metrics, namely facial expression recognition accuracy, facial expression recognition time, and PSNR. The obtained results demonstrate that the proposed LNLR-GDL method outperforms state-of-the-art methods.
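
A minimal sketch of Laplacian-based unsharp masking as a preprocessing step is given below; the 3x3 kernel and strength factor are illustrative assumptions rather than the exact Unsharp Masking Laplacian Non-linear Filter used in the paper.

```python
# Sketch of Laplacian-based unsharp masking: subtracting a scaled Laplacian response
# emphasises edges before feature selection. Kernel and strength are assumed values.
import numpy as np
from scipy.ndimage import convolve

laplacian_kernel = np.array([[0,  1, 0],
                             [1, -4, 1],
                             [0,  1, 0]], dtype=float)

def unsharp_laplacian(image: np.ndarray, strength: float = 0.7) -> np.ndarray:
    edges = convolve(image, laplacian_kernel, mode="reflect")
    return np.clip(image - strength * edges, 0.0, 1.0)

face = np.random.rand(128, 128)          # stand-in for a grayscale face image
print(unsharp_laplacian(face).shape)
```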

Citations: 0
Advancements in automated sperm morphology analysis: a deep learning approach with comprehensive classification and model evaluation
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-18 | DOI: 10.1007/s11042-024-20188-w
Rania Maalej, Olfa Abdelkefi, Salima Daoud

Automated sperm morphology analysis is crucial in reproductive medicine for assessing male fertility, but existing methods often lack robustness in handling diverse morphological abnormalities across different regions of the sperm. This study proposes a deep learning-based approach utilizing the ResNet50 architecture trained on a new SMD/MSS benchmarked dataset, which includes comprehensive annotations of 12 morphological defects across the head, midpiece, and tail regions of the sperm. Our approach achieved promising results with an accuracy of 95%, demonstrating effective classification across various sperm morphology classes. However, certain classes exhibited lower precision and recall rates, highlighting challenges in model performance for specific abnormalities. The findings underscore the potential of our proposed system in enhancing sperm morphology assessment. In fact, it is the first to comprehensively diagnose a spermatozoon by examining each part (head, intermediate piece, and tail) and identifying the type of anomaly in each according to David's classification, which includes 12 different anomalies, thereby performing multi-label classification for a more precise diagnosis. This is unlike state-of-the-art works, which either study only the head or simply indicate whether each part of the sperm is normal or abnormal.
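
A short sketch of the multi-label decision step implied above: each of the 12 defect classes gets an independent sigmoid score and threshold, so defects in the head, midpiece, and tail can be reported simultaneously. The class names and logits are placeholders, not the paper's label set or model outputs.

```python
# Sketch of turning 12 per-defect scores into a multi-label diagnosis. Placeholder
# class names stand in for David's classification; logits stand in for model outputs.
import numpy as np

DEFECTS = [f"defect_{i:02d}" for i in range(1, 13)]    # placeholder names for 12 classes

def predict_defects(logits: np.ndarray, threshold: float = 0.5) -> list:
    probs = 1.0 / (1.0 + np.exp(-logits))              # independent sigmoid per defect
    return [name for name, p in zip(DEFECTS, probs) if p >= threshold]

logits = np.random.normal(0, 2, size=12)               # stand-in for ResNet50 outputs
print(predict_defects(logits))
```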

自动精子形态分析是生殖医学评估男性生育能力的关键,但现有方法在处理精子不同区域的各种形态异常时往往缺乏鲁棒性。本研究提出了一种基于深度学习的方法,利用 ResNet50 架构在新的 SMD/MSS 基准数据集上进行训练,该数据集包括精子头部、中段和尾部区域的 12 种形态缺陷的全面注释。我们的方法取得了可喜的成果,准确率达到 95%,显示出对不同精子形态类别的有效分类。然而,某些类别的精确率和召回率较低,这凸显了针对特定异常情况的模型性能所面临的挑战。这些发现凸显了我们提出的系统在加强精子形态评估方面的潜力。事实上,这是首个通过检查精子的各个部分(包括头部、中间部分和尾部)来全面诊断精子的系统,该系统根据大卫分类法(包括 12 种不同的异常情况)识别每个部分的异常类型,从而进行多标签分类,以获得更精确的诊断。它不同于只研究头部或只说明精子各部分正常或异常的 SOTA 方法。
{"title":"Advancements in automated sperm morphology analysis: a deep learning approach with comprehensive classification and model evaluation","authors":"Rania Maalej, Olfa Abdelkefi, Salima Daoud","doi":"10.1007/s11042-024-20188-w","DOIUrl":"https://doi.org/10.1007/s11042-024-20188-w","url":null,"abstract":"<p>Automated sperm morphology analysis is crucial in reproductive medicine for assessing male fertility, but existing methods often lack robustness in handling diverse morphological abnormalities across different regions of sperm. This study proposes a deep learning-based approach utilizing the ResNet50 architecture trained on a new SMD/MSS benchmarked dataset, which includes comprehensive annotations of 12 morphological defects across head, midpiece, and tail regions of sperm. Our approach achieved promising results with an accuracy of 95%, demonstrating effective classification across various sperm morphology classes. However, certain classes exhibited lower precision and recall rates, highlighting challenges in model performance for specific abnormalities. The findings underscore the potential of our proposed system in enhancing sperm morphology assessment. In fact, it is the first to comprehensively diagnose a spermatozoon by examining each part, including the head, intermediate piece, and tail, by identifying the type of anomaly in each part according to David's classification, which includes 12 different anomalies, to perform multi-label classification for a more precise diagnosis. It is unlike SOTA works which either study only the head or simply indicate whether each part of the sperm is normal or abnormal.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multimodal emotion recognition based on a fusion of audiovisual information with temporal dynamics
IF 3.6 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-09-18 | DOI: 10.1007/s11042-024-20227-6
José Salas-Cáceres, Javier Lorenzo-Navarro, David Freire-Obregón, Modesto Castrillón-Santana

In the Human-Machine Interaction (HMI) landscape, understanding user emotions is pivotal for elevating user experiences. This paper explores Facial Expression Recognition (FER) within HMI, employing a distinctive multimodal approach that integrates visual and auditory information. Recognizing the dynamic nature of HMI, where situations evolve, this study emphasizes continuous emotion analysis. This work assesses various fusion strategies that add different architectures, such as autoencoders (AE) or an Embracement module, to the main network in order to combine information from multiple biometric cues. In addition to the multimodal approach, this paper introduces a new architecture that prioritizes temporal dynamics by incorporating Long Short-Term Memory (LSTM) networks. The final proposal, which integrates different multimodal approaches with the temporal focus capabilities of the LSTM architecture, was tested across three public datasets: RAVDESS, SAVEE, and CREMA-D. It showcased state-of-the-art accuracy of 88.11%, 86.75%, and 80.27%, respectively, and outperformed other existing approaches.
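
As a point of reference for the fusion strategies discussed above, the sketch below implements the simplest audiovisual baseline, a weighted average of per-modality class probabilities; the emotion labels, probabilities, and weight are illustrative, and the paper's AE, Embracement, and LSTM-based fusion modules are considerably richer.

```python
# Simple late-fusion baseline: weighted average of audio and video class probabilities.
# Labels, probabilities, and the fusion weight are illustrative assumptions.
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]        # illustrative label set

def late_fusion(audio_probs: np.ndarray, video_probs: np.ndarray, w_audio: float = 0.5) -> str:
    fused = w_audio * audio_probs + (1.0 - w_audio) * video_probs
    return EMOTIONS[int(np.argmax(fused))]

audio_probs = np.array([0.10, 0.55, 0.20, 0.15])       # hypothetical classifier outputs
video_probs = np.array([0.05, 0.70, 0.10, 0.15])
print(late_fusion(audio_probs, video_probs))
```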

Citations: 0