Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20240-9
Negotiation strategies in ubiquitous human-computer interaction: a novel storyboards scale & field study
Sofia Yfantidou, Georgia Yfantidou, Panagiota Balaska, Athena Vakali
In today’s connected society, self-tracking technologies (STTs), such as wearables and mobile fitness apps, empower humans to improve their health and well-being through ubiquitous physical activity monitoring, with several personal and societal benefits. Despite advances in such technologies’ hardware, limitations such as low user engagement and decreased effectiveness demand more informed and theoretically founded Human-Computer Interaction designs. To address these challenges, we build upon the previously unexplored Leisure Constraints Negotiation Model and the Transtheoretical Model to systematically define and assess the effectiveness of STT features that acknowledge users’ contextual constraints and establish human-negotiated STT narratives. Specifically, we introduce and validate a human-centric scale, StoryWear, which exploits and explores eleven dimensions of negotiation strategies that humans utilize to overcome constraints on exercise participation, captured through an inclusive storyboard format. Based on our preliminary studies, StoryWear shows high reliability, rendering it suitable for future work in ubiquitous computing. Our results indicate that negotiation strategies vary in perceived effectiveness and have higher appeal for existing STT users, with self-motivation, commitment, and understanding of the negative impact of non-exercise ranked at the top. Finally, we give actionable guidelines for real-world implementation and a commentary on the future of personalized training.
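The reported reliability of a multi-item instrument such as StoryWear is conventionally checked with an internal-consistency statistic like Cronbach’s alpha. The sketch below shows how such a check could be computed; the item responses, scale size, and function name are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Internal-consistency estimate for a multi-item scale.

    item_scores: array of shape (n_respondents, n_items), e.g. Likert ratings
    for the items of one negotiation-strategy dimension (hypothetical data).
    """
    item_scores = np.asarray(item_scores, dtype=float)
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)        # per-item variance
    total_variance = item_scores.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 6 participants x 4 items on a 5-point scale.
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [3, 4, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```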
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20217-8
Unified pre-training with pseudo infrared images for visible-infrared person re-identification
ZhiGang Liu, Yan Hu
In the pre-training task of visible-infrared person re-identification (VI-ReID), two main challenges arise: i) Domain disparities. A significant domain gap exists between the ImageNet data used by public pre-trained models and the specific person data in the VI-ReID task. ii) Insufficient samples. Due to the challenge of gathering cross-modal paired samples, there is currently a scarcity of large-scale datasets suitable for pre-training. To address these issues, we propose a new unified pre-training framework (UPPI). First, we establish a large-scale visible-pseudo infrared paired sample repository (UnitCP) based on an existing visible person dataset, encompassing nearly 170,000 sample pairs. Benefiting from this repository, not only are training samples significantly expanded, but pre-training on this foundation also effectively bridges the domain disparities. Simultaneously, to fully harness the potential of the repository, we devise an innovative feature fusion mechanism (CF²) used during pre-training. It leverages redundant features present in the paired images to steer the model towards cross-modal feature fusion. In addition, during fine-tuning, to adapt the model to datasets lacking paired images, we introduce a center contrast loss (C²). This loss guides the model to prioritize cross-modal features with consistent identities. Extensive experimental results on two standard benchmarks (SYSU-MM01 and RegDB) demonstrate that the proposed UPPI performs favorably against state-of-the-art methods.
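The abstract does not spell out the exact form of the center contrast loss, so the following is only a minimal PyTorch sketch of one plausible center-based contrastive objective: per-identity feature centers are computed in each modality, and same-identity centers across modalities are treated as positives. The function name, temperature value, and toy tensors are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def center_contrast_loss(feats_vis, feats_ir, labels, temperature=0.1):
    """One plausible center-contrast objective (illustrative only): pull the
    visible and infrared centers of the same identity together and push
    centers of different identities apart."""
    ids = labels.unique()
    centers_v = torch.stack([feats_vis[labels == i].mean(0) for i in ids])
    centers_i = torch.stack([feats_ir[labels == i].mean(0) for i in ids])
    centers_v = F.normalize(centers_v, dim=1)
    centers_i = F.normalize(centers_i, dim=1)
    logits = centers_v @ centers_i.t() / temperature        # cosine similarities between centers
    targets = torch.arange(len(ids), device=logits.device)  # matching identities lie on the diagonal
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: 8 samples per modality, 4 identities, 256-d features.
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
loss = center_contrast_loss(torch.randn(8, 256), torch.randn(8, 256), labels)
print(loss.item())
```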
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20133-x
Hybrid golden jackal fusion based recommendation system for spatio-temporal transportation's optimal traffic congestion and road condition classification
Tukaram K. Gawali, Shailesh S. Deore
Traffic congestion, influenced by varying traffic density levels, remains a critical challenge in transportation management, significantly impacting efficiency and safety. This research addresses these challenges by proposing an Enhanced Hybrid Golden Jackal (EGJ) fusion-based recommendation system for optimal traffic congestion and road condition categorization. In the first phase, road vehicle images are processed using Enhanced Geodesic Filtering (EGF) to classify traffic density as heterogeneous or homogeneous across heavy, medium, and light flows using an Enhanced Consolidated Convolutional Neural Network (ECNN). Simultaneously, text data from road safety datasets undergo preprocessing through crisp data conversion, splitting, and normalization techniques. This data is then categorized into weather conditions, speed, highway conditions, rural/urban settings, and light conditions using Adaptive Drop Block Enhanced Generative Adversarial Networks (ADGAN). In the third phase, the EGJ fusion method integrates outputs from the ECNN and ADGAN classifiers to enhance classification accuracy and robustness. The proposed approach addresses challenges such as accurately assessing traffic density variations and optimizing traffic flow in historical pattern scenarios. The simulation outcomes establish the efficiency of the EGJ fusion-based system. Specifically, the system achieves 98% accuracy, 99.1% precision, and a 98.2% F1-score in traffic density and road condition classification tasks. Additionally, error metrics, including a mean absolute error of 0.043, a root mean square error of 0.05, and a mean absolute percentage error of 0.148, further validate the robustness and accuracy of the introduced approach.
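The abstract does not give the precise EGJ fusion rule, so the sketch below only illustrates the general idea of decision-level fusion of two classifiers' class scores together with the reported error metrics (MAE, RMSE, MAPE). The weights, function names, and toy arrays are assumptions; in the paper the fusion weights would be tuned by the EGJ optimizer rather than fixed by hand.

```python
import numpy as np

def weighted_score_fusion(probs_image, probs_text, w_image=0.6, w_text=0.4):
    """Generic decision-level fusion of two classifiers' softmax outputs
    (illustrative weights, not the optimized EGJ values)."""
    fused = w_image * probs_image + w_text * probs_text
    return fused / fused.sum(axis=1, keepdims=True)   # renormalize to probabilities

def error_metrics(y_true, y_pred):
    """MAE, RMSE and MAPE as reported in the evaluation."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    mae = np.abs(y_true - y_pred).mean()
    rmse = np.sqrt(((y_true - y_pred) ** 2).mean())
    mape = np.abs((y_true - y_pred) / y_true).mean()
    return mae, rmse, mape

# Toy example: 3 samples, 3 traffic-density classes (heavy / medium / light).
p_img = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]])
p_txt = np.array([[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]])
print(weighted_score_fusion(p_img, p_txt).argmax(axis=1))
print(error_metrics([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```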
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20197-9
Identification and location monitoring through Live video Streaming by using blockchain
Sana Zeba, Mohammad Amjad
Video surveillance is fundamental to meeting the increasing demand for security. Capable users can digitally manipulate video images, timestamps, and camera settings; they can also physically manipulate camera locations, orientations, and mechanical settings. Advanced video manipulation techniques can easily alter cameras and videos, which are essential for criminal investigations. It is therefore necessary to strengthen the protection of camera and video data. Blockchain technology has gained considerable attention in the last decade due to its ability to create trust between users without third-party intermediaries, enabling many applications. Our goal is to create a CCTV camera system that utilizes blockchain technology to guarantee the reliability of video and image data. The truthfulness of stored data can be confirmed by authorities using blockchain technology, which enables data creation and storage in a distributed manner. The workflow of tracking and blockchain storage to secure data is discussed for security purposes. We develop an algorithm that synchronizes all updated criminal records of all users with IoT devices. Our final step involved calculating the accuracy of tracking the recognized face in diverse datasets with different resolutions and assessing the efficiency of location tracking. Recognition accuracy varies with resolution: low-resolution datasets yield higher accuracy than high-resolution datasets. According to the analysis, the system's average accuracy is 98.5%, and its tracking efficiency is 99%. In addition, smart devices in various locations can take actions on specific individuals according to the distributed blockchain server storage.
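As a rough illustration of why hash-chained storage makes recorded evidence tamper-evident, the toy sketch below chains per-frame hashes so that altering any stored record breaks verification of all later blocks. It is a minimal stand-in, not the distributed blockchain architecture or the synchronization algorithm described in the paper; all class and field names are assumptions.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class RecordChain:
    """Minimal hash-chained log: each block stores the hash of the previous
    block, so modifying an earlier video record invalidates every later hash."""
    def __init__(self):
        self.blocks = [{"index": 0, "prev_hash": "0" * 64,
                        "payload": "genesis", "timestamp": time.time()}]

    def add_record(self, frame_bytes: bytes, camera_id: str):
        prev = self.blocks[-1]
        block = {
            "index": prev["index"] + 1,
            "prev_hash": sha256(json.dumps(prev, sort_keys=True).encode()),
            "payload": {"camera": camera_id, "frame_hash": sha256(frame_bytes)},
            "timestamp": time.time(),
        }
        self.blocks.append(block)

    def verify(self) -> bool:
        for prev, cur in zip(self.blocks, self.blocks[1:]):
            if cur["prev_hash"] != sha256(json.dumps(prev, sort_keys=True).encode()):
                return False
        return True

chain = RecordChain()
chain.add_record(b"\x00" * 1024, camera_id="cam-01")   # stand-in for an encoded frame
print(chain.verify())                                  # True until any stored block is altered
```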
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20255-2
Deep-Dixon: Deep-Learning frameworks for fusion of MR T1 images for fat and water extraction
Snehal V. Laddha, Rohini S. Ochawar, Krushna Gandhi, Yu-Dong Zhang
Medical image fusion plays a crucial role in understanding the necessity of medical procedures, and it also assists radiologists in decision-making for surgical operations. Dixon mathematically described a fat suppression technique that differentiates between fat and water signals by utilizing in-phase and out-of-phase MR imaging. The fusion of MR T1 images can be performed by adding or subtracting the in-phase and out-of-phase images, respectively. The dataset used in this study was collected from the CHAOS grand challenge, comprising DICOM data sets from two different MRI sequences (T1 in-phase and out-of-phase). Our methodology involved training deep learning models, VGG19 and ResNet18, to extract features from this dataset and implement the Dixon technique, effectively separating the water and fat components. Using the VGG19 and ResNet18 models respectively, we achieved fusion quality for water-only images with EN as high as 5.70 and 4.72, MI of 2.26 and 2.21, SSIM of 0.97 and 0.81, Qabf of 0.73 and 0.72, and Nabf as low as 0.18 and 0.19. For fat-only images we achieved EN of 4.17 and 4.06, MI of 0.80 and 0.77, SSIM of 0.45 and 0.39, Qabf of 0.53 and 0.48, and Nabf as low as 0.22 and 0.27. The experimental findings demonstrated the superior performance of our proposed method in terms of the enhanced accuracy and visual quality of water-only and fat-only images, across several quantitative assessment parameters, over other models experimented with by various researchers. Our models are stand-alone models for the implementation of the Dixon methodology using deep learning techniques. The model achieves an improvement of 0.62 in EN and 0.29 in Qabf compared to existing fusion models for different image modalities. It can also better assist radiologists in identifying tissues and blood vessels of abdominal organs that are rich in protein and in understanding the fat content of lesions.
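The Dixon combination that the deep models are trained to reproduce is simple pixel arithmetic: the water image is proportional to the sum of the in-phase and out-of-phase images, and the fat image to their difference. A minimal NumPy sketch of this two-point Dixon step follows; the arrays and function name are illustrative, and the 0.5 scale factor follows the standard two-point formulation rather than anything stated in the paper.

```python
import numpy as np

def two_point_dixon(in_phase: np.ndarray, out_phase: np.ndarray):
    """Classic two-point Dixon combination: water is the half-sum and fat
    the half-difference of the in-phase and out-of-phase MR images."""
    in_phase = in_phase.astype(np.float32)
    out_phase = out_phase.astype(np.float32)
    water = 0.5 * (in_phase + out_phase)
    fat = 0.5 * (in_phase - out_phase)
    return water, fat

# Toy 4x4 "images" standing in for T1 in-phase / out-of-phase DICOM slices.
ip = np.random.rand(4, 4).astype(np.float32)
op = np.random.rand(4, 4).astype(np.float32)
water, fat = two_point_dixon(ip, op)
print(water.shape, fat.shape)
```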
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20187-x
Text-driven clothed human image synthesis with 3D human model estimation for assistance in shopping
S. Karkuzhali, A. Syed Aasim, A. StalinRaj
Online shopping has become an integral part of modern consumer culture. Yet it is plagued by challenges in visualizing clothing items based on textual descriptions and estimating their fit on individual body types. In this work, we present an innovative solution to these challenges through text-driven clothed human image synthesis with 3D human model estimation, leveraging the power of the Vector Quantized Variational AutoEncoder (VQ-VAE). Creating diverse and high-quality human images is a crucial yet difficult undertaking in vision and graphics. Given the wide variety of clothing designs and textures, existing generative models are often not sufficient for the end user. In the proposed work, various datasets are passed through several models so that an optimized solution can be provided, along with high-quality images covering a range of postures. We use two distinct procedures to create full-body 2D human photographs starting from a predetermined human posture. 1) The provided human pose is first converted to a human parsing map together with sentences that describe the shapes of clothing. 2) The model is then given further information about the textures of clothing as input to produce the final human image. The model is split into two sections: a coarse-level codebook that deals with overall structure and a fine-level codebook that deals with minute detail. The fine-level codebook concentrates on the minutiae of textures, whereas the coarse-level codebook covers the depiction of textures in structures. A decoder trained together with the hierarchical codebooks converts the predicted indices at the various levels into human images. The created image can be conditioned on fine-grained text input thanks to the utilization of a blend of experts, and the quality of clothing textures is refined by the prediction of finer-level indices. Numerous quantitative and qualitative evaluations show that these strategies produce more diversified and higher-quality human images than state-of-the-art procedures. The generated photographs are then converted into a 3D model, resulting in several postures and outcomes, or a 3D model can be produced directly from a dataset covering a variety of stances. The PIFu method uses the marching cubes algorithm and the stacked hourglass network to produce 3D models and realistic images, respectively. This results in the generation of high-resolution images from textual descriptions and the reconstruction of the generated images as 3D models. The achieved Inception Score, Fréchet Inception Distance, SSIM, and PSNR were 1.64 ± 0.20, 24.65, 0.643, and 32.87, respectively. The implemented method scores well in comparison with other techniques. This technology holds immense promise for reshaping the e-commerce landscape, offering a more immersive and informative shopping experience.
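At the core of a VQ-VAE-style pipeline is the nearest-neighbour codebook lookup that maps encoder outputs to discrete indices. The PyTorch sketch below shows that quantization step generically; it is not the paper's hierarchical coarse/fine design, and the codebook size, dimensions, and names are illustrative assumptions.

```python
import torch

def vector_quantize(z_e: torch.Tensor, codebook: torch.Tensor):
    """Nearest-neighbour codebook lookup as used in VQ-VAE-style models
    (generic sketch, not the paper's hierarchical design).

    z_e:      encoder outputs, shape (batch, n_tokens, dim)
    codebook: embedding table, shape (n_codes, dim)
    Returns the quantized vectors and the chosen code indices.
    """
    # Euclidean distance between every token and every code
    # (argmin is the same as for squared distance).
    dists = torch.cdist(z_e, codebook.unsqueeze(0).expand(z_e.size(0), -1, -1))
    indices = dists.argmin(dim=-1)      # (batch, n_tokens)
    z_q = codebook[indices]             # straight lookup of the nearest codes
    return z_q, indices

codebook = torch.randn(512, 64)         # 512 codes of dimension 64 (illustrative)
z_e = torch.randn(2, 16, 64)            # 2 images, 16 tokens each
z_q, idx = vector_quantize(z_e, codebook)
print(z_q.shape, idx.shape)             # torch.Size([2, 16, 64]) torch.Size([2, 16])
```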
Pub Date : 2024-09-19  DOI: 10.1007/s11042-024-20230-x
MeVs-deep CNN: optimized deep learning model for efficient lung cancer classification
Ranjana M. Sewatkar, Asnath Victy Phamila Y
Lung cancer is a dangerous condition that impacts many people. The type and location of the cancer are critical factors in determining the appropriate medical treatment. Early identification of cancer cells can save numerous lives, making the development of automated detection techniques essential. Although many methods have been proposed by researchers over the years, achieving high prediction accuracy remains a persistent challenge. Addressing this issue, this research employs Memory-Enabled Vulture Search Optimization based on Deep Convolutional Neural Networks (MeVs-deep CNN) to develop an autonomous, accurate lung cancer categorization system. The data is initially gathered from the PET/CT dataset and preprocessed using the Non-Local Means (NL-Means) approach. The proposed MeVs optimization approach is then used to segment the data. The feature extraction process incorporates statistical, texture, intensity-based, and ResNet-101-based features, resulting in the creation of the final feature vector for cancer classification and the multi-level standardized convolutional fusion model. Subsequently, the MeVs-deep CNN leverages the MeVs optimization technique to automatically classify lung cancer. The key contribution of the research is the MeVs optimization, which effectively adjusts the classifier's parameters using the fitness function. The output is evaluated using metrics such as accuracy, sensitivity, specificity, AUC, and the loss function. The efficiency of the MeVs-deep CNN is demonstrated through these metrics, achieving values of 97.08%, 97.93%, 96.42%, 95.88%, and 2.92% for the training phase; 95.78%, 95.34%, 96.42%, 93.48%, and 4.22% for the testing phase; 96.33%, 95.20%, 97.65%, 94.83%, and 3.67% for k-fold training data; and 94.16%, 95.20%, 93.30%, 91.66%, and 5.84% for k-fold test data. These results demonstrate the effectiveness of the research.
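Non-Local Means denoising of the input slices is a standard preprocessing step that can be reproduced with scikit-image; the sketch below shows one way to apply it, with illustrative parameter values rather than the settings used in the paper.

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

def nl_means_preprocess(ct_slice: np.ndarray) -> np.ndarray:
    """Non-Local Means denoising of a single CT/PET slice.
    Parameter values are illustrative defaults, not the paper's tuned settings."""
    ct_slice = ct_slice.astype(np.float32)
    sigma = float(estimate_sigma(ct_slice))        # rough noise-level estimate
    return denoise_nl_means(ct_slice,
                            h=1.15 * sigma,        # filtering strength tied to noise level
                            patch_size=5,
                            patch_distance=6,
                            fast_mode=True)

noisy = np.random.rand(128, 128).astype(np.float32)   # stand-in for a slice
print(nl_means_preprocess(noisy).shape)
```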
Pub Date : 2024-09-18  DOI: 10.1007/s11042-024-20188-w
Advancements in automated sperm morphology analysis: a deep learning approach with comprehensive classification and model evaluation
Rania Maalej, Olfa Abdelkefi, Salima Daoud
Automated sperm morphology analysis is crucial in reproductive medicine for assessing male fertility, but existing methods often lack robustness in handling diverse morphological abnormalities across different regions of the sperm. This study proposes a deep learning-based approach utilizing the ResNet50 architecture trained on a new SMD/MSS benchmarked dataset, which includes comprehensive annotations of 12 morphological defects across the head, midpiece, and tail regions of the sperm. Our approach achieved promising results with an accuracy of 95%, demonstrating effective classification across various sperm morphology classes. However, certain classes exhibited lower precision and recall rates, highlighting challenges in model performance for specific abnormalities. The findings underscore the potential of our proposed system in enhancing sperm morphology assessment. In fact, it is the first to comprehensively diagnose a spermatozoon by examining each part (head, midpiece, and tail) and identifying the type of anomaly in each according to David's classification, which includes 12 different anomalies, thereby performing multi-label classification for a more precise diagnosis. This is unlike state-of-the-art works, which either study only the head or simply indicate whether each part of the sperm is normal or abnormal.
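A multi-label setup of this kind is commonly implemented by replacing the ResNet50 classification head with one logit per defect and training with a per-label sigmoid loss. The torchvision sketch below illustrates that pattern; the defect count comes from the abstract, but the names, input size, and loss choice are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_DEFECTS = 12   # head, midpiece and tail abnormalities in David's classification

def build_multilabel_resnet50(pretrained: bool = True) -> nn.Module:
    """ResNet50 backbone with a 12-way multi-label head (generic sketch)."""
    weights = models.ResNet50_Weights.DEFAULT if pretrained else None
    model = models.resnet50(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, NUM_DEFECTS)   # one logit per defect
    return model

model = build_multilabel_resnet50(pretrained=False)
criterion = nn.BCEWithLogitsLoss()          # independent sigmoid per defect label

images = torch.randn(4, 3, 224, 224)        # toy batch of sperm-image crops
targets = torch.randint(0, 2, (4, NUM_DEFECTS)).float()
loss = criterion(model(images), targets)
print(loss.item())
```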
Pub Date : 2024-09-18  DOI: 10.1007/s11042-024-20079-0
Laplacian nonlinear logistic stepwise and gravitational deep neural classification for facial expression recognition
Binthu Kumari M, Sivagami B
Facial expression is a paramount component of non-verbal communication and a frequent element of human interaction. However, handling different facial expressions while attaining high accuracy remains a major challenge. Laplacian Non-linear Logistic Regression and Gravitational Deep Learning (LNLR-GDL) for facial expression recognition is proposed to select appropriate features from face image data via feature selection, achieving high performance in minimal time. The proposed method is split into three sections: preprocessing, feature selection, and classification. In the first section, preprocessing is conducted on the face recognition dataset, where noise-reduced preprocessed face images are obtained by employing the Unsharp Masking Laplacian Non-linear Filter model. Second, from the preprocessed face images, computationally efficient and relevant features are selected using a Logistic Stepwise Regression-based feature selection model. Finally, the Gravitational Deep Neural Classification model is applied to the selected features for robust recognition of facial expressions. The proposed method is compared with existing methods using three evaluation metrics, namely facial expression recognition accuracy, facial expression recognition time, and PSNR. The obtained results demonstrate that the proposed LNLR-GDL method outperforms the state-of-the-art methods.
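Unsharp masking with a Laplacian kernel amounts to subtracting a scaled second-derivative response from the image so that facial edges are emphasized before feature extraction. The SciPy sketch below shows that generic operation; the strength parameter, clipping range, and names are illustrative assumptions, not the filter configuration used in the paper.

```python
import numpy as np
from scipy import ndimage

def laplacian_unsharp_mask(image: np.ndarray, amount: float = 1.0) -> np.ndarray:
    """Laplacian-based unsharp masking: subtract a scaled Laplacian so that
    edges (eyes, mouth, wrinkles) are emphasized."""
    image = image.astype(np.float32)
    lap = ndimage.laplace(image)            # second-derivative edge response
    sharpened = image - amount * lap        # classic g = f - c * Laplacian(f)
    return np.clip(sharpened, 0.0, 255.0)

face = (np.random.rand(96, 96) * 255).astype(np.float32)   # stand-in for a face crop
print(laplacian_unsharp_mask(face).shape)
```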
Pub Date : 2024-09-18  DOI: 10.1007/s11042-024-20173-3
Potato leaf disease classification using fusion of multiple color spaces with weighted majority voting on deep learning architectures
Samaneh Sarfarazi, Hossein Ghaderi Zefrehi, Önsen Toygar
Early identification of potato leaf disease is challenging due to variations in crop species, disease symptoms, and environmental conditions. Existing methods for detecting crop species and diseases are limited, as they rely on models trained and evaluated solely on plant leaf images from specific regions. This study proposes a novel approach utilizing a Weighted Majority Voting strategy combined with multiple color space models to diagnose potato leaf diseases. The initial detection stage employs deep learning models such as AlexNet, ResNet50, and MobileNet. Our approach aims to identify Early Blight, Late Blight, and healthy potato leaf images. The proposed detection model is trained and tested on two datasets: the PlantVillage dataset and the PLD dataset. The novel fusion and ensemble method achieves an accuracy of 98.38% on the PlantVillage dataset and 98.27% on the PLD dataset with the MobileNet model. An ensemble of all models and color spaces using Weighted Majority Voting significantly increases classification accuracies to 98.61% on the PlantVillage dataset and 97.78% on the PLD dataset. Our contributions include a novel fusion method of color spaces and deep learning models, improving disease detection accuracy beyond the state-of-the-art.
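Weighted majority voting over several (model, colour-space) pipelines is often realized as a weighted combination of each pipeline's class scores followed by an argmax. The sketch below illustrates that soft-voting variant; the weights, toy score matrices, and function name are assumptions for illustration, not the exact scheme or values used in the paper.

```python
import numpy as np

def weighted_majority_vote(prob_list, weights):
    """Weighted soft voting over several (model, colour-space) pipelines.

    prob_list: list of arrays, each (n_samples, n_classes) of softmax scores.
    weights:   one non-negative weight per pipeline (illustrative; in practice
               they might reflect each pipeline's validation accuracy).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    stacked = np.stack(prob_list)                    # (n_pipelines, n_samples, n_classes)
    fused = np.tensordot(weights, stacked, axes=1)   # weighted average of scores
    return fused.argmax(axis=1)                      # Early Blight / Late Blight / Healthy

# Toy example: three pipelines (e.g. MobileNet on RGB / HSV / Lab), 2 samples, 3 classes.
p1 = np.array([[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]])
p2 = np.array([[0.6, 0.3, 0.1], [0.1, 0.6, 0.3]])
p3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
print(weighted_majority_vote([p1, p2, p3], weights=[0.4, 0.3, 0.3]))
```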