Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3535519
Huanjun Zhang;Yutaka Matsubara
The complexity of a System of Systems (SoS) makes resilience one of its key attributes. Numerous studies have focused on the quantitative assessment of resilience through trailing indicators, yet discussions on resilience assurance through monitoring leading indicators remain scarce. Resilience assurance in an SoS faces two major challenges: a lack of structured argumentation work related to resilience and conflicts among multiple independent stakeholders. To address these challenges, this paper first introduces a resilience argumentation approach based on STAMP (Systems-Theoretic Accident Model and Processes), then employs a cooperative consensus process model to seek consensus on resilience assurance. Additionally, under the requirements of the international standard IEC 62853 for open systems dependability, a consensus-based resilience assurance framework is proposed. Within the framework, the resilience team can discuss the specific implementation details of failure response, accountability, and change accommodation based on stakeholder consensus. Finally, two SoS case studies, Microgrid and Mobility as a Service, are used to demonstrate the application of the proposed approach.
{"title":"Consensus-Based Resilience Assurance for System of Systems","authors":"Huanjun Zhang;Yutaka Matsubara","doi":"10.1109/ACCESS.2025.3535519","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3535519","url":null,"abstract":"The complexity of a System of Systems makes resilience one of its key attributes. Numerous studies have focused on the quantitative assessment of resilience by trailing indicators, yet discussions on resilience assurance through monitoring leading indicators remain scarce. Resilience assurance in SoS faces two major challenges: lack of structured argumentation work related to resilience and conflicts among multiple independent stakeholders. To address these challenges, this paper first introduces a resilience argumentation approach based on STAMP (Systems-Theoretic Accident Model and Processes), then employs cooperative consensus process model to seek consensus on resilience assurance. Additionally, under the requirements of the international standard IEC 62853 for open systems dependability, a consensus based resilience assurance framework is proposed. Within the framework, the resilient team can discuss the specific implementation details of failure response, accountability, and change accommodation based on stakeholder consensus. Finally, two SoS case studies, Microgrid and Mobility as a Service, are used to demonstrate the application of the proposed approach.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"20203-20217"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10855432","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534197
S. Abinaya;K. S. Ashwin;A. Sherly Alphonse
While chatbots are increasingly popular for communication, their effectiveness is limited by their difficulty in understanding users’ emotions. To address this, this study proposes a new hybrid chatbot model called “TEBC-Net” (Text Emotion BERT CNN Network), which combines text and video analysis to interpret user emotions and generate more empathetic responses. At the core of TEBC-Net is a multi-modal emotion analysis system. One component uses Bidirectional Encoder Representations from Transformers (BERT), a well-regarded model in natural language processing (NLP), achieving 87.21% accuracy in detecting emotional cues from text inputs. The second component captures users’ facial expressions through webcam footage. It begins by detecting faces using a pre-trained classifier such as a Haar cascade. Then, to improve emotion recognition, it preprocesses the image through brightness adjustment and contrast enhancement with automatic CLAHE and dual gamma correction. The processed image is analyzed by a Convolutional Neural Network (CNN) model trained specifically for emotion recognition, reaching 74.14% accuracy by assigning probabilities to different emotions. By integrating insights from both text and video analysis, TEBC-Net gains a comprehensive understanding of the user’s emotional state and intent. This combined data then informs the chatbot’s response generation module, enabling it to craft responses that are both empathetic and more directly aligned with the user’s emotional needs.
{"title":"Enhanced Emotion-Aware Conversational Agent: Analyzing User Behavioral Status for Tailored Reponses in Chatbot Interactions","authors":"S. Abinaya;K. S. Ashwin;A. Sherly Alphonse","doi":"10.1109/ACCESS.2025.3534197","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534197","url":null,"abstract":"While chatbots are increasingly popular for communication, their effectiveness is limited by their difficulty in understanding users’ emotions. To address this, this study proposes a new hybrid chatbot model called “TEBC-Net” (Text Emotion Bert CNN Network), which combines text and video analysis to interpret user emotions and generate more empathetic responses. At the core of TEBC-Net is a multi-modal emotion analysis system. One component uses Bidirectional Encoder Representations from Transformers (BERT), a well-regarded model in natural language processing (NLP), achieving an 87.21% accuracy rate in detecting emotional cues from text inputs. The second component captures users’ facial expressions through webcam footage. It begins by detecting faces using a pre-trained classifier like Haarcascade. Then, to improve emotion recognition, it preprocesses the image through brightness adjustments and contrast enhancement with Automatic CLAHE and dual gamma correction. This processed image is analyzed by a Convolutional Neural Network (CNN) model trained specifically for emotion recognition, reaching 74.14% accuracy by assigning probabilities to different emotions. By integrating insights from both text and video analysis, TEBC-Net gains a comprehensive understanding of the user’s emotional state and intent. This combined data then informs the chatbot’s response generation module, enabling it to craft responses that are both empathetic and more directly aligned with the user’s emotional needs.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"19770-19787"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854433","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534176
A-Hyeon Jo;Keun-Chang Kwak
In this paper, we propose a method for designing a speech emotional state classification model based on the feature-map fusion of a temporal convolutional network (TCN) and a pretrained convolutional neural network (CNN), using Korean speech databases. The proposed approach comprises four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone cepstral coefficient (GFCC) features in the frequency domain, as well as the log-Mel spectrogram in the time-frequency domain. In the second stage, these features are used to train the TCN and the Yet Another Audio MobileNet network (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. Through these steps, a speech emotion recognition model is effectively designed from the fusion of TCN and YAMNet together with the feature-map fusion methods. Finally, we compare performance on five databases: the AI-Hub speech emotion dataset built in Korea, a Korean speech emotional state classification dataset built at Chosun University, and the Emo-DB, RAVDESS, and TESS datasets. The experimental results show that the proposed model performs well in comparison to previous works on most datasets.
{"title":"Classification of Speech Emotion State Based on Feature Map Fusion of TCN and Pretrained CNN Model From Korean Speech Emotion Data","authors":"A-Hyeon Jo;Keun-Chang Kwak","doi":"10.1109/ACCESS.2025.3534176","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534176","url":null,"abstract":"In this paper, we propose a method for designing a classification model of speech emotional state based on the feature-map fusion of temporal convolutional network (TCN) and the pretrained convolutional neural networks (CNN) from Korean speech database. For this purpose, the proposed approach is comprised of four main stages. In the first stage, we extract Mel-frequency cepstral coefficient (MFCC) and gammatone cepstral coefficient features (GFCC) in the frequency domain as well as log-Mel spectrogram in the time-frequency domain. From these features, the second stage performs training process using TCN and the yet another audio Mobile Net network (YAMNet), respectively. In the third stage, we perform feature-map fusion using canonical correlation analysis (CCA), stationary wavelet transform (SWT), and fuzzy c-means-based principal component averaging (FCMPCA), respectively. From these steps, speech emotion recognition model is effectively designed through the fusion model of TCN and YAMNet as well as feature-map fusion methods. Finally, we evaluate the performance comparison from five databases: the AI-Hub speech emotion dataset built in Korea and Korean speech emotional state classification dataset built from Chosun University as well as Emo-DB, RAVDESS, and TESS datasets. The experimental results showed that the proposed model revealed good performance in comparison to other previous works in most datasets.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"19947-19963"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854478","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3535229
Ni Chen;Rong Wen
To address the issue of uneven sensor node distribution and unbalanced energy consumption leading to premature node death in wireless sensor networks, an energy-efficient tree-based routing algorithm is proposed. The algorithm calculates the optimal number of branches that minimizes network energy consumption by constructing a tree-based energy model. Based on the optimal number of branches, with the base station as the root node, a multi-layer routing tree is formed from near to far according to the distance between each node and the base station. During the formation of the routing tree, nodes whose residual energy is below the energy threshold can only become end nodes, thus avoiding premature node death due to excessive energy consumption. Nodes transmit data to the base station along the routing tree. The routing tree is updated at dynamic intervals instead of every round to reduce energy consumption. Simulation results show that the algorithm achieves more balanced node energy consumption, lower network energy consumption, and a longer network stability period and network lifespan than three other protocols.
{"title":"Energy Efficient Tree-Based Routing Algorithm for Wireless Sensor Networks","authors":"Ni Chen;Rong Wen","doi":"10.1109/ACCESS.2025.3535229","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3535229","url":null,"abstract":"To address the issue of uneven sensor node distribution and unbalanced energy consumption leading to premature node death in wireless sensor networks, an energy efficient tree-based routing algorithm is proposed. The algorithm calculates the optimal number of branches that minimize network energy consumption by constructing a tree-based energy model. Based on the optimal number of branches, with the base station as the root node, a multi-layer tree routing is formed from near to far according to the distance between the node and the base station. During the formation of routing tree, the nodes whose residual energy of the nodes is less than the energy threshold can only become end nodes, thus avoiding premature death of the nodes due to excessive energy consumption of the nodes. Nodes transmit data to the base station along the routing tree. The routing tree is updated at dynamic intervals instead of every round to reduce energy consumption. Simulation results show that the algorithm has more balanced node energy consumption, lower network energy consumption, and longer network stability period and network lifespan than the other three protocols.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"20149-20159"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10855438","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534321
Yunxiang Liu;Yuqing Shi
Accurate detection of vulnerable road users (VRUs) is critical for enhancing traffic safety and advancing autonomous driving systems. However, due to their small size and unpredictable movements, existing detection methods struggle to provide stable and accurate results under real-time conditions. To overcome these challenges, this paper proposes an improved VRU detection algorithm based on YOLOv8, named VRU-YOLO. First, we redesign the neck structure and construct a Detail Enhancement Feature Pyramid Network (DEFPN) to enhance the extraction and fusion capabilities of small target features. Second, the YOLOv8 network’s Spatial Pyramid Pooling Fast (SPPF) module is replaced with a novel Feature Pyramid Convolution Fast (FPCF) module based on dilated convolution, effectively mitigating feature loss in small target processing. Additionally, a lightweight Optimized Shared Detection Head (OSDH-Head) is introduced, reducing computational complexity while improving detection efficiency. Finally, to alleviate the deficiencies of traditional loss functions in shape matching and computational efficiency, we propose the Wise-Powerful Intersection over Union (WPIoU) loss function, which further optimizes the regression of target bounding boxes. Experimental results on a custom-built multi-source VRU dataset show that the proposed model enhances precision, recall, mAP50, and mAP50:95 by 1.3%, 3.4%, 3.3%, and 1.8%, respectively, in comparison to the baseline model. Moreover, in a generalization test conducted on the remote sensing small target dataset VisDrone2019, the VRU-YOLO model achieved an mAP50 of 31%. This study demonstrates that the improved model offers more efficient performance in small object detection scenarios, making it well-suited for VRU detection in complex road environments.
{"title":"VRU-YOLO: A Small Object Detection Algorithm for Vulnerable Road Users in Complex Scenes","authors":"Yunxiang Liu;Yuqing Shi","doi":"10.1109/ACCESS.2025.3534321","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534321","url":null,"abstract":"Accurate detection of vulnerable road users (VRUs) is critical for enhancing traffic safety and advancing autonomous driving systems. However, due to their small size and unpredictable movements, existing detection methods struggle to provide stable and accurate results under real-time conditions. To overcome these challenges, this paper proposes an improved VRU detection algorithm based on YOLOv8, named VRU-YOLO. First, we redesign the neck structure and construct a Detail Enhancement Feature Pyramid Network (DEFPN) to enhance the extraction and fusion capabilities of small target features. Second, the YOLOv8 network’s Spatial Pyramid Pooling Fast (SPPF) module is replaced with a novel Feature Pyramid Convolution Fast (FPCF) module based on dilated convolution, effectively mitigating feature loss in small target processing. Additionally, a lightweight Optimized Shared Detection Head (OSDH-Head) is introduced, reducing computational complexity while improving detection efficiency. Finally, to alleviate the deficiencies of traditional loss functions in shape matching and computational efficiency, we propose the Wise-Powerful Intersection over Union (WPIoU) loss function, which further optimizes the regression of target bounding boxes. Experimental results on a custom-built multi-source VRU dataset show that the proposed model enhances precision, recall, mAP50, and mAP50:95 by 1.3%, 3.4%, 3.3%, and 1.8%, respectively, in comparison to the baseline model. Moreover, in a generalization test conducted on the remote sensing small target dataset VisDrone2019, the VRU-YOLO model achieved an mAP50 of 31%. This study demonstrates that the improved model offers more efficient performance in small object detection scenarios, making it well-suited for VRU detection in complex road environments.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"19996-20015"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854459","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534284
Andrii Yehorov;Oleksii Duniev;Andrii Masliennikov;Rupert Gouws;Oleksandr Dobzhanskyi;Mario Stamann
Analysis of transverse flux machine designs shows that they are relatively simple in design and offer high specific power. This paper explores the influence of design features on the heating of the stator coil, identified as the most temperature-sensitive element in the system, and characterizes the temperature distribution pattern within the stator. To achieve this goal, experiments were conducted using a 3D model of a low-speed transverse flux motor. Thermal analysis was carried out using modern software, enabling the determination of temperature patterns in the coil, cores, and stator body. Graphs illustrating the temperature rise over time for each motor component were generated. The results reveal that the average coil temperature reached 92°C, deviating by 3.3% from the experimental value. A significant finding is that the stator coil in a transverse flux motor experiences non-uniform heating, with temperature variations in areas lacking air circulation. Introducing thermal paste in the region enclosed by the U-shaped cores, the coil, and the body was found to equalize and reduce the stator coil temperature by 10%. These modeling results were subsequently validated through experiments on an operational prototype of the TFM-200/32 transverse flux motor.
{"title":"Study on the Thermal State of a Transverse-Flux Motor","authors":"Andrii Yehorov;Oleksii Duniev;Andrii Masliennikov;Rupert Gouws;Oleksandr Dobzhanskyi;Mario Stamann","doi":"10.1109/ACCESS.2025.3534284","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534284","url":null,"abstract":"Based on the analysis of transverse flux machine designs, it was established that they exhibit a relative simplicity of design and demonstrate high specific power indices. This paper seeks to explore the influence of design features on the heating of the stator coil, identified as the most temperature-sensitive element in the system. Additionally, the study aims to characterize the temperature distribution pattern within the stator. To achieve this goal, experiments were conducted using a 3D model of a low-speed transverse flux motor. Thermal analysis was carried out using modern software, enabling the determination of temperature patterns in the coil, cores, and stator body. Graphs illustrating the temperature rise over time for each motor component were generated. The obtained results include corresponding graphs and dependencies, revealing that the average coil temperature reached 92°C, deviating by 3.3% from the experimental value. A significant finding is that the stator coil in a transverse flux motor experiences non-uniform heating, with temperature variations in areas lacking circulated air. Introducing thermal paste in the region enclosed by the U-shaped cores, coil, and body was found to equalize and reduce the stator coil temperature by 10%. These modeling results were subsequently validated through experimentation on the operational prototype of the TFM-200/32 transverse flux motor.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"20893-20902"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854438","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534845
Sujin Jo;Seongsoo Hong
Showing facial expressions and using emotion-appropriate gestures are essential for social robots. As a robot’s behavior becomes more anthropomorphic, the intimacy and naturalness of human-robot interactions improve. This study aims to derive optimized facial expression and gesture designs for social robots interacting with elderly individuals, thereby enhancing emotional interactions. First, we utilized user-robot integrated scenarios to identify the emotional states required for robot interactions. Subsequently, we conducted surveys and user preference evaluations on commercially available robot faces. The results indicated that suitable components for robot faces include the eyes, eyebrows, mouth, and cheeks; geometric shapes were deemed the most appropriate. Accordingly, we collected and analyzed human facial expression images using the Facial Action Coding System to identify action unit combinations and facial landmarks. This analysis informed the design of robot faces capable of expressing humanlike emotions. Furthermore, we collected and evaluated human gesture videos representing various emotions to select the most suitable gestures, which were analyzed using motion capture technology. We utilized these data to design robot gestures. The designed robot facial expressions and gestures were validated and refined through emotion-based user preference evaluations. As a result of the study, we developed facial expression and gesture designs for six emotions (Loving, Joyful, Upbeat, Hopeful, Concerned, Grateful) in social robots interacting with elderly individuals. The results provide guidelines for designing human-friendly robot facial expressions and gestures, thus enabling social robots to form deep emotional bonds with users. By analyzing human facial expressions and gestures in relation to emotions and applying these findings to robots, we successfully developed natural and emotionally expressive robot behaviors. These findings contribute to the advancement of robots as reliable and comforting companions for humans.
{"title":"The Development of Human-Robot Interaction Design for Optimal Emotional Expression in Social Robots Used by Older People: Design of Robot Facial Expressions and Gestures","authors":"Sujin Jo;Seongsoo Hong","doi":"10.1109/ACCESS.2025.3534845","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534845","url":null,"abstract":"Showing facial expressions and using emotion-appropriate gestures are essential for social robots. As a robot’s behavior becomes more anthropomorphic, the intimacy and naturalness of human-robot interactions improve. This study aims to derive optimized facial expression and gesture designs for social robots interacting with elderly individuals, thereby enhancing emotional interactions. First, we utilized user-robot integrated scenarios to identify the emotional states required for robot interactions. Subsequently, we conducted surveys and user preference evaluations on commercially available robot faces. The results indicated that suitable components for robot faces include the eyes, eyebrows, mouth, and cheeks; geometric shapes were deemed the most appropriate. Accordingly, we collected and analyzed human facial expression images using the Facial Action Coding System to identify action unit combinations and facial landmarks. This analysis informed the design of robot faces capable of expressing humanlike emotions. Furthermore, we collected and evaluated human gesture videos representing various emotions to select the most suitable gestures, which were analyzed using motion capture technology. We utilized these data to design robot gestures. The designed robot facial expressions and gestures were validated and refined through emotion-based user preference evaluations. As a result of the study, we developed facial expression and gesture designs for six emotions (Loving, Joyful, Upbeat, Hopeful, Concerned, Grateful) in social robots interacting with elderly individuals. The results provide guidelines for designing human-friendly robot facial expressions and gestures, thus enabling social robots to form deep emotional bonds with users. By analyzing human facial expressions and gestures in relation to emotions and applying these findings to robots, we successfully developed natural and emotionally expressive robot behaviors. These findings contribute to the advancement of robots as reliable and comforting companions for humans.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"21367-21381"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10855395","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534411
Hong Yan;Jingjing Fan;Yajun Liu
As the variety and quantity of goods in modern warehouse management continue to increase, optimizing space utilization and ensuring the safe and orderly storage of goods have become critical challenges. High-rise shelving systems are increasingly favored by enterprises, but long-term use, collisions with stacker cranes, and overloading can lead to structural deformation of the shelves. If these deformations are not detected and addressed in a timely manner, they may result in serious safety incidents and significant property damage. To address this issue, this study proposes a zero-shot shelf deformation detection method based on multimodal data fusion. The proposed approach integrates Micro-Electro-Mechanical Systems (MEMS) sensors and image data to establish a real-time monitoring and alert mechanism. Specifically, MEMS sensors are employed for real-time acquisition of shelf status, with threshold values set to trigger an initial alert mechanism. Simultaneously, cameras capture shelf images, and multiple You Only Look Once (YOLO) models are used to detect and classify critical components of the shelf, such as beams and columns. YOLOv11n is ultimately selected as the optimal model for detecting these structural elements. Based on the detected beams and columns, further feature extraction is performed, and the sensor data is fused with these features. A K-Means clustering algorithm is then applied to perform the clustering analysis. To address the lack of negative samples in the dataset, the study employs oversampling techniques, including SMOTE, ADASYN, and Borderline-SMOTE, combined with machine learning models such as Random Forest and Gradient Boosting Decision Trees (GBDT). The experimental results demonstrate that both Random Forest and GBDT achieved precision, recall, and F1 scores exceeding 95%, confirming the effectiveness and accuracy of the proposed method in shelf deformation detection. The multimodal detection method proposed in this study not only improves the accuracy and real-time performance of shelf deformation detection but also provides strong technical support for the safety management of warehouse operations.
{"title":"Multimodal Zero-Shot Shelf Deformation Detection Based on MEMS Sensors and Images","authors":"Hong Yan;Jingjing Fan;Yajun Liu","doi":"10.1109/ACCESS.2025.3534411","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534411","url":null,"abstract":"As the variety and quantity of goods in modern warehouse management continue to increase, optimizing space utilization and ensuring the safe and orderly storage of goods have become critical challenges. High-rise shelving systems are increasingly favored by enterprises, but long-term use, collisions with stacker cranes, and overloading can lead to structural deformation of the shelves. If these deformations are not detected and addressed in a timely manner, they may result in serious safety incidents and significant property damage. To address this issue, this study proposes a zero-shot shelf deformation detection method based on multimodal data fusion. The proposed approach integrates Micro-Electro-Mechanical Systems (MEMS) sensors and image data to establish a real-time monitoring and alert mechanism. Specifically, MEMS sensors are employed for real-time acquisition of shelf status, with threshold values set to trigger an initial alert mechanism. Simultaneously, cameras capture shelf images, and multiple You Only Look Once (YOLO) models are used to detect and classify critical components of the shelf, such as beams and columns. YOLOv11n is ultimately selected as the optimal model for detecting these structural elements. Based on the detected beams and columns, further feature extraction is performed, and the sensor data is fused with these features. A K-Means clustering algorithm is then applied to conduct the clustering analysis. To address the issue of a lack of negative samples in the dataset, the study employs oversampling techniques, including SMOTE, ADASYN, and Borderline-SMOTE, combined with machine learning models such as Random Forest and Gradient Boosting Decision Trees (GBDT). The experimental results demonstrate that both Random Forest and GBDT achieved precision, recall, and F1 scores exceeding 95%, confirming the effectiveness and accuracy of the proposed method in shelf deformation detection. The multimodal detection method proposed in this study not only improves the accuracy and real-time performance of shelf deformation detection but also provides strong technical support for the safety management of warehouse operations.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"21486-21502"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854213","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534434
Mohsen Raji;Amir Ghazizadeh Ahsaei;Kimia Soroush;Behnam Ghavami
Capsule Networks (CapsNets) are a class of neural network architectures that can model hierarchical relationships more accurately thanks to their hierarchical structure and dynamic routing algorithms. However, their high accuracy comes at the cost of significant memory and computational resources, making them less feasible for deployment on resource-constrained devices. In this paper, progressive bitwidth assignment approaches are introduced to efficiently quantize CapsNets. Initially, a comprehensive and detailed analysis of parameter quantization in CapsNets is performed, exploring various granularities such as block-wise quantization and dynamic routing quantization. Then, three quantization approaches are applied to progressively quantize the CapsNet, considering various insights into the susceptibility of layers to quantization. The proposed approaches include Post-Training Quantization (PTQ) strategies that minimize the dependence on floating-point operations and incorporate layer-specific integer bit-widths based on quantization error analysis. The PTQ strategies employ Power-of-Two (PoT) scaling factors to simplify computations, effectively utilizing hardware shifts and significantly reducing the computational complexity. This technique not only reduces the memory footprint but also maintains accuracy by introducing a range clipping method tailored to the hardware's capabilities, obviating the need for data preprocessing. Our experimental results on ShallowCaps and DeepCaps networks across multiple datasets (MNIST, Fashion-MNIST, CIFAR-10, and SVHN) demonstrate the efficiency of our approach. Specifically, on the CIFAR-10 dataset using the DeepCaps architecture, we achieved a substantial memory reduction (7.02× for weights and 3.74× for activations) with a minimal accuracy loss of only 0.09%. By using progressive bitwidth assignment and post-training quantization, this work optimizes CapsNets for efficient, real-time visual processing on resource-constrained edge devices, enabling applications in IoT, mobile platforms, and embedded systems.
{"title":"Progressive Bitwidth Assignment Approaches for Efficient Capsule Networks Quantization","authors":"Mohsen Raji;Amir Ghazizadeh Ahsaei;Kimia Soroush;Behnam Ghavami","doi":"10.1109/ACCESS.2025.3534434","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534434","url":null,"abstract":"Capsule Networks (CapsNets) are a class of neural network architectures that can be used to more accurately model hierarchical relationships due to their hierarchical structure and dynamic routing algorithms. However, their high accuracy comes at the cost of significant memory and computational resources, making them less feasible for deployment on resource-constrained devices. In this paper, progressive bitwidth assignment approaches are introduced to efficiently quantize the CapsNets. Initially, a comprehensive and detailed analysis of parameter quantization in CapsNets is performed exploring various granularities, such as block-wise quantization and dynamic routing quantization. Then, three quantization approaches are applied to progressively quantize the CapsNet, considering various insights into the susceptibility of layers to quantization. The proposed approaches include Post-Training Quantization (PTQ) strategies that minimize the dependence on floating-point operations and incorporates layer-specific integer bit-widths based on quantization error analysis. PTQ strategies employ Power-of-Two (PoT) scaling factors to simplify computations, effectively utilizing hardware shifts and significantly reducing the computational complexity. This technique not only reduces the memory footprint but also maintains accuracy by introducing a range clipping method tailored to the hardware’s capabilities, obviating the need for data preprocessing. Our experimental results on ShallowCaps and DeepCaps networks across multiple datasets (MNIST, Fashion-MNIST, CIFAR-10, and SVHN) demonstrate the efficiency of our approach. Specifically, on the CIFAR-10 dataset using the DeepCaps architecture, we achieved a substantial memory reduction (<inline-formula> <tex-math>$7.02times $ </tex-math></inline-formula> for weights and <inline-formula> <tex-math>$3.74times $ </tex-math></inline-formula> for activations) with a minimal accuracy loss of only 0.09%. By using progressive bitwidth assignment and post-training quantization, this work optimizes CapsNets for efficient, real-time visual processing on resource-constrained edge devices, enabling applications in IoT, mobile platforms, and embedded systems.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"21533-21546"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854429","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-27 | DOI: 10.1109/ACCESS.2025.3534637
Einari Vaaras;Manu Airaksinen;Okko Räsänen
The performance of discriminative machine-learning classifiers, such as neural networks, is limited by training label inconsistencies. Even expert-based annotations can suffer from label inconsistencies, especially for ambiguous phenomena. To address this, we propose a novel algorithm, iterative annotation refinement (IAR) 2.0, for refining inconsistent annotations of time-series data. IAR 2.0 uses discriminative classifiers to iteratively combine the original annotations with increasingly accurate posterior estimates of the classes present in the data. Unlike most existing label refinement approaches, IAR 2.0 offers a simpler yet effective solution for resolving ambiguities in training labels, working with real label noise on time-series data instead of synthetic label noise on image data. We demonstrate the effectiveness of our algorithm on five distinct classification tasks across two highly distinct data modalities. We show that the labels produced by IAR 2.0 systematically improve classifier performance compared to using the original labels or a previous state-of-the-art method for label refinement. We also conduct a set of controlled simulations to systematically investigate when IAR 2.0 fails to improve on the original training labels. The simulation results demonstrate that IAR 2.0 improves performance in nearly all tested conditions. We also find that the decrease in performance when IAR 2.0 fails is small compared to the average performance gain when it succeeds, encouraging the use of IAR 2.0 even when the nature of the data is unknown. The code is freely available at https://github.com/SPEECHCOG/IAR_2.
{"title":"IAR 2.0: An Algorithm for Refining Inconsistent Annotations for Time-Series Data Using Discriminative Classifiers","authors":"Einari Vaaras;Manu Airaksinen;Okko Räsänen","doi":"10.1109/ACCESS.2025.3534637","DOIUrl":"https://doi.org/10.1109/ACCESS.2025.3534637","url":null,"abstract":"The performance of discriminative machine-learning classifiers, such as neural networks, is limited by training label inconsistencies. Even expert-based annotations can suffer from label inconsistencies, especially in the case of ambiguous phenomena-to-annotate. To address this, we propose a novel algorithm, iterative annotation refinement (IAR) 2.0, for refining inconsistent annotations for time-series data. IAR 2.0 uses a procedure that utilizes discriminative classifiers to iteratively combine original annotations with increasingly accurate posterior estimates of classes present in the data. Unlike most existing label refinement approaches, IAR 2.0 offers a simpler yet effective solution for resolving ambiguities in training labels, working with real label noise on time-series data instead of synthetic label noise on image data. We demonstrate the effectiveness of our algorithm through five distinct classification tasks on two highly distinct data modalities. As a result, we show that the labels produced by IAR 2.0 systematically improve classifier performance compared to using the original labels or a previous state-of-the-art method for label refinement. We also conduct a set of controlled simulations to systematically investigate when IAR 2.0 fails to improve on the original training labels. The simulation results demonstrate that IAR 2.0 improves performance in nearly all tested conditions. We also find that the decrease in performance when IAR 2.0 fails is small compared to the average performance gain when IAR 2.0 succeeds, encouraging the use of IAR 2.0 even when the nature of data is unknown. The code is freely available at <uri>https://github.com/SPEECHCOG/IAR_2</uri>.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"19979-19995"},"PeriodicalIF":3.4,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10854471","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143105787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}