Recognition of human emotions holds significant value in numerous real-world scenarios. This paper focuses on the multimodal fusion of speech and text for emotion recognition. Thirty-nine-dimensional Mel-frequency cepstral coefficients (MFCCs) were used as speech emotion features, and 300-dimensional word vectors obtained with the GloVe algorithm were used as text emotion features. A bidirectional gated recurrent unit (BiGRU) was added to extract deep features, and it was combined with a multi-head self-attention (MHA) mechanism and the improved sparrow search algorithm (ISSA) to obtain the ISSA-BiGRU-MHA method for emotion recognition. The method was validated on the IEMOCAP and MELD datasets, where MFCCs and GloVe word vectors exhibited superior recognition performance as features. Comparisons with support vector machine and convolutional neural network methods showed that the ISSA-BiGRU-MHA method achieved the highest weighted and unweighted accuracies. Multimodal fusion achieved weighted accuracies of 76.52%, 71.84%, 66.72%, and 62.12% on the IEMOCAP, MELD, MOSI, and MOSEI datasets, outperforming unimodal recognition. These results affirm the reliability and practical applicability of the multimodal fusion recognition method.
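As a concrete illustration of the architecture described above, the following is a minimal PyTorch sketch of a BiGRU followed by multi-head self-attention over 39-dimensional MFCC frames. All layer sizes, the number of heads, and the pooling choice are illustrative assumptions; the ISSA hyperparameter search and the GloVe text branch are omitted.

```python
# Hypothetical BiGRU + multi-head self-attention (MHA) emotion classifier
# over 39-dim MFCC frames; sizes are assumptions, not the paper's settings.
import torch
import torch.nn as nn

class BiGRUMHAClassifier(nn.Module):
    def __init__(self, feat_dim=39, hidden=128, heads=4, n_classes=4):
        super().__init__()
        self.bigru = nn.GRU(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.mha = nn.MultiheadAttention(embed_dim=2 * hidden,
                                         num_heads=heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):              # x: (batch, frames, 39)
        h, _ = self.bigru(x)           # deep features, (batch, frames, 256)
        a, _ = self.mha(h, h, h)       # self-attention across frames
        return self.fc(a.mean(dim=1))  # temporal pooling -> class logits

logits = BiGRUMHAClassifier()(torch.randn(8, 200, 39))  # toy MFCC batch
```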
Combining recommendation systems and AI in e-commerce can improve the user experience and decision-making. This study uses bibliometric analysis to examine how these systems and artificial intelligence are evolving. Of 120 retrieved documents, 91 were analyzed, indicating 97.16% growth in research on the topic. The most influential authors were Paraschakis and Nilsson, with three publications and 43 citations. The journal Electronic Commerce Research leads with four publications and 60 citations. China is the top country by citations, with 120, while India leads in publications with 25. The results show that research output increased in 2021 and 2022, reflecting a shift toward sentiment analysis and convolutional neural networks. The identification of emerging keywords, such as content-based image retrieval and knowledge graph, points to promising areas for future research. This study provides a solid foundation for future work on e-commerce recommender systems.
Deep learning applications have far-reaching implications for people's daily lives. Disaster management professionals are becoming increasingly interested in applying deep learning to prepare for and respond to natural disasters. In this paper, we aim to assist natural disaster management professionals by developing a framework that accurately classifies natural disasters and interprets the results, combining a deep learning model with explainable AI (XAI) methods to ensure reliability and ease of interpretation without a technical background. The novelty of our work has two main aspects. The first is utilizing pre-trained models such as VGGNet19, ResNet50, and ViT for accurate classification of natural disaster images. The second is implementing three explainable AI techniques, Gradient-weighted Class Activation Mapping (Grad-CAM), Grad-CAM++, and Local Interpretable Model-agnostic Explanations (LIME), to make the model's predictions interpretable and its decision-making process transparent and reliable. Experiments on the natural disaster dataset (Niloy et al. 2021) and MEDIC with a ViT-B-32 model achieved a high accuracy of 95.23%. LIME, Grad-CAM, and Grad-CAM++ are further used to evaluate model performance and visualize decision-making. Our code is publicly available.1
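To make the Grad-CAM step concrete, here is a hedged, self-contained sketch of the technique on a pre-trained ResNet50, one of the backbones named above. The random input tensor stands in for a preprocessed disaster image, and targeting the last convolutional block is a common convention, not necessarily the paper's exact configuration.

```python
# Sketch of Grad-CAM on ResNet50: weight activation maps by the gradient
# of the top class score. Input and target layer are placeholder choices.
import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
acts, grads = {}, {}
layer = model.layer4[-1]                          # last conv block
layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # placeholder image
score = model(x)[0].max()                         # top predicted class score
score.backward()                                  # gradients w.r.t. activations

w = grads["v"].mean(dim=(2, 3), keepdim=True)     # channel importance weights
cam = torch.relu((w * acts["v"]).sum(dim=1))      # weighted activation map
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0,1]
```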
This paper presents a novel approach to the essential challenge of accurately determining the total weight of shrimp within aquaculture ponds. Precise weight estimation is crucial for mitigating overfeeding and underfeeding, thereby enhancing efficiency and productivity in shrimp farming. The proposed system leverages image processing techniques to detect individual live shrimp and extract pertinent features for weight estimation within a clay pond environment. Specifically, an automated feed tray captures images of live shrimp, which are then processed using a combination of Detectron2, PyTorch, and CUDA (Compute Unified Device Architecture) for individual shrimp detection. Essential features such as area, perimeter, width, length, and posture are extracted through image analysis, enabling accurate estimation of shrimp weight. An Artificial Neural Network (ANN) model built on these features predicts shrimp weight with a coefficient of determination (R²) of 94.50% when all extracted features are incorporated. Furthermore, our system integrates a user-friendly web application that lets farmers monitor shrimp weight trends, facilitating precision feeding strategies and effective farm management. This study contributes a low-cost, deep-learning-based solution for estimating the weight of live Pacific white shrimp in clay ponds, enabling daily weight calculations, helping farmers optimize feed quantities, providing shrimp size distribution insights, and reducing the Feed Conversion Ratio (FCR) for greater profitability. The procedure for shrimp feature extraction is also provided, including the calculation of shrimp length and width and the classification of shrimp posture.
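As a rough sketch of the final regression step, the snippet below trains a small multilayer perceptron (scikit-learn's MLPRegressor, standing in for the paper's ANN) on the five extracted features to predict weight. The synthetic data, toy weight rule, and network size are placeholders, not the paper's measurements or architecture.

```python
# Toy weight regression from the five extracted shrimp features
# (area, perimeter, width, length, posture); all values are synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((500, 5))                     # area, perimeter, width, length, posture
y = 30 * X[:, 0] + 5 * X[:, 3] + rng.normal(0, 0.5, 500)  # toy weight (g)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, ann.predict(X_te)))  # goodness of fit
```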
Obesity is a critical health issue associated with severe medical conditions. To enhance public health and well-being, early prediction of obesity risk is crucial. This study introduces an innovative approach to predicting obesity levels using explainable artificial intelligence, focusing on lifestyle factors rather than traditional BMI measures. Our best-performing machine learning model, free of BMI parameters, achieved 86.5% accuracy using the Random Forest algorithm. Explainability techniques, including SHAP, partial dependence plots (PDP), and feature importance, are employed to gain insight into the impact of lifestyle factors on obesity. Key findings indicate the importance of meal frequency and technology usage. This work demonstrates the significance of lifestyle factors in obesity risk and the power of model-agnostic methods to uncover these relationships.
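The snippet below sketches the SHAP part of such a pipeline on a Random Forest, using synthetic lifestyle features; the column names and toy label rule are assumptions for illustration, not the study's dataset.

```python
# Hedged sketch: Random Forest + SHAP on synthetic lifestyle features.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "meal_frequency": rng.integers(1, 5, 300),
    "tech_use_hours": rng.random(300) * 10,
    "physical_activity": rng.random(300) * 3,
})
y = ((X["meal_frequency"] > 3) | (X["tech_use_hours"] > 7)).astype(int)

rf = RandomForestClassifier(random_state=0).fit(X, y)
sv = shap.TreeExplainer(rf).shap_values(X)      # per-feature contributions
# Older shap returns one array per class; newer returns a 3-D array.
sv_pos = sv[1] if isinstance(sv, list) else sv[..., 1]
importance = pd.Series(np.abs(sv_pos).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))  # global lifestyle impact
```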
Large Language Models (LLMs) have become a hot topic in AI due to their ability to mimic human conversation. This study compares OpenAI's Generative Pre-trained Transformer 4 (GPT-4) model and Google's artificial intelligence (AI) model, which is based on the Bidirectional Encoder Representations from Transformers (BERT) framework, in terms of defined capabilities and built-in architecture. Both LLMs are prominent in AI applications. First, eight capabilities were identified for evaluating the models: translation accuracy, text generation, factuality, creativity, intellect, deception avoidance, sentiment classification, and sarcasm detection. Next, each capability was assessed using instructions. Additionally, a categorized LLM evaluation system was developed using ten research questions per category, based on this paper's main contributions from a prompt engineering perspective. GPT-4 and Google AI successfully answered 85% and 68.7% of the study prompts, respectively. GPT-4 understood prompts better than Google AI, even those with verbal flaws, and tolerated grammatical errors. Moreover, GPT-4 was more precise, accurate, and succinct than Google AI, which was sometimes verbose and less realistic. While GPT-4 beat Google AI in translation accuracy, text generation, factuality, intellect, creativity, and deception avoidance, Google AI outperformed GPT-4 in sarcasm detection. Both models performed sentiment classification properly. More importantly, a human panel of judges was used to assess and evaluate the model comparisons. Statistical analysis of the judges' ratings yielded more robust results by examining the specific uses, limitations, and expectations of both the GPT-4 and Google AI-based approaches. Finally, the two approaches' transformers, parameter sizes, and attention mechanisms were examined.
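For illustration, overall and per-capability success rates like the 85% and 68.7% figures above could be tallied from judge ratings as sketched below; the records and column names are toy assumptions, not the study's actual rating sheet.

```python
# Hedged sketch of tallying prompt success rates from judge ratings.
import pandas as pd

ratings = pd.DataFrame([
    {"model": "GPT-4",     "capability": "translation", "passed": 1},
    {"model": "GPT-4",     "capability": "sarcasm",     "passed": 0},
    {"model": "Google AI", "capability": "translation", "passed": 1},
    {"model": "Google AI", "capability": "sarcasm",     "passed": 1},
])

overall = ratings.groupby("model")["passed"].mean() * 100  # % prompts passed
per_cap = ratings.pivot_table(index="capability", columns="model",
                              values="passed", aggfunc="mean") * 100
print(overall, per_cap, sep="\n")
```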
Reference image-based Super-Resolution (RefSR) improves the quality of a Low-Resolution (LR) input image by leveraging the additional information provided by a High-Resolution (HR) reference image (Ref). Existing RefSR methods handle thermal or visible flows separately and often struggle to enhance the resolution of small objects such as Mini/Micro UAVs (Unmanned Aerial Vehicles) because of the resolution disparities between the input and reference images. To cope with these challenges in UAV early detection for video surveillance, we propose ThermoVisSR, a multiscale texture transformer for enhancing the Super-Resolution (SR) of visible and thermal images of Mini/Micro UAVs. Our approach reconstructs the fine details of these objects while preserving the approximation (the body form and color of the scene objects) already contained in the LR image. Hence, our model is divided into two streams that deal separately with approximation and detail reconstruction. In the first, we introduce a Convolutional Neural Network (CNN) fusion backbone to extract the Low-Frequency (LF) approximation from the original LR image pairs. In the second, to extract details from the Ref image, we blend features from the visible and thermal sources to make the most of what each offers. We then apply the High-Frequency Texture Transformer (HFTT) across the various resolutions of the merged features to ensure accurate correspondence matching and significant transfer of High-Frequency (HF) patches from the Ref to the LR images. Moreover, to adapt the injection to the different bands, we incorporate the separable software decoder (SSD) into the HFTT, allowing it to capture channel-specific details during the reconstruction phase. We validated our approach on a newly created dataset of aerial images of Mini/Micro UAVs. Experimental results demonstrate that the proposed model consistently outperforms state-of-the-art approaches in both qualitative and quantitative assessments.
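The correspondence-matching idea at the heart of a texture transformer can be sketched compactly: each LR feature patch searches for its most similar Ref patch, and the matching HF texture is transferred. The snippet below is a toy, single-scale version under assumed shapes; ThermoVisSR's multiscale HFTT, fusion backbone, and separable decoder are not reproduced.

```python
# Toy hard-attention patch transfer, the core step of texture transformers.
import torch
import torch.nn.functional as F

def transfer_hf_patches(lr_feat, ref_feat, ref_hf, k=3):
    # Unfold feature maps into flattened k x k patches: (B, C*k*k, N)
    q = F.unfold(lr_feat, k, padding=k // 2)    # queries from LR
    kk = F.unfold(ref_feat, k, padding=k // 2)  # keys from Ref
    v = F.unfold(ref_hf, k, padding=k // 2)     # HF values from Ref
    q, kk = F.normalize(q, dim=1), F.normalize(kk, dim=1)
    rel = torch.einsum("bcn,bcm->bnm", q, kk)   # cosine similarity
    idx = rel.argmax(dim=-1)                    # hard attention index
    picked = torch.gather(v, 2, idx.unsqueeze(1).expand(-1, v.size(1), -1))
    # Fold transferred patches back; overlaps are summed (a divisor map
    # would average them in a full implementation).
    return F.fold(picked, lr_feat.shape[-2:], k, padding=k // 2)

out = transfer_hf_patches(torch.randn(1, 8, 16, 16),
                          torch.randn(1, 8, 16, 16),
                          torch.randn(1, 8, 16, 16))
```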
To address the problems of low production efficiency and underutilized capacity in current semiconductor enterprises, a production efficiency optimization model for semiconductor enterprises is constructed. The model reformulates the low-production-efficiency problem as one of locating and solving the customer order decoupling point, and it comprehensively accounts for sudden changes in production orders when doing so. A three-layer coding (TLC) mechanism is proposed to improve the biogeography-based optimization algorithm, and the resulting improved biogeography-based optimization (IBBO) algorithm, denoted IBBO-TLC, is used to solve the production efficiency optimization problem of semiconductor enterprises. The results show that the proposed IBBO-TLC algorithm converges quickly, attains the minimum root mean square error after convergence, and accurately locates the customer order decoupling point for semiconductor enterprises. The proposed method effectively improves the production efficiency of semiconductor enterprises and offers guidance for optimizing enterprise structure.
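For context, the snippet below sketches a plain biogeography-based optimization (BBO) loop of the kind the paper improves: fitter habitats emigrate solution features to less fit ones, with occasional mutation. The sphere objective, bounds, and rates are illustrative; the paper's three-layer coding and decoupling-point objective are not reproduced.

```python
# Minimal BBO sketch: rank-based immigration/emigration plus mutation.
import numpy as np

def bbo(fitness, dim=5, pop=20, iters=100, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    H = rng.uniform(-5, 5, (pop, dim))       # habitats (candidate solutions)
    for _ in range(iters):
        H = H[np.argsort([fitness(h) for h in H])]  # best habitat first
        mu = np.linspace(1, 0, pop)          # emigration rates (rank-based)
        lam = 1 - mu                         # immigration rates
        new = H.copy()
        for i in range(pop):
            for d in range(dim):
                if rng.random() < lam[i]:    # immigrate feature d
                    src = rng.choice(pop, p=mu / mu.sum())
                    new[i, d] = H[src, d]    # migration from an emitter
                if rng.random() < p_mut:     # random mutation
                    new[i, d] = rng.uniform(-5, 5)
        H = new
    return min(H, key=fitness)

best = bbo(lambda x: np.sum(x ** 2))         # toy sphere objective
print(best)
```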