Deep neural networks trained on labeled medical data face major challenges: data acquisition requires expensive medical imaging devices, annotation requires expert labor, and large datasets are needed to achieve optimal model performance. The heterogeneity of diseases such as Alzheimer's disease further complicates deep learning because test cases may differ substantially from the training data, possibly increasing the rate of false positives. To overcome these challenges, we propose a reconstruction-based self-supervised anomaly detection model. Its dual-subnetwork encoder enhances feature encoding and is augmented by skip connections to the decoder that improve gradient flow. The novel encoder captures both local and global features to improve image reconstruction. In addition, we introduce an entropy-based image conversion method. Extensive evaluations show that the proposed model outperforms benchmark models in anomaly detection and in classification using the encoder. Both supervised and unsupervised models show improved performance when trained with data preprocessed using the proposed image conversion method.
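As an illustration of reconstruction-based anomaly scoring in general (not the authors' implementation), the sketch below assumes that an image is flagged as anomalous when the mean squared error between the image and its reconstruction exceeds a threshold. The function names, the choice of mean squared error, and the threshold value are assumptions made for illustration; the abstract does not specify them.

```python
from typing import Sequence

# An image is represented here as a 2-D grid of pixel intensities.
Image = Sequence[Sequence[float]]


def reconstruction_error(image: Image, reconstruction: Image) -> float:
    """Mean squared error between an image and its reconstruction."""
    total = 0.0
    count = 0
    for row_in, row_out in zip(image, reconstruction):
        for pixel_in, pixel_out in zip(row_in, row_out):
            total += (pixel_in - pixel_out) ** 2
            count += 1
    if count == 0:
        raise ValueError("image must contain at least one pixel")
    return total / count


def is_anomalous(image: Image, reconstruction: Image, threshold: float) -> bool:
    """Flag the image when its reconstruction error exceeds the threshold.

    The threshold is a hypothetical tuning parameter; in practice it would
    be chosen on held-out data.
    """
    return reconstruction_error(image, reconstruction) > threshold
```

The idea behind this kind of scoring is that a model trained to reconstruct normal images tends to reconstruct unfamiliar inputs poorly, so a large error suggests an anomaly.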
On e-commerce platforms, sentiment analysis of the enormous number of user reviews is an efficient way to enhance user satisfaction. In this article, an automated product recommendation system is developed based on machine learning and deep learning models. First, the text data are acquired from the Amazon Product Reviews (APR) dataset, which includes 60 000 customer reviews: 14 806 neutral, 19 567 negative, and 25 627 positive. The text data are then denoised using stop word removal, stemming, segregation, lemmatization, and tokenization. Removing stop words (along with duplicate and inconsistent text) and applying the other denoising techniques improves classification performance and decreases the training time of the model. Next, vectorization is performed using the term frequency–inverse document frequency (TF-IDF) technique, which converts the denoised text into numerical vectors for faster execution. The resulting feature vectors are fed to a modified convolutional neural network model for sentiment analysis. The empirical results show that the proposed model achieves a mean accuracy of 97.40% on the APR dataset.
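The vectorization step can be sketched as follows. This is a minimal, standard-library illustration of one common TF-IDF variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency); the abstract does not state which variant or library the authors used, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Convert tokenized documents into TF-IDF vectors.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(number of documents / number of documents containing t)
    """
    n_docs = len(docs)
    # Fixed, sorted vocabulary so every vector has the same layout.
    vocab = sorted({token for doc in docs for token in doc})
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) for token in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[token] / length * idf[token] for token in vocab])
    return vocab, vectors
```

For example, calling `tfidf([["good", "product"], ["bad", "product"]])` gives the token "product" a weight of zero in both vectors because it appears in every document, whereas "good" and "bad" receive positive weights only in the review that contains them.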
The performance of face recognition (FR) has reached a plateau on public benchmark datasets, such as Labeled Faces in the Wild (LFW), Celebrities in Frontal-Profile in the Wild (CFP-FP), and the first manually collected, in-the-wild age database (AgeDB), owing to rapid advances in convolutional neural networks (CNNs). However, the effects of faces under various fine-grained conditions on FR models have not been investigated, owing to the absence of relevant datasets. This paper analyzes these effects under different conditions and loss functions using K-FACE, a recently introduced FR dataset with fine-grained conditions. We propose a novel loss function called MixFace, which combines classification and metric losses. Experiments on various benchmark datasets demonstrate the superiority of MixFace in terms of effectiveness and robustness.
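The general idea of combining a classification loss with a metric loss can be sketched as a weighted sum. The example below is not the MixFace formulation, which the abstract does not give; it assumes softmax cross-entropy as the classification term, a cosine-similarity contrastive term as the metric term, and hypothetical `weight` and `margin` parameters.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Classification term: negative log-probability of the true class."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def contrastive_loss(u, v, same_identity: bool, margin: float = 0.5) -> float:
    """Metric term: pull same-identity pairs together, push others apart."""
    similarity = cosine(u, v)
    if same_identity:
        return 1.0 - similarity
    # Different identities are penalized only above the (assumed) margin.
    return max(0.0, similarity - margin)


def mixed_loss(logits, label, u, v, same_identity,
               weight: float = 1.0, margin: float = 0.5) -> float:
    """Assumed combination: classification loss + weight * metric loss."""
    return (softmax_cross_entropy(logits, label)
            + weight * contrastive_loss(u, v, same_identity, margin))
```

A combination of this kind lets the classification term learn from identity labels while the metric term directly constrains pairwise distances between embeddings.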
Violence can be committed anywhere, even in crowded places. It is hence necessary to monitor human activities for public safety. Surveillance cameras can capture surrounding activities but rely on human operators to continuously watch for incidents. Automatic violence detection is therefore needed for early warning and fast response. However, such automation remains challenging because of low video resolution and blind spots. This paper uses ResNet50V2 and the gated recurrent unit (GRU) to detect violence in the Movies, Hockey, and Crowd video datasets. Spatial features are extracted from each frame sequence of a video using a pretrained ResNet50V2 model, and the resulting feature sequences are then classified using the best-performing trained GRU model. The experimental results are compared with those of wavelet feature extraction methods and of classification models such as the convolutional neural network and long short-term memory. The results show that the proposed combination of ResNet50V2 and GRU is robust and delivers the best performance in terms of accuracy, recall, precision, and F1-score. The use of ResNet50V2 for feature extraction can improve model performance.
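To make the sequence-classification stage concrete, the sketch below implements a single GRU layer in plain Python and runs it over a list of per-frame feature vectors. The feature vectors stand in for ResNet50V2 outputs, which are not computed here; the parameter dictionary layout, the function names, and the logistic output layer are assumptions for illustration. The update rule follows one common convention, in which the update gate weights the candidate state.

```python
import math
from typing import Dict, List

Vector = List[float]


def matvec(matrix: List[Vector], vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x: Vector, h: Vector, p: Dict[str, list]) -> Vector:
    """One GRU update: gates decide how much of the old state to keep."""
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]  # update gate
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    candidate = [math.tanh(a + b + c) for a, b, c in
                 zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, candidate)]


def encode_clip(frame_features: List[Vector], p: Dict[str, list],
                hidden_size: int) -> Vector:
    """Run the GRU over all frames and return the final hidden state."""
    h = [0.0] * hidden_size
    for x in frame_features:
        h = gru_step(x, h, p)
    return h


def violence_probability(h: Vector, weights: Vector, bias: float) -> float:
    """Assumed output layer: logistic regression on the final hidden state."""
    return sigmoid(sum(w * hi for w, hi in zip(weights, h)) + bias)
```

In a full pipeline, each element of `frame_features` would be the pooled output of the pretrained network for one frame, and the GRU parameters would be learned during training rather than set by hand.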
This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that uses multiple embedding-based span bidirectional encoder representations from transformers for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. CR requires understanding both the syntax and the semantics of the NL text simultaneously. Accordingly, multiple embeddings are generated for CR so that each word can carry both syntactic and semantic information. We evaluate the effectiveness of CR-M-SpanBERT by comparing it with a model that uses SpanBERT as the language model in CR studies. The results demonstrate that the proposed deep neural network model achieves high recognition accuracy in extracting antecedents from NL text. Additionally, it requires fewer epochs than the conventional SpanBERT approach to achieve an average F1 score greater than 75%.