Medical image description can support clinical diagnosis, but the field still faces serious challenges. Medical datasets suffer from visual and textual data bias, namely an imbalanced distribution of healthy and diseased samples. This imbalance can severely degrade the learning of data-driven neural networks and ultimately lead to errors in the generated medical image descriptions. To address this problem, we propose a new medical image description architecture named the multimodal data-assisted knowledge fusion network (MDAKF), which introduces multimodal auxiliary signals to guide a Transformer network toward more accurate medical reports. Specifically, audio auxiliary signals explicitly indicate abnormal visual regions, alleviating the visual data bias problem. However, audio signals with similar pronunciations are hard to distinguish, which may cause audio labels to be mapped to the wrong medical image regions. We therefore fuse the audio features with text features to form the auxiliary signal, improving the overall performance of the model. Experiments on two medical image description datasets, IU-X-ray and COV-CTR, show that the proposed model outperforms previous models on natural language generation metrics.
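The abstract does not specify how MDAKF combines the audio and text features. As a minimal sketch, assuming a learned scalar gate (a common fusion pattern, not necessarily the authors' operator), the fusion step could look like this; `gated_fusion` and its parameters are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used to squash the gate score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(audio, text, gate_weights, gate_bias=0.0):
    """Fuse equal-length audio and text feature vectors with one scalar gate.

    Hypothetical illustration, not MDAKF's published operator:
        g     = sigmoid(w . [audio; text] + b)
        fused = g * audio + (1 - g) * text
    A gate near 0 leans on the text features, e.g. when similar-sounding
    audio labels are ambiguous.
    """
    if len(audio) != len(text):
        raise ValueError("audio and text features must have the same length")
    concat = list(audio) + list(text)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated feature length")
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * a + (1.0 - g) * t for a, t in zip(audio, text)]
    return fused, g


# Usage: with an all-zero gate the two modalities are averaged (g = 0.5).
fused, g = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_weights=[0.0] * 4)
print(g, fused)  # 0.5 [0.5, 0.5]
```

In a trained model the gate weights would be learned jointly with the rest of the network; a per-dimension gate is an equally plausible variant.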
Visual characteristics can potentially be used to assess the navigational proficiency of ship pilots. Precise assessment of ship piloting competence is essential for mitigating human error in pilotage, and a thorough examination of cognitive capabilities is central to developing a refined system for classifying, selecting, and training pilots. Insufficient situation awareness (SA), the cognitive basis of hazardous behavior among pilots, may lead to poor performance in ship pilotage under adverse conditions. To address this issue, we propose an SA recognition model based on a random forest-support vector machine (RF-SVM) algorithm that uses wearable eye-tracking technology to detect pilots' at-risk cognitive state, specifically low SA. We improve the RF algorithm by correcting its relative error (RE) and root mean square error (RMSE) and by applying principal component analysis (PCA), which allows the combination of salient features to be optimized in greater depth. Using these feature combinations, we construct an SVM on the variables best suited to SA recognition. Compared with RF or SVM alone, the proposed RF-SVM algorithm achieves the highest accuracy in recognizing at-risk cognitive states under poor visibility, raising accuracy from 86.79% to 93.43%. Taken together, these findings provide technical support for an intelligent system that adaptively evaluates pilots' cognitive performance, and they lay the groundwork for monitoring cognitive processes and capabilities in marine pilotage operations in China.
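The two-stage RF-SVM idea (rank features, then train an SVM on the best subset) can be sketched in plain Python. This is a simplified illustration under stated assumptions: the importance scores are hypothetical inputs standing in for random forest output (the RE/RMSE correction and PCA steps are omitted), a linear SVM trained by subgradient descent replaces a library SVM, and the toy eye-tracking features are invented.

```python
def select_top_features(importances, k):
    """Indices of the k largest importance scores (highest first)."""
    ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return ranked[:k]


def project(rows, indices):
    """Keep only the selected feature columns."""
    return [[row[j] for j in indices] for row in rows]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via batch subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (e.g. low SA) or -1 (adequate SA).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # margin violated: hinge subgradient is -y * x
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


# Hypothetical data: column 0 is informative, columns 1 and 2 are noise.
X = [[2.0, 0.3, -1.0], [3.0, -0.5, 2.0], [-2.0, 0.4, 1.5], [-3.0, -0.2, -0.7]]
y = [1, 1, -1, -1]
importances = [0.7, 0.1, 0.2]  # stand-in for RF feature importances

keep = select_top_features(importances, k=1)
X_sel = project(X, keep)
w, b = train_linear_svm(X_sel, y)
print(keep, predict(w, b, X_sel))
```

Separating feature ranking from classification is what lets the SVM work on a compact, less redundant input; with real eye-tracking data a kernel SVM and cross-validated choice of `k` would be the natural next step.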
Knowledge graphs can effectively organize and represent information about emergency resources for sudden events. In this study, we construct the model layer of a knowledge graph of emergency resources for sudden events by classifying and analyzing the measures taken in response to unforeseen disasters. We define eight interconnected entity types, each characterized by a set of attributes and participating in one or more relationships with other entity types. From 121 incident investigation reports published over the past five years by the emergency management departments of various provinces and cities, we select the five most frequent entities and their four corresponding relationships, and then design an extraction plan for them. Based on the completed knowledge graph, we formulate 14 questions about emergency resources for sudden events and build 19 corresponding question-answering templates using a template-based question-answering (QA) approach. Template mapping retrieves the corresponding Cypher statement template, and executing the query returns the answer. Finally, we build a knowledge graph QA system with the Django web framework that provides entity queries and knowledge QA for emergency resources related to sudden events.
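Template mapping can be illustrated with regular-expression templates that each carry a parameterized Cypher query. The node labels (`Event`, `Resource`, `Department`), relationship types (`USES`, `HANDLES`), and question patterns below are hypothetical, since the abstract does not list the actual entity and relation names.

```python
import re

# Hypothetical templates: (question pattern, parameterized Cypher template).
TEMPLATES = [
    (re.compile(r"^what resources were used in (?P<event>.+?)\??$", re.IGNORECASE),
     "MATCH (e:Event {name: $event})-[:USES]->(r:Resource) RETURN r.name"),
    (re.compile(r"^which department handled (?P<event>.+?)\??$", re.IGNORECASE),
     "MATCH (d:Department)-[:HANDLES]->(e:Event {name: $event}) RETURN d.name"),
]


def map_question(question):
    """Return (cypher, params) for the first matching template, or None."""
    q = question.strip()
    for pattern, cypher in TEMPLATES:
        match = pattern.match(q)
        if match:
            # Named groups become Cypher query parameters ($event, ...).
            return cypher, match.groupdict()
    return None


print(map_question("What resources were used in the warehouse fire?"))
```

Passing extracted entities as query parameters rather than splicing them into the Cypher string keeps the templates reusable and avoids query injection; the returned pair would then be sent to the graph database by whatever driver the system uses.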
Available subspace clustering methods often comprise two stages: finding low-dimensional subspaces of the data and then clustering within those subspaces. Finding subspaces that better represent the original data is therefore a key research challenge. However, most reported methods assume that all features contribute equally, which is often unrealistic: the contributions of important features may be overwhelmed by a large number of redundant features. In this study, we present a weighted subspace fuzzy clustering (WSFC) model with a locality preservation mechanism, which simultaneously captures the importance of different features adaptively, finds an optimal lower-dimensional subspace, and performs fuzzy clustering. Because the importance of each feature is explicitly quantified, the proposed model gives fuzzy clustering sparsity and robustness. The intrinsic geometric structure of the data is also preserved, which enhances the interpretability of the clustering results. Extensive experiments on real-world datasets show that WSFC allocates appropriate weights to different features according to the data distribution and clustering task, and that it outperforms other clustering models.
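The effect of feature weights can be seen in the standard fuzzy c-means membership update when the Euclidean distance is replaced by a feature-weighted one. This sketch assumes fixed, user-supplied weights and shows only the membership step; the full WSFC model also learns the weights and the subspace and adds locality preservation, none of which is reproduced here.

```python
def weighted_sq_distance(x, center, weights):
    """Squared distance in which each feature is scaled by its weight."""
    return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, center))


def memberships(x, centers, weights, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships of one sample under a feature-weighted distance.

    u_k = 1 / sum_j (d_k / d_j) ** (1 / (m - 1)), with d the squared distances,
    which equals the usual FCM exponent 2 / (m - 1) on plain distances.
    """
    d = [max(weighted_sq_distance(x, c, weights), eps) for c in centers]
    exponent = 1.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in d) for dk in d]


centers = [[0.0, 0.0], [10.0, 0.0]]
x = [0.0, 5.0]
print(memberships(x, centers, weights=[0.5, 0.5]))  # about [0.833, 0.167]
print(memberships(x, centers, weights=[1.0, 0.0]))  # feature 2 ignored: about [1.0, 0.0]
```

Zeroing the weight of the second feature, which does not separate the two centers here, sharpens the membership toward the first cluster; this is the intuition behind letting important features dominate redundant ones.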
Effective traffic congestion prediction is urgently needed in modern smart cities to save travel time and improve citizens' quality of life. In this study, we propose AB_AO (ARIMA Bi-LSTM using the Aquila optimizer), a hybrid predictive model that combines the widely used statistical time-series model ARIMA (autoregressive integrated moving average) with the sequential deep learning (DL) model Bi-LSTM (bidirectional long short-term memory) to predict traffic congestion with a low error rate. Three road traffic datasets from different cities, taken from the CityPulse EU FP7 project, are used to evaluate the proposed hybrid model. A time series contains two components that must be handled carefully: a linear one and a nonlinear one. Here, ARIMA models the linear component and Bi-LSTM models the nonlinear component, while the Aquila optimizer (AO) tunes the Bi-LSTM hyperparameters to improve its performance. The results are validated with the mean absolute error (MAE), mean squared error (MSE), and mean absolute percentage error (MAPE). A detailed mathematical and empirical analysis, including an ablation study and a comparative analysis, supports the performance of the AB_AO model. AB_AO produces more stable and precise results than the other models, with an MSE of 18.78, an MAE of 3.18, and a MAPE of 0.21. It can further help predict the number of vehicles on the road, which may help reduce time lost to traffic congestion.
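The additive hybrid behind AB_AO (a linear forecast plus a forecast of the remaining nonlinear residual) and the three error measures can be sketched as follows. The component forecasts are hypothetical numbers standing in for ARIMA and Bi-LSTM outputs. This version of MAPE returns a fraction; the abstract does not state whether its reported 0.21 is a fraction or a percentage.

```python
def hybrid_forecast(linear_part, nonlinear_part):
    """Additive hybrid: linear (ARIMA-style) forecast + forecast of its residuals."""
    return [lin + nonlin for lin, nonlin in zip(linear_part, nonlinear_part)]


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).

    Undefined when any true value is 0.
    """
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical vehicle counts and component forecasts.
actual = [10, 20, 40]
linear = [9.0, 21.0, 38.0]    # stand-in for the ARIMA forecast
residual = [0.5, -0.5, 1.0]   # stand-in for the Bi-LSTM residual forecast

forecast = hybrid_forecast(linear, residual)
print(forecast, mae(actual, forecast), mse(actual, forecast), mape(actual, forecast))
```

In practice the residual model is trained on `actual - linear` from the training period, so each component only has to learn the structure the other one misses.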
Bitcoin-NG is a scalable blockchain protocol based on the same trust model as Bitcoin. It divides each epoch into one keyblock followed by multiple microblocks, effectively improving transaction throughput. To maintain security, Bitcoin-NG adopts a special incentive mechanism in which the transaction fees of each epoch are split between the current and the next leader. However, existing incentive analyses of Bitcoin-NG have two limitations. First, they consider only certain specific mining attack strategies of the adversary and ignore more stubborn attack strategies. Second, they do not account for adversaries who, upon finding a whale transaction, deviate from honest mining to obtain an extra reward. In this paper, we address both limitations. First, we propose a novel mining strategy named the Greedy-Mine attack. We then formulate a Markov reward process (MRP) model to analyze the competition between honest miners and adversaries. Furthermore, we analyze the adversaries' extra reward and derive the proportion of mining power that malicious adversaries need for Greedy-Mine to yield extra returns. We also propose a backward-compatible, progressive modification to the Bitcoin-NG protocol that raises the threshold of the propagation factor from 0 to 1. Finally, we derive the conditions under which adversaries adopting Greedy-Mine earn more than under honest mining. Simulation and experimental results indicate that Bitcoin-NG is not incentive compatible and is therefore vulnerable to the Greedy-Mine attack.
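The fee-splitting rule can be sketched with a hypothetical `split_fees` helper. The share `r` kept by the current leader is a parameter; the default 0.4 follows the 40/60 split proposed in the original Bitcoin-NG design. The sketch models honest leader succession only, not the Greedy-Mine strategy or the MRP analysis.

```python
from collections import defaultdict


def split_fees(leaders, fees, r=0.4):
    """Distribute each epoch's microblock fees between consecutive leaders.

    leaders[i] mined the keyblock that opened epoch i; fees[i] is the total
    fee collected in that epoch's microblocks. The current leader keeps r and
    the next leader receives 1 - r. The final epoch's (1 - r) share is left
    unpaid here because its successor is not yet known.
    """
    if len(leaders) != len(fees):
        raise ValueError("need exactly one leader per epoch")
    revenue = defaultdict(float)
    for i, fee in enumerate(fees):
        revenue[leaders[i]] += r * fee
        if i + 1 < len(leaders):
            revenue[leaders[i + 1]] += (1.0 - r) * fee
    return dict(revenue)


# Usage: miner A leads epochs 0 and 2, miner B leads epoch 1.
print(split_fees(["A", "B", "A"], [10.0, 10.0, 10.0]))  # A: 14, B: 10 (up to float rounding)
```

Because the next leader collects 1 - r of every epoch's fees, a whale transaction in the current epoch makes winning the next keyblock unusually valuable, which is one way an adversary's incentive to deviate arises.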
This study proposes a novel method for predicting the discharge capacity of lithium-ion (Li-ion) batteries with a digital twin (DT), a virtual representation that mimics the behavior of physical batteries in real time. The DT combines machine learning techniques, namely AdaBoost and a long short-term memory (LSTM) network, with a semiempirical mathematical structure. Metaheuristic optimization methods, including antlion optimization, grey wolf optimization (GWO), and improved grey wolf optimization (IGWO), are used to tune the hyperparameters of the models. The models are trained and evaluated with ten-fold cross-validation on the NASA battery aging dataset, a widely used benchmark for battery research, with mean absolute error (MAE) and root-mean-square error (RMSE) as performance indicators. The IGWO-AdaBoost DT model performs best, achieving both the lowest MAE (0.01) and the lowest RMSE (0.01) in predicting discharge capacity. These findings indicate that digital twin models can accurately predict battery discharge capacity.
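Hyperparameter search with a grey wolf optimizer can be sketched as below. This implements the standard GWO, not the improved IGWO variant used in the study, and minimizes a sphere function as a stand-in for a model's cross-validated error; `grey_wolf_optimizer` and its defaults are hypothetical choices.

```python
import random


def grey_wolf_optimizer(objective, dim, bounds, n_wolves=20, iters=100, seed=0):
    """Standard grey wolf optimizer (GWO) minimizing `objective` over a box.

    Each wolf moves toward the three best wolves (alpha, beta, delta); the
    coefficient `a` decays linearly from 2 to 0, shifting the search from
    exploration to exploitation.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(iters):
        ranked = sorted(wolves, key=objective)
        if objective(ranked[0]) < best_val:
            best_pos, best_val = list(ranked[0]), objective(ranked[0])
        leaders = ranked[:3]  # alpha, beta, delta
        a = 2.0 - 2.0 * t / iters
        new_wolves = []
        for wolf in wolves:
            pos = []
            for j in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])
                    total += leader[j] - A * D
                pos.append(min(max(total / 3.0, lo), hi))  # average, then clip to bounds
            new_wolves.append(pos)
        wolves = new_wolves
    for wolf in wolves:  # score the final positions as well
        val = objective(wolf)
        if val < best_val:
            best_pos, best_val = list(wolf), val
    return best_pos, best_val


def sphere(v):
    """Stand-in objective; a real run would return validation MAE or RMSE."""
    return sum(x * x for x in v)


best, score = grey_wolf_optimizer(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, score)
```

For real tuning, each coordinate would be decoded into a hyperparameter (for example an integer number of estimators or a learning rate), and each objective call would run the cross-validation loop, so the wolf and iteration counts trade search quality against training cost.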
Background and Objective. Depression is a widespread global issue that imposes a significant burden and disability on individuals, families, and society. Deep learning (DL) has emerged as a valuable approach for automatically detecting depression by extracting cues from audiovisual data to support diagnosis. The PHQ-8 is a validated diagnostic tool for depressive disorders in clinical studies, and the objective of this study is to improve the accuracy of PHQ-8 score prediction. This paper also aims to demonstrate the effectiveness of expert knowledge in depression diagnosis and to present a novel multimodal network architecture. Methods. This paper focuses on multimodal depression analysis and proposes a flexible parallel transformer (FPT) model that extracts information from three distinct modalities (one video descriptor and two audio descriptors). The FPT-Former model has three paths, each taking expert-knowledge-based descriptors from one modality as input. The encoder of a transformer module encodes each set of descriptors into 32 features, and these features are fused to regress the final PHQ-8 score. The Extended Distress Analysis Interview Corpus (E-DAIC), an extension of DAIC-WOZ, comprises semiclinical interviews intended to support the diagnosis of psychological distress conditions. It includes 275 participants and was used in this study to evaluate the model with 10-fold cross-validation. Results. The FPT achieved performance comparable to state-of-the-art work, with a root mean square error (RMSE) of 4.80 and a mean absolute error (MAE) of 4.58. Ablation experiments show that the three-modality fused model outperforms the two-modality fused and single-modality models. With a PHQ-8 score threshold of 10, the depression classification accuracy is 0.79. Conclusions. Leveraging expert-knowledge-based multimodal measures and a parallel transformer structure, the FPT model shows promising performance in depression detection. The model improves the accuracy of depression diagnosis from audio and video and demonstrates the effectiveness of expert knowledge in depression diagnosis. Its flexible structure, high predictive efficiency, and privacy protection make it a promising intelligent system for mental healthcare.
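The fuse-regress-threshold pipeline can be sketched with three 32-dimensional modality vectors. The concatenation-plus-linear-head fusion and all names are assumptions, since the abstract does not specify the fusion operator. The 0 to 24 clamp reflects the PHQ-8 score range (eight items scored 0 to 3), and scores at or above the cutoff of 10 are treated as positive.

```python
PHQ8_MIN, PHQ8_MAX = 0.0, 24.0  # eight items, each scored 0 to 3


def fuse_and_regress(video, audio_a, audio_b, weights, bias=0.0):
    """Hypothetical late fusion: concatenate modality features, apply a linear head.

    Assumes each modality path has already produced a 32-d feature vector.
    """
    fused = list(video) + list(audio_a) + list(audio_b)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    score = sum(w * x for w, x in zip(weights, fused)) + bias
    return min(max(score, PHQ8_MIN), PHQ8_MAX)  # keep within the valid PHQ-8 range


def is_depressed(score, threshold=10.0):
    """Binary label from a PHQ-8 score; scores at or above the cutoff are positive."""
    return score >= threshold


def classification_accuracy(pred_scores, true_labels, threshold=10.0):
    hits = sum(is_depressed(s, threshold) == y for s, y in zip(pred_scores, true_labels))
    return hits / len(true_labels)


# Usage with invented features and weights.
feats = [0.1] * 32
score = fuse_and_regress(feats, feats, feats, weights=[1.0] * 96, bias=0.5)
print(round(score, 2), is_depressed(score))  # 10.1 True
print(classification_accuracy([12.0, 3.5, 10.0, 9.9], [True, False, False, False]))  # 0.75
```

Regressing the score first and thresholding afterwards lets one model serve both the PHQ-8 regression metrics (RMSE, MAE) and the binary classification accuracy.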