Aiming to improve the quality of life of visually impaired people, this paper presents a novel wearable aid, in the shape of a helmet, that helps them find objects in indoor scenes. An object-goal navigation system based on a wearable device is developed, consisting of four modules: object relation prior knowledge (ORPK), perception, decision and feedback. To make the aid work well in unfamiliar environments, ORPK is used for sub-goal inference to help the user find the target object. A method that learns the ORPK from unlabelled images by combining a scene graph with a knowledge graph is also proposed. The effectiveness of the aid is demonstrated in real-world experiments.
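As an illustration of how such relation priors can drive sub-goal inference, the following minimal Python sketch picks the currently visible object most strongly related to the requested target. The ORPK table, its objects and its weights are hypothetical stand-ins, not the paper's learned knowledge base.

```python
# Minimal sketch of sub-goal inference from object-relation prior knowledge (ORPK).
# The table below is a hypothetical stand-in for priors mined from scene graphs
# and a knowledge graph; values are illustrative only.
ORPK = {
    "mug":    {"table": 0.6, "sink": 0.3, "shelf": 0.1},
    "remote": {"sofa": 0.7, "table": 0.2, "shelf": 0.1},
}

def infer_subgoal(target, visible_objects):
    """Return the visible object most strongly related to the target, or None."""
    priors = ORPK.get(target, {})
    candidates = {obj: w for obj, w in priors.items() if obj in visible_objects}
    return max(candidates, key=candidates.get) if candidates else None

# Example: the user asks for a mug while the camera sees a sofa and a sink;
# the sink becomes the intermediate navigation sub-goal.
print(infer_subgoal("mug", {"sofa", "sink"}))  # -> "sink"
```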
Face detection is the basic step of many face analysis tasks. In practice, face detectors usually run on mobile devices with limited memory and computing resources, so it is important to keep them lightweight. To this end, current methods usually focus on directly designing lightweight detectors, but it has not been fully explored whether the resource consumption of these detectors can be suppressed further without sacrificing much accuracy. In this study, we apply network pruning to a lightweight face detection network to further reduce its parameters and floating-point operations. To identify channels of low importance, we train the network with sparsity regularisation on the channel scaling factors of each layer, and then remove the connections and corresponding weights whose scaling factors are near zero after sparsity training. We apply the proposed pruning pipeline to a state-of-the-art face detection method, EagleEye, and obtain a shrunken EagleEye model with fewer computing operations and parameters. The shrunken model achieves accuracy comparable to the unpruned model: a 56.3% reduction in parameter size with almost no accuracy loss on the WiderFace dataset.
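A common way to realise sparsity regularisation on channel scaling factors is to penalise the L1 norm of BatchNorm scale parameters (network-slimming style). The PyTorch sketch below shows that pattern under that assumption; the penalty weight, threshold and training loop are illustrative, not the paper's exact setup.

```python
import torch
import torch.nn as nn

LAMBDA = 1e-4  # strength of the L1 penalty on channel scaling factors (assumed)

def l1_on_bn_scales(model):
    """Sum of |gamma| over all BatchNorm layers; drives unimportant channels to zero."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return penalty

def training_step(model, criterion, optimizer, x, y):
    optimizer.zero_grad()
    loss = criterion(model(x), y) + LAMBDA * l1_on_bn_scales(model)
    loss.backward()
    optimizer.step()
    return loss.item()

def prunable_channels(bn, threshold=1e-3):
    """After sparsity training, channels with near-zero gamma are candidates for removal."""
    return (bn.weight.abs() < threshold).nonzero().flatten().tolist()
```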
Solving an engineering design problem can be regarded as a gradual optimisation process that involves strategising, and this process can be modelled in a reinforcement learning (RL) framework. This article presents an RL model with episodic controllers for solving engineering problems. Episodic controllers provide a mechanism for using short-term and long-term memories to improve the efficiency of searching for engineering solutions. This work demonstrates that these two kinds of memory models can be incorporated into the existing RL framework. Finally, the optimised design of a crane girder illustrates RL with episodic controllers. The work presented in this study leverages an RL model that has been shown to mimic human problem solving in engineering design optimisation.
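To make the episodic-control idea concrete, here is a toy Python sketch of an episodic controller over discrete states and actions: the just-finished episode acts as short-term memory, and a table of best returns ever observed acts as long-term memory. It is a generic episodic-control pattern, not the paper's model, and the crane-girder design space itself is not represented.

```python
import random
from collections import defaultdict

class EpisodicController:
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        # Long-term memory: best discounted return ever seen for (state, action).
        self.memory = defaultdict(float)

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # occasional exploration
        return max(self.actions, key=lambda a: self.memory[(state, a)])

    def update(self, episode, gamma=0.99):
        # Short-term memory: the just-finished episode, replayed backwards
        # to accumulate discounted returns.
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            key = (state, action)
            self.memory[key] = max(self.memory[key], g)  # keep the best return
```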
Existing methods struggle to capture the semantic emotion of a sentence when cross-language corpora are lacking, which makes effective cross-language sentiment analysis difficult. To address this problem, we propose a neural network architecture called the Attention-Based Hybrid Robust Neural Network. The proposed architecture includes pre-trained word embeddings with fine-tuning to obtain prior semantic information, two sub-networks with an attention mechanism to capture the global semantic emotional information in the text, and a fully connected layer with a softmax function to perform the final emotional classification. The Convolutional Neural Network sub-network captures the local semantic emotional information of the text, the BiLSTM sub-network captures the contextual semantic emotional information, and the attention mechanism dynamically integrates this information to extract the key emotional cues. We conduct experiments on Chinese (NLPCC, the International Conference on Natural Language Processing and Chinese Computing) and English (SST) datasets, divided into three subtasks to evaluate our method. It improves the accuracy of single-sentence positive/negative classification from 79% to 86% in the single-language emotion recognition task, improves the recognition of fine-grained emotional tags by 9.6%, and improves the accuracy of cross-language emotion recognition by 1.5%. Even in the face of faulty data, the performance of our model does not degrade significantly when the error rate is below 20%. These experimental results demonstrate the superiority of our method.
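A minimal PyTorch sketch of the hybrid CNN + BiLSTM + attention pattern described above follows; all dimensions, layer counts and the fusion-by-concatenation choice are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HybridSentimentNet(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=128, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # pre-trained, fine-tuned
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(emb_dim, hidden // 2, bidirectional=True,
                              batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)             # attention score per step
        self.classify = nn.Linear(2 * hidden, n_classes)

    def forward(self, tokens):                           # tokens: (batch, seq)
        e = self.embed(tokens)                           # (batch, seq, emb)
        local = self.conv(e.transpose(1, 2)).transpose(1, 2)  # local features
        ctx, _ = self.bilstm(e)                          # contextual features
        h = torch.cat([local, ctx], dim=-1)              # (batch, seq, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)           # attention weights
        pooled = (w * h).sum(dim=1)                      # weighted sum over time
        return self.classify(pooled)                     # softmax applied in the loss
```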
This article presents the design and development of a virtual fretless Chinese stringed instrument app, taking the Duxianqin as an example; its playability is expected to be indistinguishable from a real instrument. The digital simulation of fretless musical instruments divides into two parts: simulating the continuous pitch behaviour of the strings, and simulating the sound produced by plucking them. Returning to mechanics and wave theory, the article derives the quantitative relationship between a string's frequency and its deformation and elongation. Because the Duxianqin is fretless, it cannot be simulated by relying solely on recorded sound sources; playing and sound production require real-time synthesis through pitch processing, an approach that also offers a useful reference for realising other fretless instruments.
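The textbook relations behind such a derivation can be sketched as follows; these are the standard string-vibration and Hooke's-law formulae consistent with the description above, not necessarily the article's exact expressions.

```latex
% Fundamental frequency of a stretched string of vibrating length L,
% tension T and linear density \mu:
f = \frac{1}{2L}\sqrt{\frac{T}{\mu}}
% By Hooke's law, the tension follows from the elongation \Delta L of a
% string of natural length L_0, cross-section S and Young's modulus E:
T = ES\,\frac{\Delta L}{L_0}
% Bending or pressing the string changes \Delta L (hence T and f), which is
% how continuous pitch on a fretless instrument can be synthesised in real time.
```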
String instruments, wind instruments and percussion instruments are the three traditional categories of musical instruments, among which wind instruments play an important role. The pitch of a wind instrument is usually determined by the vibrating air column and is affected by multiple properties of the air flow. In this article, the mechanism of sound production in a pipe is analysed in terms of the coupling between the edge tone and the vibration of the air column in the tube. Experiments and computational fluid dynamics calculations are combined to study the influence of jet velocity on the oscillation frequency of the edge tone and on the musical sound produced by the tube, which helps to gain deeper insight into the relation between physics and music.
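For orientation, the textbook relations underlying this coupling can be written as below; they are standard illustrative formulae, not the article's CFD model.

```latex
% Resonance frequencies of an open pipe of length L (c = speed of sound):
f_n = \frac{n\,c}{2L}, \qquad n = 1, 2, 3, \dots
% The edge-tone frequency grows roughly in proportion to the jet velocity U
% over the nozzle-to-edge distance d:
f_{\mathrm{edge}} \propto \frac{U}{d}
% A stable musical tone arises when f_edge locks onto a pipe resonance f_n,
% which is why jet velocity shifts both the edge tone and the produced sound.
```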
This paper is part of a special issue on Music Technology. We first study instrument-type recognition for traditional Chinese musical instrument audio in the conventional way: using Mel-spectrum features as input, we train an 8-layer convolutional neural network and achieve 99.3% accuracy. The paper then focuses on recognising the performance skills of traditional Chinese musical instruments. For a single instrument, features are extracted with a pre-trained ResNet model and classified with an SVM, reaching 99% accuracy across all instruments. To improve the generalisation of the model, the paper then proposes recognising performance skills within each family of instruments, so that regularities of the same playing technique across different instruments can be exploited. The resulting recognition accuracy for the four instrument families is 95.7% for wind instruments, 82.2% for plucked-string instruments, 88.3% for bowed-string instruments, and 97.5% for percussion instruments. We open-source the audio database of traditional Chinese musical instruments and the Python source code of the whole experiment for further research.
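A Mel-spectrum front end of the kind described can be sketched with librosa as below; the sampling rate, number of Mel bands and file name are assumptions, not the paper's exact settings.

```python
import librosa
import numpy as np

def mel_features(path, sr=22050, n_mels=128):
    """Log-compressed, normalised Mel spectrogram suitable as CNN input."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)   # log-compressed Mel spectrum
    # Normalise to zero mean / unit variance before feeding the CNN.
    return (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)

# feats = mel_features("guqin_sample.wav")  # hypothetical file name
# feats[np.newaxis, np.newaxis] would then be a (1, 1, n_mels, T) CNN input.
```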
Patients with cerebral haemorrhage need haematoma drainage, and fresh blood may appear during the drainage process, so it must be observed and detected in real time. To solve this problem, this paper studies images produced during haematoma drainage and designs a framework for blood image feature selection, recognition and classification. First, because colour differences in blood images are small, features in the standard RGB colour space are not discriminative, so this study proposes an optimal colour-channel selection method: the colour information extracted from the images is recombined into a 3 × 3 matrix, the normalised 4-neighbourhood contrast and variance are calculated for quantitative comparison, and the optimised colour channels are selected to overcome the weak features caused by a single colour space. Next, the effective region of the image is cropped and transformed into the best colour channels for that region. The first, second and third moments of the three best colour channels are extracted to form a nine-dimensional feature vector. K-means clustering is applied to these feature vectors, outliers are removed, and the results are then passed to a hidden Markov model (HMM) and support vector machine (SVM) for classification. After selecting the best colour channels, the classification accuracy of the HMM-SVM is greatly improved, and the proposed method offers clear advantages over other classification algorithms. Experiments show that the recognition accuracy of this method reaches 98.9%.
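The nine-dimensional colour-moment feature can be sketched in a few lines of NumPy: first, second and third moments of each of three selected channels. Which three channels are "best" is decided by the paper's selection step; here the selected channel stack is simply passed in.

```python
import numpy as np

def colour_moments(channels):
    """channels: (H, W, 3) array holding the three selected colour channels."""
    feats = []
    for c in range(3):
        x = channels[..., c].astype(np.float64).ravel()
        mean = x.mean()                               # first moment
        std = x.std()                                 # second moment
        skew = np.cbrt(((x - mean) ** 3).mean())      # third moment (cube root)
        feats.extend([mean, std, skew])
    return np.array(feats)                            # shape (9,)
```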
Music can be regarded as an art of expressing inner feelings, yet most existing networks for music generation ignore the analysis of its emotional expression. In this paper, we propose to synthesise music according to a specified emotion while also integrating the internal structural characteristics of music into the generation process. Specifically, we embed emotion labels along with music structure features as the conditional input and investigate a GRU network for generating emotional music. In addition to the generator, we design a novel perceptually optimised emotion classification model that encourages the generated music to approach the emotional expression of real music. To validate the effectiveness of the proposed framework, both subjective and objective experiments are conducted, verifying that our method produces emotional music correlated with the specified emotion and music structures.
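A minimal PyTorch sketch of a GRU generator conditioned on an emotion label and structure features follows; the sizes and the simple concatenation-based conditioning are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class EmotionMusicGRU(nn.Module):
    def __init__(self, n_tokens, n_emotions=4, emb=128, struct_dim=16, hidden=256):
        super().__init__()
        self.note_emb = nn.Embedding(n_tokens, emb)     # note/event tokens
        self.emo_emb = nn.Embedding(n_emotions, emb)    # emotion label embedding
        self.gru = nn.GRU(emb + emb + struct_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, notes, emotion, structure, h=None):
        # notes: (B, T) token ids; emotion: (B,); structure: (B, T, struct_dim)
        T = notes.size(1)
        e = self.emo_emb(emotion).unsqueeze(1).expand(-1, T, -1)
        x = torch.cat([self.note_emb(notes), e, structure], dim=-1)
        y, h = self.gru(x, h)
        return self.out(y), h    # next-token logits at every time step
```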