Heart sound signals are widely used in medical applications for disease prevention, initial diagnosis, and long-term monitoring of patient conditions. Accurate processing and analysis of heart sound signals allow doctors to better understand a patient's condition and formulate more appropriate prevention and treatment plans. However, manual interpretation of heart sound time series cannot exclude interference from subjective factors when such high-dimensional data are processed, which leads to inaccurate recognition results. In addition, traditional machine learning methods offer little room for further improvement, and existing neural network algorithms do not make effective use of the long-term contextual relationships in time series signals. To address these problems, this study constructed an end-to-end neural network sequence labeling algorithm based on the physical information of heart sound signals and embedded a saliency attentive model network (SAM-Net) module to reduce interference from redundant information. The output of the labeling algorithm was then used to design a multichannel feature fusion network for heart sound signals, which incorporates a squeeze-and-excitation network (SE-Net) module to accelerate the extraction of target features in different channels; this differs from the traditional classify, recognize, detect, and analyze pipeline. The proposed method improved the robustness and adaptability of heart sound classification and recognition, performing well on the selected dataset with the highest recognition accuracy of 97.23% and an F1 score of 97.08%. These results are significantly better than previously reported classification methods. This work provides a clinical informatics tool to assist clinicians in the early detection of abnormal heart conditions.
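To make the channel-recalibration step concrete, the following is a minimal sketch of a standard squeeze-and-excitation block of the kind referenced above (Hu et al., 2018), assuming a PyTorch implementation operating on 1-D heart sound feature maps. The class name, tensor shapes, and reduction ratio are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (illustrative sketch).

    `channels` and `reduction` are hypothetical; the paper's actual
    configuration for its multichannel fusion network is not specified here.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)  # squeeze: global average over the time axis
        self.fc = nn.Sequential(             # excitation: learn per-channel gating weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature maps from a 1-D convolutional backbone
        b, c, _ = x.shape
        w = self.pool(x).view(b, c)          # (batch, channels) channel descriptors
        w = self.fc(w).view(b, c, 1)         # per-channel scale in (0, 1)
        return x * w                         # reweight channels before multichannel fusion


# Example: recalibrate 64-channel features extracted from a heart sound segment
features = torch.randn(8, 64, 500)           # hypothetical batch of feature maps
recalibrated = SEBlock(channels=64)(features)
print(recalibrated.shape)                    # torch.Size([8, 64, 500])
```

In this kind of design, the learned per-channel weights emphasize feature channels that carry diagnostically relevant components of the heart sound and suppress redundant ones before the channels are fused for classification.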