Acoustic howling caused by feedback loops in audio systems is a major challenge in applications such as hearing aids and public address systems. Traditional approaches such as notch filters and adaptive feedback cancellation have well-known limitations, including poor adaptability in dynamic environments and a need for large amounts of labelled data. To overcome these shortcomings, this paper proposes a new deep learning approach, Dynamic Adaptive Thresholding and Self-Supervised Contrastive Learning for Graph-based Temporal Anomaly Recognition (GTAD-CL). By representing audio signals as graphs, GTAD-CL uses graph neural networks to model complex spatial–temporal patterns and detect howling as an anomaly with high precision. Self-supervised contrastive learning removes the need for labelled datasets, which improves the scalability and generalization of the model. A dynamic adaptive thresholding mechanism keeps performance robust under different acoustic conditions, for example in low signal-to-noise-ratio environments. Integrated with neural filtering, GTAD-CL suppresses howling in real time. Experimental results on a 100-hour custom dataset and six public benchmarks show that GTAD-CL achieves a precision of 0.92 (versus 0.88 for the best baseline, HybridAHS, a relative gain of about 4.5%), a recall of 0.90 (versus 0.85, a gain of 5 percentage points), and an F1-score of 0.91 (versus 0.865, a gain of 4.5 percentage points). In suppression quality, GTAD-CL achieves a PESQ score of 3.02 (versus 2.68 for HybridAHS, about 12.7% better) and a STOI of 0.90 (versus 0.86, about 4.7% better). Moreover, GTAD-CL runs with a real-time factor of 0.36×, lower than HybridAHS's 0.42× (about 14% faster). These results validate GTAD-CL as a scalable, low-latency, and high-fidelity solution that outperforms state-of-the-art methods across varying acoustic scenarios.
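The comparisons against HybridAHS quoted above can be checked with a few lines of arithmetic. The sketch below takes only the figures reported in the abstract and computes both the absolute difference and the relative change for each metric. The names `reported`, `absolute_change`, `relative_change_pct`, and `summary` are illustrative and do not come from the paper.

```python
# Figures as reported in the abstract: (GTAD-CL, HybridAHS baseline).
reported = {
    "precision": (0.92, 0.88),
    "recall": (0.90, 0.85),
    "f1": (0.91, 0.865),
    "pesq": (3.02, 2.68),
    "stoi": (0.90, 0.86),
    "rtf": (0.36, 0.42),  # real-time factor: lower is better
}


def absolute_change(new: float, base: float) -> float:
    """Plain difference between the new value and the baseline."""
    return new - base


def relative_change_pct(new: float, base: float) -> float:
    """Change relative to the baseline, expressed in percent."""
    return 100.0 * (new - base) / base


# Map each metric to (absolute difference, relative change in percent).
summary = {
    name: (
        round(absolute_change(new, base), 3),
        round(relative_change_pct(new, base), 1),
    )
    for name, (new, base) in reported.items()
}

for name, (abs_diff, rel_pct) in summary.items():
    print(f"{name:9s} abs={abs_diff:+.3f}  rel={rel_pct:+.1f}%")
```

Under this calculation the precision, PESQ, STOI, and real-time-factor comparisons agree with the quoted percentages when read as relative changes, while the quoted recall and F1 gains (5% and 4.5%) correspond to the absolute differences of 0.05 and 0.045; the relative changes for those two metrics are roughly 5.9% and 5.2%.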
