Alcohol poisoning is a severe, potentially life-threatening health concern resulting from excessive drinking. With home monitoring, individuals can quickly determine their blood alcohol content and prevent it from reaching hazardous levels. However, most existing drunkenness detection systems require extra hardware or considerable user effort, making them impractical for everyday use. Motivated by this, we present HearDrinking, a device-free, noise-resistant, smartphone-based drunkenness detection system that uses the smartphone microphone to record the user's voice activity and mines drunkenness-related features to yield accurate detection. Detecting drunkenness from acoustic signals is non-trivial: voice activities are easily corrupted by ambient noise, and extracting fine-grained drunkenness-related representations from voice activities remains an open problem. To address the first challenge, HearDrinking employs a multi-modal fusion method to achieve noise-resistant voice activity detection. To address the second, HearDrinking first computes log-Mel spectrograms from the speech signal. Log-Mel spectrograms carry joint temporal and spectral structure absent from natural images, so conventional convolutions designed for images are often of limited effectiveness in extracting features from them. To overcome this limitation, we integrate Omni-dimensional Dynamic Convolution (ODConv) with ShuffleNetV2, creating OD-ShuffleNetV2: ODConv replaces selected conventional convolutions in the ShuffleNetV2 network and fuses multiple convolution kernels conditioned on the log-Mel spectrogram through multi-dimensional attention, thereby optimizing the network structure. Comprehensive experiments with 15 participants show a drunkenness detection accuracy of 96.08% and Blood Alcohol Content (BAC) prediction with an average error of 5 mg/dl.
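As a point of reference for the front-end feature described above, the following is a minimal sketch of the standard log-Mel spectrogram computation using only numpy. The sampling rate, FFT size, hop length, and number of Mel bands here are illustrative assumptions, not the parameters used by HearDrinking.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):
            if center > left:
                fb[i, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i, k] = (right - k) / (right - center)
    return fb

def log_mel_spectrogram(x, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the power spectrum,
    # project onto the Mel filterbank, and compress with a log.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

# Example: one second of a synthetic 220 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
x = np.sin(2 * np.pi * 220.0 * t)
S = log_mel_spectrogram(x)
print(S.shape)  # (61, 40): 61 frames, 40 Mel bands
```

The resulting (frames, Mel-bands) matrix is the time-frequency representation that a network such as OD-ShuffleNetV2 would consume as input.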