Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul
arXiv - EE - Audio and Speech Processing, 2024-09-17, DOI: arxiv-2409.10858
Speech Recognition for Analysis of Police Radio Communication
Police departments around the world use two-way radio for coordination. These
broadcast police communications (BPC) are a unique source of information about
everyday police activity and emergency response. Yet BPC are not transcribed,
and their naturalistic audio properties make automatic transcription
challenging. We collect a corpus of roughly 62,000 manually transcribed radio
transmissions (~46 hours of audio) to evaluate the feasibility of automatic
speech recognition (ASR) using modern recognition models. We evaluate the
performance of off-the-shelf speech recognizers, models fine-tuned on BPC data,
and customized end-to-end models. We find that both human and machine
transcription are challenging in this domain. Large off-the-shelf ASR models
perform poorly, but fine-tuned models can reach the approximate range of human
performance. Our results suggest directions for future work, including analysis
of short utterances and potential miscommunication in police radio
interactions. We make our corpus and data annotation pipeline available to
other researchers, to enable further research on recognition and analysis of
police communication.
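
The abstract compares machine and human transcription but does not name the metric used. A common choice for ASR evaluation is word error rate (WER), the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes WER is the measure of interest; the function name and the example transmissions are hypothetical and are not taken from the corpus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization with no text normalization.
    Real evaluations usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up transmissions for illustration only.
human_reference = "unit twelve respond to main street"
asr_hypothesis = "unit twelve responding main street"
print(f"WER: {word_error_rate(human_reference, asr_hypothesis):.3f}")
```

Under this measure, one human transcriber's output can be scored against another's in the same way as a model's output, which gives one way to place machine performance relative to a human range.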