Title: ‘All possible sounds’: speech, music, and the emergence of machine listening
Authors: James E K Parker, Sean Dockray
Journal: Sound Studies, volume 17 1, pages 253-281
Published: 2023-04-10
DOI: https://doi.org/10.1080/20551940.2023.2195057
Citations: 0
Abstract
“Machine listening” is one common term for a fast-growing interdisciplinary field of science and engineering that “uses signal processing and machine learning to extract useful information from sound”. This article contributes to the critical literature on machine listening by presenting some of its history as a field. From the 1940s to the 1990s, work on artificial intelligence and audio developed along two streams. There was work on speech recognition/understanding, and work in computer music. In the early 1990s, another stream began to emerge. At institutions such as MIT Media Lab and Stanford’s CCRMA, researchers started turning towards “more fundamental problems of audition”. Propelled by work being done by and alongside musicians, speech and music would increasingly be understood by computer scientists as particular sounds within a broader “auditory scene”. Researchers began to develop machine listening systems for a more diverse range of sounds and classification tasks: often in the service of speech recognition, but also increasingly for their own sake. The soundscape itself was becoming an object of computational concern. Today, the ambition is “to cover all possible sounds”. That is the aspiration with which we must now contend politically, and which this article sets out to historicise and understand.