The Art of Text-to-Speech

Critical Inquiry | Pub Date: 2024-01-01 | pp. 225-251 | DOI: 10.1086/727651 | Impact Factor: 2.0 | JCR Q1 (Cultural Studies) | Region 2 (Sociology)

Benjamin Lindquist


Citations: 0


Abstract

Long before Siri and ChatGPT uttered their first automated words, there was only one way to program synthetic speech: with paint and brush. During the transformative years between 1930 and 1960, artists, linguists, and engineers mixed sound and image in a way that combined artistic production with new technologies. What was known as “synthesis-by-art” grew into the rules that power computer speech today. This article concentrates on the emergence of rule-based speech synthesis at Haskins Laboratories in mid-twentieth-century America. An unexpected outgrowth of their work with disabled Second World War veterans, members of the Haskins group had developed a new machine that converted visual patterns into sound: the Pattern Playback. Like holes in a player-piano roll, painted shapes were mechanically translated into distinct sounds. Early experiments at the laboratory promised “a new art form.” Researchers painted pictures of music and listened to geometric shapes. This work eventually grew into a psycholinguistic program committed to painting the shapes of speech. But these early aesthetic experiments had helped researchers cultivate a familiarity with paint, brush, and subjective bodily knowledge. This allowed them to intuitively develop a recipe for painting synthetic speech. In other words, their painting hands enacted knowledge long before they could articulate the complex rules that govern how phonemes interact. By the late 1950s, lab member Frances Ingemann successfully converted this “embodied knowing” into a machine-legible code that rigorously detailed how to paint synthetic speech. She had hoped that her rules might result in a reading machine for blind users that would automatically convert text into speech. Instead, her work was coopted by J. C. R. Licklider, who described Ingemann’s rulebook as “a digital code, suitable for use by computing machines.” While Licklider would use the work of Haskins Laboratories to spearhead his novel concept of man-computer symbiosis, he obscured the extent to which this digital code grew from the anomalous bodies of wounded war veterans and the subjective knowing of painting hands. Indeed, the forgotten history of early text-to-speech shows the indivisibility of interactive computing and digital codes from the material practices and embodied cognition from which they grew.


Source journal

Critical Inquiry Multiple-

CiteScore: 2.80

Self-citation rate: 0.00%

Journal description: Critical Inquiry has published the best critical thought in the arts and humanities since 1974. Combining a commitment to rigorous scholarship with a vital concern for dialogue and debate, the journal presents articles by eminent critics, scholars, and artists on a wide variety of issues central to contemporary criticism and culture. In CI, new ideas and reconsideration of those traditional in criticism and culture are granted a voice. The wide interdisciplinary focus creates surprising juxtapositions and linkages of concepts, offering new grounds for theoretical debate. In CI, authors entertain and challenge while illuminating such issues as improvisations, the life of things, Flaubert, and early modern women's writing.
