The Art of Text-to-Speech
Benjamin Lindquist
Critical Inquiry, pp. 225-251. Published 2024-01-01.
DOI: https://doi.org/10.1086/727651
Abstract
Long before Siri and ChatGPT uttered their first automated words, there was only one way to program synthetic speech: with paint and brush. During the transformative years between 1930 and 1960, artists, linguists, and engineers mixed sound and image in a way that combined artistic production with new technologies. What was known as “synthesis-by-art” grew into the rules that power computer speech today. This article concentrates on the emergence of rule-based speech synthesis at Haskins Laboratories in mid-twentieth-century America. An unexpected outgrowth of their work with disabled Second World War veterans, members of the Haskins group had developed a new machine that converted visual patterns into sound: the Pattern Playback. Like holes in a player-piano roll, painted shapes were mechanically translated into distinct sounds. Early experiments at the laboratory promised “a new art form.” Researchers painted pictures of music and listened to geometric shapes. This work eventually grew into a psycholinguistic program committed to painting the shapes of speech. But these early aesthetic experiments had helped researchers cultivate a familiarity with paint, brush, and subjective bodily knowledge. This allowed them to intuitively develop a recipe for painting synthetic speech. In other words, their painting hands enacted knowledge long before they could articulate the complex rules that govern how phonemes interact. By the late 1950s, lab member Frances Ingemann successfully converted this “embodied knowing” into a machine-legible code that rigorously detailed how to paint synthetic speech. She had hoped that her rules might result in a reading machine for blind users that would automatically convert text into speech. Instead, her work was coopted by J. C. R. Licklider, who described Ingemann’s rulebook as “a digital code, suitable for use by computing machines.” While Licklider would use the work of Haskins Laboratories to spearhead his novel concept of man-computer symbiosis, he obscured the extent to which this digital code grew from the anomalous bodies of wounded war veterans and the subjective knowing of painting hands. Indeed, the forgotten history of early text-to-speech shows the indivisibility of interactive computing and digital codes from the material practices and embodied cognition from which they grew.
About the journal:
Critical Inquiry has published the best critical thought in the arts and humanities since 1974. Combining a commitment to rigorous scholarship with a vital concern for dialogue and debate, the journal presents articles by eminent critics, scholars, and artists on a wide variety of issues central to contemporary criticism and culture. In CI new ideas and reconsideration of those traditional in criticism and culture are granted a voice. The wide interdisciplinary focus creates surprising juxtapositions and linkages of concepts, offering new grounds for theoretical debate. In CI, authors entertain and challenge while illuminating such issues as improvisations, the life of things, Flaubert, and early modern women's writing.