{"title":"音频描述中的省略和推理意义生成,以及对视频内容描述自动化的启示","authors":"Kim Starr, Sabine Braun","doi":"10.1007/s10209-023-01045-3","DOIUrl":null,"url":null,"abstract":"Abstract There is broad consensus that audio description (AD) is a modality of intersemiotic translation, but there are different views in relation to how AD can be more precisely conceptualised. While Benecke (Audiodeskription als partielle Translation. Modell und Methode, LIT, Berlin, 2014) characterises AD as ‘partial translation’, Braun (T 28: 302–313, 2016) hypothesises that what audio describers appear to ‘omit’ from their descriptions can normally be inferred by the audience, drawing on narrative cues from dialogue, mise-en-scène, kinesis, music or sound effects. The study reported in this paper tested this hypothesis using a corpus of material created during the H2020 MeMAD project. The MeMAD project aimed to improve access to audiovisual (AV) content through a combination of human and computer-based methods of description. One of the MeMAD workstreams addressed human approaches to describing visually salient cues. This included an analysis of the potential impact of omissions in AD, which is the focus of this paper. Using a corpus of approximately 500 audio described film extracts we identified the visual elements that can be considered essential for the construction of the filmic narrative and then performed a qualitative analysis of the corresponding audio descriptions to determine how these elements are verbally represented and whether any omitted elements could be inferred from other cues that are accessible to visually impaired audiences. We then identified the most likely source of these inferences and the conditions upon which retrieval could be predicated, preparing the ground for future reception studies to test our hypotheses with target audiences. In this paper, we discuss the methodology used to determine where omissions occur in the analysed audio descriptions, consider worked examples from the MeMAD500 film corpus, and outline the findings of our study namely that various strategies are relevant to inferring omitted information, including the use of proximal and distal contextual cues, and reliance on the application of common knowledge and iconic scenarios. To conclude, consideration is given to overcoming significant omissions in human-generated AD, such as using extended AD formats, and mitigating similar gaps in machine-generated descriptions, where incorporating dialogue analysis and other supplementary data into the computer model could resolve many omissions.","PeriodicalId":49115,"journal":{"name":"Universal Access in the Information Society","volume":"23 1","pages":"0"},"PeriodicalIF":2.1000,"publicationDate":"2023-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Omissions and inferential meaning-making in audio description, and implications for automating video content description\",\"authors\":\"Kim Starr, Sabine Braun\",\"doi\":\"10.1007/s10209-023-01045-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract There is broad consensus that audio description (AD) is a modality of intersemiotic translation, but there are different views in relation to how AD can be more precisely conceptualised. While Benecke (Audiodeskription als partielle Translation. Modell und Methode, LIT, Berlin, 2014) characterises AD as ‘partial translation’, Braun (T 28: 302–313, 2016) hypothesises that what audio describers appear to ‘omit’ from their descriptions can normally be inferred by the audience, drawing on narrative cues from dialogue, mise-en-scène, kinesis, music or sound effects. The study reported in this paper tested this hypothesis using a corpus of material created during the H2020 MeMAD project. The MeMAD project aimed to improve access to audiovisual (AV) content through a combination of human and computer-based methods of description. One of the MeMAD workstreams addressed human approaches to describing visually salient cues. This included an analysis of the potential impact of omissions in AD, which is the focus of this paper. Using a corpus of approximately 500 audio described film extracts we identified the visual elements that can be considered essential for the construction of the filmic narrative and then performed a qualitative analysis of the corresponding audio descriptions to determine how these elements are verbally represented and whether any omitted elements could be inferred from other cues that are accessible to visually impaired audiences. We then identified the most likely source of these inferences and the conditions upon which retrieval could be predicated, preparing the ground for future reception studies to test our hypotheses with target audiences. In this paper, we discuss the methodology used to determine where omissions occur in the analysed audio descriptions, consider worked examples from the MeMAD500 film corpus, and outline the findings of our study namely that various strategies are relevant to inferring omitted information, including the use of proximal and distal contextual cues, and reliance on the application of common knowledge and iconic scenarios. To conclude, consideration is given to overcoming significant omissions in human-generated AD, such as using extended AD formats, and mitigating similar gaps in machine-generated descriptions, where incorporating dialogue analysis and other supplementary data into the computer model could resolve many omissions.\",\"PeriodicalId\":49115,\"journal\":{\"name\":\"Universal Access in the Information Society\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2023-10-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Universal Access in the Information Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s10209-023-01045-3\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, CYBERNETICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Universal Access in the Information Society","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s10209-023-01045-3","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, CYBERNETICS","Score":null,"Total":0}
Omissions and inferential meaning-making in audio description, and implications for automating video content description
Abstract There is broad consensus that audio description (AD) is a modality of intersemiotic translation, but there are different views in relation to how AD can be more precisely conceptualised. While Benecke (Audiodeskription als partielle Translation. Modell und Methode, LIT, Berlin, 2014) characterises AD as ‘partial translation’, Braun (T 28: 302–313, 2016) hypothesises that what audio describers appear to ‘omit’ from their descriptions can normally be inferred by the audience, drawing on narrative cues from dialogue, mise-en-scène, kinesis, music or sound effects. The study reported in this paper tested this hypothesis using a corpus of material created during the H2020 MeMAD project. The MeMAD project aimed to improve access to audiovisual (AV) content through a combination of human and computer-based methods of description. One of the MeMAD workstreams addressed human approaches to describing visually salient cues. This included an analysis of the potential impact of omissions in AD, which is the focus of this paper. Using a corpus of approximately 500 audio described film extracts we identified the visual elements that can be considered essential for the construction of the filmic narrative and then performed a qualitative analysis of the corresponding audio descriptions to determine how these elements are verbally represented and whether any omitted elements could be inferred from other cues that are accessible to visually impaired audiences. We then identified the most likely source of these inferences and the conditions upon which retrieval could be predicated, preparing the ground for future reception studies to test our hypotheses with target audiences. In this paper, we discuss the methodology used to determine where omissions occur in the analysed audio descriptions, consider worked examples from the MeMAD500 film corpus, and outline the findings of our study namely that various strategies are relevant to inferring omitted information, including the use of proximal and distal contextual cues, and reliance on the application of common knowledge and iconic scenarios. To conclude, consideration is given to overcoming significant omissions in human-generated AD, such as using extended AD formats, and mitigating similar gaps in machine-generated descriptions, where incorporating dialogue analysis and other supplementary data into the computer model could resolve many omissions.
期刊介绍:
Universal Access in the Information Society (UAIS) is an international, interdisciplinary refereed journal that solicits original research contributions addressing the accessibility, usability, and, ultimately, acceptability of Information Society Technologies by anyone, anywhere, at anytime, and through any media and device. Universal access refers to the conscious and systematic effort to proactively apply principles, methods and tools of universal design order to develop Information Society Technologies that are accessible and usable by all citizens, including the very young and the elderly and people with different types of disabilities, thus avoiding the need for a posteriori adaptations or specialized design. The journal''s unique focus is on theoretical, methodological, and empirical research, of both technological and non-technological nature, that addresses equitable access and active participation of potentially all citizens in the information society.