TranslatAR: A mobile augmented reality translator
Victor Fragoso, Steffen Gauglitz, S. Zamora, Jim Kleban, M. Turk
2011 IEEE Workshop on Applications of Computer Vision (WACV) · January 5, 2011 · DOI: 10.1109/WACV.2011.5711545 · Cited by 98
We present a mobile augmented reality (AR) translation system, using a smartphone's camera and touchscreen, that requires the user to simply tap on the word of interest once in order to produce a translation, presented as an AR overlay. The translation seamlessly replaces the original text in the live camera stream, matching background and foreground colors estimated from the source images. For this purpose, we developed an efficient algorithm for accurately detecting the location and orientation of the text in a live camera stream that is robust to perspective distortion, and we combine it with OCR and a text-to-text translation engine. Our experimental results, using the ICDAR 2003 dataset and our own set of video sequences, quantify the accuracy of our detection and analyze the sources of failure among the system's components. With the OCR and translation running in a background thread, the system runs at 26 fps on a current generation smartphone (Nokia N900) and offers a particularly easy-to-use and simple method for translation, especially in situations in which typing or correct pronunciation (for systems with speech input) is cumbersome or impossible.
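The abstract outlines a clear pipeline: detect the location and orientation of the tapped word in the live camera stream, run OCR and text-to-text translation off the render loop in a background thread, and paint the translation back over the original text using foreground and background colors estimated from the source image. The following is a minimal Python sketch of that producer/consumer structure, not the authors' implementation; every helper (capture_frame, tap_position, detect_text_region, run_ocr, translate_text, estimate_colors, draw_overlay, display, camera_is_running) is a hypothetical placeholder.

```python
# Sketch of the AR-translation loop described in the abstract.
# All helper functions are hypothetical placeholders.
import threading
import queue

ocr_jobs = queue.Queue(maxsize=1)  # latest text patch awaiting OCR
results = queue.Queue()            # finished translations

def ocr_worker():
    """Background thread: OCR + translation, kept off the render loop
    so the live camera stream stays at interactive frame rates."""
    while True:
        patch = ocr_jobs.get()
        if patch is None:          # sentinel -> shut down
            break
        text = run_ocr(patch)                 # hypothetical OCR call
        results.put(translate_text(text))     # hypothetical MT call

threading.Thread(target=ocr_worker, daemon=True).start()

translation = None
while camera_is_running():
    frame = capture_frame()
    # Detect the tapped word's location and orientation in the frame
    # (the paper's detection step, robust to perspective distortion).
    region = detect_text_region(frame, tap_position())
    if region is not None and ocr_jobs.empty():
        ocr_jobs.put(region.patch)     # hand the patch to the worker
    if not results.empty():
        translation = results.get()
    if translation is not None and region is not None:
        # Replace the original text in place, matching estimated
        # foreground and background colors for a seamless overlay.
        fg, bg = estimate_colors(region.patch)
        frame = draw_overlay(frame, region, translation, fg, bg)
    display(frame)
```

Decoupling OCR and translation from rendering in this way is what lets the main loop keep drawing at full frame rate (26 fps on the Nokia N900 in the paper): the overlay is simply updated whenever a result arrives, rather than stalling the camera stream on slow recognition and translation calls.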