Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-Training

Junxiao Shen, Khadija Khaldi, Enmin Zhou, Hemant Bhaskar Surale, Amy Karlson

IEEE Transactions on Visualization and Computer Graphics, published 2024-09-16. DOI: 10.1109/TVCG.2024.3456198. Available at https://ieeexplore.ieee.org/document/10681014/

Abstract

Text entry with word-gesture keyboards (WGK) is emerging as a popular input method and a key interaction technique for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments produces divergent word-gesture trajectory patterns, complicating the decoding of trajectories into text. Template-matching decoders such as SHARK2 [32] are commonly used in WGK systems because they are easy to implement and configure, but they are prone to decoding errors on noisy trajectories. Conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, yet they have their own limitations: they require extensive data for training and deep-learning expertise to implement. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale, coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that generalizes across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), as evidenced by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It significantly outperforms SHARK2, with a 37.2% improvement, and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder is only 4 MB after quantization, with no loss of accuracy, and it runs in real time, executing in just 97 milliseconds on Quest 3.
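To make the core idea concrete, the sketch below illustrates one plausible form of coarse trajectory discretization: snapping each continuous (x, y) point of a gesture to the nearest key on a simplified QWERTY grid, then collapsing consecutive repeats into a short token sequence. This is a minimal sketch only; the grid coordinates, the nearest-key rule, and the de-duplication step are illustrative assumptions, not the paper's published implementation.

```python
# Illustrative sketch of coarse discretization (assumptions, not the authors' code).
from typing import List, Tuple

# Simplified QWERTY layout: key -> (column, row) centre, with standard row stagger.
QWERTY = {
    **{c: (float(i), 0.0) for i, c in enumerate("qwertyuiop")},
    **{c: (i + 0.5, 1.0) for i, c in enumerate("asdfghjkl")},
    **{c: (i + 1.5, 2.0) for i, c in enumerate("zxcvbnm")},
}

def nearest_key(x: float, y: float) -> str:
    """Snap a continuous trajectory point to the closest key centre."""
    return min(QWERTY, key=lambda k: (QWERTY[k][0] - x) ** 2 + (QWERTY[k][1] - y) ** 2)

def coarse_discretize(trajectory: List[Tuple[float, float]]) -> List[str]:
    """Convert a noisy (x, y) gesture trace into a coarse key-token sequence,
    collapsing consecutive repeats so the result is short and device-agnostic."""
    tokens = [nearest_key(x, y) for x, y in trajectory]
    return [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]

# Example: a jittery trace passing over 'h' then 'i' discretizes to ['h', 'i'].
trace = [(5.6, 1.0), (5.7, 0.9), (6.9, 0.2), (7.0, 0.1)]
print(coarse_discretize(trace))  # ['h', 'i']
```

Token sequences of this kind abstract away device-specific jitter and scale, which is consistent with the abstract's claim that pre-training on large-scale coarsely discretized trajectories yields a decoder that transfers across mid-air and on-surface keyboards.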