Semantic-Topology Preserving Quantization of Word Embeddings for Human-to-Machine Communications

Zhenyi Lin; Lin Yang; Yi Gong; Kaibin Huang
IEEE Transactions on Communications, vol. 73(4), pp. 2401-2415. Pub Date: 2024-10-10. DOI: 10.1109/TCOMM.2024.3471992
https://ieeexplore.ieee.org/document/10713295/
IF 8.3 · Zone 2, Computer Science · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC) · Journal Article · Open access: no
Citations: 0

Abstract

The vision of 6G mobile networks is to connect intelligent machines to humans so that the machines can provide cooperation, care, and assistance. The mainstream approach to human-to-machine (H2M) semantic communication is to map words into (word) embedding vectors, which are clustered according to their semantic similarity to facilitate machines’ interpretation of human languages. The computation-intensive task of text-to-embedding mapping is usually delegated to an edge server that senses human commands, maps them into embedding vectors, and then transmits the vectors to a machine over a wireless link. In this work, we propose a vector quantization (VQ) framework customized for embedding vectors, called semantic-topology preserving VQ (SemTop-VQ), to overcome the communication bottleneck caused by the vectors’ high dimensionality. While traditional VQ focuses on minimizing the distortion of individual vectors, SemTop-VQ aims to minimize the distortion of the topology of the embedding matrix, that is, the vectors’ relative positions, which represent semantics. To this end, we adopt a topology-distortion metric, termed pointwise-inner-product (PIP) loss, and a hierarchical VQ architecture targeting high-dimensional VQ. In this architecture, an embedding vector is decomposed into blocks; the norm and shape (normalized vector) are quantized separately using a scalar quantizer and a Grassmannian quantizer, respectively. The main feature of SemTop-VQ lies in deriving from the PIP loss a set of so-called semantic-importance indicators, which reflect how strongly the quantization errors of individual blocks influence the topology distortion. The indicators are then applied to optimize the quantization-bit allocation across the decomposed vector blocks under the criterion of PIP-loss minimization.
In practice, the usage probabilities of embedding vectors for a specific machine task are highly skewed and the task is time-varying. We exploit this fact to further develop SemTop-VQ to feature task adaptation, which attains a higher communication efficiency. The task-adaptive VQ is realized by using a frequently-used (quantization) codebook that is much smaller than the original codebook and is continuously updated by estimating the embedding-usage distribution. Our experiments using real embedding datasets, namely Word2Vec and GloVe, demonstrate the effectiveness of SemTop-VQ as a goal-oriented technique for efficient H2M communications.
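To make the distinction between per-vector distortion and topology distortion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of the PIP loss as the Frobenius norm of the difference between the two matrices of pairwise inner products; the abstract does not state the exact normalization. It also assumes that norm and shape are quantized per block, with equal bits for every block. The random unit-vector codebook is only a stand-in for a Grassmannian codebook, and the names `quantize_block`, `quantize_embeddings`, `norm_step`, and `shape_bits` are hypothetical.

```python
import math
import random


def gram(E):
    """PIP matrix: all pairwise inner products between the rows of E."""
    return [[sum(a * b for a, b in zip(u, v)) for v in E] for u in E]


def pip_loss(E, E_hat):
    """Frobenius norm of the difference between the two PIP matrices."""
    G, G_hat = gram(E), gram(E_hat)
    return math.sqrt(sum((g - h) ** 2
                         for row_g, row_h in zip(G, G_hat)
                         for g, h in zip(row_g, row_h)))


def unit(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def quantize_block(block, norm_step, shape_codebook):
    """Quantize one block as (norm, shape).

    The norm goes through a uniform scalar quantizer; the shape is mapped to
    the codeword with the largest inner product (assumed stand-in for a
    Grassmannian quantizer).
    """
    norm = math.sqrt(sum(x * x for x in block))
    if norm == 0.0:
        return [0.0] * len(block)
    q_norm = round(norm / norm_step) * norm_step
    best = max(shape_codebook,
               key=lambda c: sum(x * y for x, y in zip(block, c)))
    return [q_norm * c for c in best]


def quantize_embeddings(E, block_dim, norm_step, shape_bits, seed=0):
    """Quantize every row of E block by block, with equal bits per block."""
    d = len(E[0])
    if d % block_dim != 0:
        raise ValueError("embedding dimension must be a multiple of block_dim")
    rng = random.Random(seed)
    starts = list(range(0, d, block_dim))
    # One random unit-vector codebook per block position (illustrative only).
    codebooks = [[unit([rng.gauss(0, 1) for _ in range(block_dim)])
                  for _ in range(2 ** shape_bits)] for _ in starts]
    E_hat = []
    for row in E:
        q_row = []
        for start, codebook in zip(starts, codebooks):
            q_row.extend(quantize_block(row[start:start + block_dim],
                                        norm_step, codebook))
        E_hat.append(q_row)
    return E_hat


# Toy data: 20 random "embeddings" of dimension 8, split into 2 blocks of 4.
rng = random.Random(1)
E = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
E_hat = quantize_embeddings(E, block_dim=4, norm_step=0.25, shape_bits=6)
print("PIP loss after quantization:", pip_loss(E, E_hat))
```

Under this definition the PIP loss depends only on inner products between vectors, so a transformation that preserves all of them (for example, flipping the sign of every vector) yields zero loss even though each individual vector has changed. The sketch does not reproduce the semantic-importance indicators or the bit-allocation step, whose form the abstract does not give.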
Source journal
IEEE Transactions on Communications (Engineering and Technology: Telecommunications)
CiteScore: 16.10
Self-citation rate: 8.40%
Articles published: 528
Review time: 4.1 months
Journal description: The IEEE Transactions on Communications is dedicated to publishing high-quality manuscripts that showcase advancements in the state-of-the-art of telecommunications. Our scope encompasses all aspects of telecommunications, including telephone, telegraphy, facsimile, and television, facilitated by electromagnetic propagation methods such as radio, wire, aerial, underground, coaxial, and submarine cables, as well as waveguides, communication satellites, and lasers. We cover telecommunications in various settings, including marine, aeronautical, space, and fixed station services, addressing topics such as repeaters, radio relaying, signal storage, regeneration, error detection and correction, multiplexing, carrier techniques, communication switching systems, data communications, and communication theory. Join us in advancing the field of telecommunications through groundbreaking research and innovation.