Complex-valued Neural Network-based Quantum Language Models

ACM Transactions on Information Systems (TOIS) Pub Date : 2022-03-10 DOI:10.1145/3505138

Peng Zhang, Wenjie Hui, Benyou Wang, Donghao Zhao, Dawei Song, C. Lioma, J. Simonsen


Citations: 1

Abstract

Language modeling is essential to tasks in Natural Language Processing and Information Retrieval. Following statistical language models, the Quantum Language Model (QLM) was proposed to unify single words and compound terms in the same probability space without exponentially extending the term space. Although QLM performs well in ad hoc retrieval, it has two major limitations: (1) QLM cannot make use of supervised information, mainly because the density matrix, which represents both queries and documents in QLM, is estimated iteratively and non-differentiably. (2) QLM assumes the exchangeability of words or word dependencies, neglecting the order or position information of words. This article aims to generalize QLM and make it applicable to more complicated matching tasks (e.g., Question Answering) beyond ad hoc retrieval. We propose a complex-valued neural network-based QLM solution, called C-NNQLM, which employs an end-to-end approach to build and train density matrices in a lightweight and differentiable manner and can therefore make use of external well-trained word vectors and supervised labels. Furthermore, C-NNQLM adopts complex-valued word vectors whose phase vectors can directly encode the order (or position) information of words. Note that complex numbers are also essential in quantum theory. We show that the real-valued NNQLM (R-NNQLM) is a special case of C-NNQLM. The experimental results on the QA task show that both R-NNQLM and C-NNQLM achieve much better performance than the vanilla QLM, and that C-NNQLM’s performance is on par with state-of-the-art neural network models. We also evaluate the proposed C-NNQLM on text classification and document retrieval tasks. The results on most datasets show that C-NNQLM can outperform R-NNQLM, which demonstrates the usefulness of the complex representation of words and sentences in C-NNQLM.
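The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea it describes: complex-valued word vectors whose phases carry position, combined into a density matrix. Everything specific here is an assumption made for illustration and is not taken from the paper: the phase rule (a per-dimension frequency multiplied by the word position), the uniform mixture weights, and the function names `complex_word_vector` and `density_matrix`.

```python
import numpy as np

def complex_word_vector(amplitude, frequency, position):
    """Build a unit-norm complex vector whose phase depends on position.

    Assumed scheme (not from the paper): phase = frequency * position.
    """
    phase = frequency * position
    z = amplitude * np.exp(1j * phase)
    return z / np.linalg.norm(z)

def density_matrix(vectors, weights=None):
    """Mix unit vectors into rho = sum_i p_i |z_i><z_i|."""
    vectors = np.asarray(vectors)
    n = len(vectors)
    if weights is None:
        # Assumed uniform weights; a trained model would learn these.
        weights = np.full(n, 1.0 / n)
    return np.einsum("i,ij,ik->jk", weights, vectors, vectors.conj())

rng = np.random.default_rng(0)
dim, length = 4, 3
amplitudes = rng.normal(size=(length, dim))    # stand-in for word embeddings
frequency = rng.uniform(0.0, 1.0, size=dim)    # hypothetical phase frequencies

# Complex-valued case: the phase of each word vector encodes its position.
rho = density_matrix(
    [complex_word_vector(amplitudes[i], frequency, i) for i in range(length)]
)

# Real-valued special case: with all phases zero the matrix has no imaginary part.
rho_real = density_matrix(
    [complex_word_vector(amplitudes[i], np.zeros(dim), i) for i in range(length)]
)
```

Under these assumptions `rho` is Hermitian with unit trace, as a density matrix should be, and setting every phase to zero yields a purely real matrix, which mirrors the abstract's statement that R-NNQLM is a special case of C-NNQLM. Because each step is an ordinary differentiable array operation, the same construction could in principle be trained end to end in an automatic-differentiation framework.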

