A Framework for Task-specific Short Document Expansion

Ramakrishna Bairi, Raghavendra Udupa, Ganesh Ramakrishnan
Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, published 2016-10-24
DOI: 10.1145/2983323.2983811 (https://doi.org/10.1145/2983323.2983811)
Citations: 4

Abstract

Collections containing large numbers of short texts (e.g., tweets and reviews) are becoming increasingly common. Analytical tasks on short texts, such as classification and clustering, are challenging because the texts are sparse and lack context, and the result is often low task accuracy. A standard remedy is to expand short texts before subjecting them to the task. However, existing work on short text expansion has three limitations: (i) it depends on domain knowledge to expand the text; (ii) it employs task-specific heuristics; and (iii) the expansion procedure is tightly coupled to the task. This makes it hard to adapt a procedure designed for one task to another. We present an expansion technique, TIDE (Task-specIfic short Document Expansion), that can be applied to several machine learning, NLP, and information retrieval tasks on short texts (such as short text classification, clustering, and entity disambiguation) without using task-specific heuristics or domain-specific knowledge for expansion. At the same time, our technique learns to expand short texts in a task-specific way: the same technique applied to a short text in two different tasks can learn to produce different expansions, depending on which expansion benefits each task's performance. To speed up learning, we also introduce a technique called block learning. Our experiments on classification and clustering tasks show that our framework improves upon several baselines according to standard evaluation metrics, including accuracy and normalized mutual information (NMI).
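For readers unfamiliar with the two evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy and NMI are commonly computed from label lists. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, and the arithmetic-mean normalization 2·I(U;V) / (H(U) + H(V)) is an assumption, since the abstract does not say which NMI variant was used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n * n_ij / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(truth, pred):
    """NMI with arithmetic-mean normalization (an assumed variant)."""
    if not truth or len(truth) != len(pred):
        raise ValueError("label lists must be non-empty and of equal length")
    h_truth, h_pred = entropy(truth), entropy(pred)
    if h_truth + h_pred == 0:
        # Both assignments put everything in one cluster: treat as identical.
        return 1.0
    return 2 * mutual_information(truth, pred) / (h_truth + h_pred)


def accuracy(truth, pred):
    """Fraction of items whose predicted label equals the true label."""
    if not truth or len(truth) != len(pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)
```

NMI is invariant to how clusters are named, which is why it suits clustering: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` scores a relabeled but otherwise identical clustering as a perfect match, whereas `accuracy` on the same inputs would be zero.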