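Recognizing Textual Entailment by Generality Using Informative Asymmetric Measures and Multiword Unit Identification to Summarize Ephemeral Clusters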

G. Dias, Sebastião Pais, K. Wegrzyn-Wolska, R. Mahl
Published in: 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology
Publication date: 2011-08-22
DOI: 10.1109/WI-IAT.2011.122 (https://doi.org/10.1109/WI-IAT.2011.122)
Citations: 4

Abstract

In the context of Ephemeral Clustering of web pages, it can be useful to label each cluster with a short summary instead of just a label. Within this scope, we introduce the paradigm of Textual Entailment by Generality, which can be defined as the entailment from a specific web snippet towards a more general web snippet. The underlying idea is to find the best web snippet, one that summarizes and subsumes all the other web snippets within an ephemeral cluster. To reach this objective, we first propose a new informative asymmetric similarity measure called the Simplified Asymmetric InfoSimba (AISs), which can be combined with different asymmetric association measures. In particular, the AISs offers an unsupervised, language-independent solution to infer Textual Entailment by Generality and as such can help find the web snippet with maximum semantic coverage. This new methodology is tested against the first Recognizing Textual Entailment data set (RTE-1) for a wide range of asymmetric association measures, with and without the identification of Multiword Units. The comparative experiments with existing state-of-the-art methodologies show promising results.
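The abstract does not give the AISs formula, so the following is not the authors' measure. It is a minimal sketch of the general idea, assuming conditional probability as the asymmetric association measure and snippet-level co-occurrence counts as the only statistics. It shows how a directional score can differ between "specific to general" and "general to specific", and how that asymmetry can single out the most general snippet in a cluster. All function names (`cond_prob_table`, `directional_score`, `most_general`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def cond_prob_table(snippets):
    """Return a function estimating P(x | y) from snippet co-occurrence.

    Assumption: conditional probability stands in for the asymmetric
    association measure; the paper evaluates several such measures.
    """
    doc_freq = Counter()   # number of snippets containing each word
    pair_freq = Counter()  # number of snippets containing both words
    for snippet in snippets:
        words = set(tokenize(snippet))
        doc_freq.update(words)
        for x, y in combinations(sorted(words), 2):
            pair_freq[(x, y)] += 1
            pair_freq[(y, x)] += 1

    def cond_prob(x, y):
        if x == y:
            return 1.0
        return pair_freq[(x, y)] / doc_freq[y] if doc_freq[y] else 0.0

    return cond_prob


def directional_score(source, target, cond_prob):
    """Average P(target word | source word) over all word pairs.

    A high value means the words of `source` strongly predict the words
    of `target`, which is used here as a proxy for source -> target.
    """
    src, tgt = set(tokenize(source)), set(tokenize(target))
    if not src or not tgt:
        return 0.0
    total = sum(cond_prob(t, s) for s in src for t in tgt)
    return total / (len(src) * len(tgt))


def most_general(snippets):
    """Index of the snippet that the other snippets point to most strongly."""
    cond_prob = cond_prob_table(snippets)

    def coverage(i):
        return sum(
            directional_score(snippets[j], snippets[i], cond_prob)
            for j in range(len(snippets))
            if j != i
        )

    return max(range(len(snippets)), key=coverage)


if __name__ == "__main__":
    cluster = ["jaguar car dealer", "jaguar car engine", "jaguar car"]
    print(cluster[most_general(cluster)])
```

The design choice to illustrate is the direction of the conditional probability: a word that appears in most snippets is well predicted by rarer words, but not the other way round, so scores towards a general snippet tend to exceed scores away from it. Multiword unit identification, which the paper also evaluates, is not covered by this sketch.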