Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value

2016 12th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). DOI: 10.1109/SITIS.2016.45

Meshesha Legesse, G. Gianini, Dereje Teferi


Citations: 6

Abstract



Selecting Feature-Words in Tag Sense Disambiguation Based on Their Shapley Value

In tag-word disambiguation, a word is assigned to a specific context chosen among the different ones to which it is related. Relatedness to a context is often defined based on the co-occurrence of the target word with other words (context words) in sentences of a specific corpus. The overall disambiguation process can be thought of as a classification process, where the context words play the role of features for the target. A problem with this approach is that the large number of possible context words can reduce the classification performance, both in terms of computational effort and in terms of quality of the outcome. Feature selection can improve the process in both regards, by reducing the overall feature space to a manageable size with high information content. In this work we propose to use, in disambiguation, a feature selection approach based on the Shapley Value (SV), a metric from Coalitional Game Theory that measures the importance of a component within a coalition. By including in the feature set only the words with the highest Shapley Value, we obtain remarkable quality and performance improvements. The problem of the exponential complexity of the exact SV computation is avoided by an approximate computation based on sampling. We demonstrate the effectiveness of this method and of the sampling approximation by using both a synthetic language corpus and a real-world linguistic corpus.
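The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one common way to approximate Shapley values: average each word's marginal contribution over randomly sampled orderings of the candidate words, then keep the top-scoring words. The function names (`sampled_shapley`, `top_k`, `coverage_value`), the toy `COVERS` data, and the coverage-based value function are all hypothetical illustrations, not the authors' implementation; in practice the value function would be some measure of classifier quality on the chosen word subset.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical toy data: for each context word, the ids of the training
# sentences it helps to disambiguate. Not taken from the paper.
COVERS: Dict[str, set] = {
    "river": {0, 1, 2},
    "money": {3, 4},
    "water": {1, 2},
    "the": set(),
}
N_SENTENCES = 5


def coverage_value(coalition: FrozenSet[str]) -> float:
    """Assumed value function: fraction of sentences covered by the coalition."""
    covered = set()
    for word in coalition:
        covered |= COVERS[word]
    return len(covered) / N_SENTENCES


def sampled_shapley(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
    n_permutations: int = 500,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate Shapley values from randomly sampled player orderings.

    For each sampled ordering, players join the coalition one at a time
    and are credited with the change in value they cause.
    """
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(n_permutations):
        rng.shuffle(order)
        coalition: set = set()
        previous = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value(frozenset(coalition))
            totals[p] += current - previous
            previous = current
    return {p: total / n_permutations for p, total in totals.items()}


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k words with the highest estimated Shapley value."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


estimates = sampled_shapley(list(COVERS), coverage_value)
selected = top_k(estimates, k=2)
```

Each sampled ordering costs one value-function evaluation per word, so the total cost grows with the number of sampled orderings rather than with the 2^n coalitions an exact computation would enumerate. In this toy game, "water" only duplicates sentences that "river" already covers, so it receives a lower score than "river" and "money", and a word that covers nothing ("the") receives exactly zero.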

