A Three-Stage Clustering Framework Based on Multiple Feature Combination for Chinese Person Name Disambiguation

2013 International Conference on Information Science and Cloud Computing Companion. Pub Date: 2013-12-07. DOI: 10.1109/ISCC-C.2013.33

Fei Wang, Yi Yang, Zhaocai Ma, Lian Li


Citations: 1

Abstract




To solve name ambiguity problems and improve the performance of person name disambiguation, we propose a three-stage clustering algorithm. In the first stage, organizations and locations (OLs) are used to cluster documents about the same person, so that texts with greater resemblance are assigned to the same category; this stage is plain document clustering based on OL similarity. In the second stage, the clustered documents serve as a new data source from which new features, such as co-author names, are extracted, and these features are used to cluster the documents further. We also propose a method that resolves name ambiguity by constructing a social network from the relationships among co-authors. In the third stage, Web pages are further clustered with a content-based hierarchical agglomerative clustering (HAC) algorithm, and their useful content, namely titles, abstracts, and keywords (TAKs), is analyzed to disambiguate the ambiguous names. Experimental results show that our three-stage clustering algorithm can effectively improve the performance of person name disambiguation.
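The abstract gives only a high-level outline of the pipeline. As a rough illustration, the minimal Python sketch below chains the three stages under assumptions that do not come from the paper. Stage 1 uses Jaccard overlap of OL sets with single-link grouping. Stage 2 merges stage-1 clusters that share pooled co-authors, taken as connected components of a co-author graph. Stage 3 uses cosine similarity over bag-of-words TAK vectors with centroid-style agglomerative merging. The record fields (`ols`, `coauthors`, `taks`), the thresholds, the helper names, and the toy data are all hypothetical. The paper's actual features, weights, and stopping criteria are not given here.

```python
from collections import Counter
import math
import re

# Hypothetical record layout (not from the paper): each document mentioning the
# ambiguous name has pre-extracted OLs, co-author names, and TAK text.

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cosine(c1, c2):
    """Cosine similarity of two term-count vectors (Counters)."""
    dot = sum(c1[t] * c2[t] for t in c1 if t in c2)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def merge_components(n, linked):
    """Group indices 0..n-1 into connected components of the graph linked(i, j)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if linked(i, j):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def disambiguate(docs, ol_threshold=0.3, min_shared_coauthors=1, tak_threshold=0.2):
    # Stage 1 (assumed): single-link grouping of documents whose OL sets overlap.
    clusters = merge_components(
        len(docs), lambda i, j: jaccard(docs[i]["ols"], docs[j]["ols"]) >= ol_threshold)

    # Stage 2 (assumed): pool co-authors per stage-1 cluster and merge clusters
    # that are connected in the resulting co-author network.
    pooled = [set().union(*(docs[d]["coauthors"] for d in c)) for c in clusters]
    groups = merge_components(
        len(clusters), lambda i, j: len(pooled[i] & pooled[j]) >= min_shared_coauthors)
    clusters = [sorted(d for g in grp for d in clusters[g]) for grp in groups]

    # Stage 3 (assumed): HAC over pooled TAK term counts; repeatedly merge the most
    # similar pair until no pair reaches tak_threshold.
    vectors = [Counter(w for d in c for w in re.findall(r"\w+", docs[d]["taks"].lower()))
               for c in clusters]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(vectors[i], vectors[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < tak_threshold:
            break
        i, j = pair
        clusters[i] = sorted(clusters[i] + clusters[j])
        vectors[i] = vectors[i] + vectors[j]  # summed counts act as a centroid
        del clusters[j], vectors[j]  # j > i, so index i stays valid
    return sorted(clusters)

# Hypothetical toy data: five documents that mention the same name.
docs = [
    {"ols": {"Univ A", "City X"}, "coauthors": {"Li Lian"}, "taks": "name disambiguation clustering"},
    {"ols": {"Univ A", "City X"}, "coauthors": {"Ma Zhaocai"}, "taks": "person name disambiguation"},
    {"ols": {"Univ B", "City Y"}, "coauthors": {"Ma Zhaocai"}, "taks": "social network co-author"},
    {"ols": {"Univ C", "City Z"}, "coauthors": {"Zhang San"}, "taks": "protein folding simulation"},
    {"ols": {"Univ D"}, "coauthors": set(), "taks": "Chinese person name disambiguation"},
]

print(disambiguate(docs))  # [[0, 1, 2, 4], [3]]
```

In this toy run, stage 1 groups documents 0 and 1 by identical OLs. Stage 2 then pulls in document 2 through the shared co-author. Stage 3 attaches document 4 on TAK similarity alone. Each later stage only merges existing clusters and never splits them, which mirrors the abstract's description of stages that add "additional clustering" on top of earlier results.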
