Weighted hybrid features to resolve mixed entities
Ingyu Lee, Byung-Won On
2011 Sixth International Conference on Digital Information Management, 2011-12-01
DOI: 10.1109/ICDIM.2011.6093351 (https://doi.org/10.1109/ICDIM.2011.6093351)
With the popularity of the Internet, a tremendous amount of unstructured document information has become accessible. Extracting related information from large collections of unstructured documents is a very difficult task. In particular, confusion can arise from synonymy and polysemy, misspellings, abbreviations, and similar variations. Resolving such confusion is known as the Entity Resolution problem. Clustering algorithms have been widely used to resolve mixed entities. However, most research focuses on a single feature of an entity, such as co-author lists or paper titles. In this paper, we propose a weighted hybrid feature scheme to distinguish mixed entities among unstructured documents. Experimental results show that the weighted hybrid approach improves accuracy and efficiency.
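The abstract does not specify which similarity measures, weights, or clustering algorithm the authors use, so the following is only a minimal sketch of the general idea of combining several features with weights. It assumes Jaccard similarity over co-author sets and title tokens, a weighted sum of the two scores, and a simple threshold-based single-link clustering. The names `hybrid_similarity` and `cluster`, the weights `w_coauthor` and `w_title`, the threshold, and the sample records are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_similarity(rec1, rec2, w_coauthor=0.6, w_title=0.4):
    """Weighted sum of per-feature similarities (weights are illustrative)."""
    coauthor_sim = jaccard(rec1["coauthors"], rec2["coauthors"])
    title_sim = jaccard(rec1["title"].lower().split(),
                        rec2["title"].lower().split())
    return w_coauthor * coauthor_sim + w_title * title_sim


def cluster(records, threshold=0.3, **weights):
    """Single-link clustering: merge records whose hybrid score >= threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if hybrid_similarity(records[i], records[j], **weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical citation records sharing an ambiguous author name.
records = [
    {"coauthors": ["A. Smith", "B. Jones"],
     "title": "Clustering methods for entity resolution"},
    {"coauthors": ["A. Smith", "C. Wu"],
     "title": "Entity resolution with clustering"},
    {"coauthors": ["D. Patel"],
     "title": "Protein folding simulation on GPUs"},
]
clusters = cluster(records)
print(clusters)
```

In this sketch the first two records are grouped because they overlap on both features, while the third stays separate. Setting one weight to zero reduces the scheme to a single-feature approach, which gives a simple way to compare the two on the same data.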