Authors: Dhiren A. Audich, R. Dara, B. Nonnecke
Venue: 2016 Eleventh International Conference on Digital Information Management (ICDIM)
Published: 2016-09-01
DOI: 10.1109/ICDIM.2016.7829792 (https://doi.org/10.1109/ICDIM.2016.7829792)
Extracting keyword and keyphrase from online privacy policies
One of the key components of constructing an ontology is a taxonomy. Creating a comprehensive taxonomy involves extracting keywords and keyphrases from the domain corpus. It is a time-consuming endeavour that requires domain expertise as well as syntactic and structural knowledge of the corpus in question. In this paper we explore different keyword and keyphrase extraction algorithms for the domain of online privacy policies. To do this we used a variety of well-known techniques, namely TF-IDF, RAKE, TextRank, and AlchemyAPI, benchmarked against manual annotation. We then further evaluated the performance of these algorithms over a large corpus of 631 privacy policies. Because the language of privacy policies is inconsistent, the algorithms that evaluate single documents (RAKE, TextRank, AlchemyAPI) outperformed the one that evaluates the entire corpus (TF-IDF).
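To make the contrast between corpus-level and single-document extraction concrete, here is a minimal Python sketch. It is not the authors' implementation: the stop-word list, the function names `tfidf_keywords` and `rake_keyphrases`, and the simplified RAKE scoring (sum of word degree divided by word frequency) are all assumptions chosen for illustration. TextRank and AlchemyAPI are not shown.

```python
import math
import re
from collections import Counter

# Tiny hypothetical stop-word list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "we", "your", "our",
             "with", "may", "is", "for", "in", "or", "you"}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(corpus, doc_index, top_n=3):
    """Rank the words of one document by TF-IDF computed over the whole corpus."""
    docs = [[w for w in tokenize(d) if w not in STOPWORDS] for d in corpus]
    # Document frequency: the number of documents each word occurs in.
    df = Counter(w for doc in docs for w in set(doc))
    tf = Counter(docs[doc_index])
    n_docs = len(docs)
    doc_len = len(docs[doc_index])
    scores = {w: (count / doc_len) * math.log(n_docs / df[w])
              for w, count in tf.items()}
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def rake_keyphrases(text, top_n=3):
    """Simplified RAKE on a single document (no corpus statistics needed)."""
    # Candidate phrases are runs of words between stop words or punctuation.
    phrases = []
    for fragment in re.split(r"[^a-zA-Z\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    # Word score = degree / frequency, where degree counts co-occurrences
    # within candidate phrases (including the word itself).
    freq, degree = Counter(), Counter()
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # Phrase score = sum of the scores of its words.
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]
```

A call such as `tfidf_keywords(policies, 0)` needs the full list of policy texts, whereas `rake_keyphrases(policies[0])` looks at one policy only. In the TF-IDF variant, a term that appears in every policy gets an inverse document frequency of zero, so its score depends on the wording of the rest of the corpus. The single-document scorer has no such dependence, which is one way to see how the two families of methods react differently to inconsistent language across policies.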