WCOND-mine: algorithm for detecting Web content outliers from Web documents
Malik Agyemang, K. Barker, R. Alhajj
10th IEEE Symposium on Computers and Communications (ISCC'05), 2005-06-27
DOI: 10.1109/ISCC.2005.155
Outlier mining is dedicated to finding data objects that differ significantly from the rest of the data. Outlier mining has been studied extensively in statistics and, more recently, in data mining. However, exploring the Web for outliers has received very little attention in the mining community. Web content outliers are documents whose contents vary from those of similar Web documents taken from the same domain. Mining Web content outliers may lead to the identification of competitors and emerging business patterns in electronic commerce. This paper proposes the WCOND-mine algorithm for mining Web content outliers using n-grams without a domain dictionary. Experimental results with embedded motifs show that WCOND-mine is capable of finding Web content outliers in Web datasets.
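The abstract does not spell out WCOND-mine's steps, so the following is only a rough illustration of the general idea it names: comparing documents through n-grams taken directly from the text, with no domain dictionary. It is not the authors' algorithm. The function names (ngram_profile, outlier_scores, top_outliers), the choice of character trigrams, cosine similarity, and the "1 minus mean similarity to the other documents" score are all hypothetical choices made for this sketch.

```python
from collections import Counter
import math


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams; no domain dictionary is consulted."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def outlier_scores(docs: list[str], n: int = 3) -> list[float]:
    """Hypothetical score: 1 minus a document's mean similarity to all others."""
    if len(docs) < 2:
        raise ValueError("need at least two documents to compare")
    profiles = [ngram_profile(d, n) for d in docs]
    scores = []
    for i, p in enumerate(profiles):
        sims = [cosine(p, q) for j, q in enumerate(profiles) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


def top_outliers(docs: list[str], k: int = 1, n: int = 3) -> list[int]:
    """Return indices of the k documents with the highest outlier scores."""
    scores = outlier_scores(docs, n)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


docs = [
    "cheap laptops and notebook computers on sale",
    "buy laptops and notebook computers online",
    "discount laptops, notebooks and computer accessories",
    "organic gardening tips for tomato plants",
]
print(top_outliers(docs, k=1))  # → [3]
```

Here the three computer-shop pages share many trigrams, so the gardening page has the lowest average similarity and is flagged. Character n-grams are used because they can be extracted from any text without a vocabulary list, which matches the abstract's "without a domain dictionary" constraint; the actual n-gram scheme and scoring used by WCOND-mine may differ.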