Clustering of Text Streams via Facility Location and Spherical K-means
Aaditya Jain, I. Sharma
2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA), 2018-03-01
DOI: 10.1109/ICECA.2018.8474757 (https://doi.org/10.1109/ICECA.2018.8474757)
Spherical k-means is a fast and effective method for clustering text documents represented as directions on a unit hypersphere. Current text clustering needs center on streams, where memory restrictions call for fast, effective methods with low space complexity. Few studies have adapted spherical k-means to streaming text data, and their reported performance is unsatisfactory for novelty detection and for imbalanced cluster structures. This paper presents streaming spherical k-means with associated facility location costs. Depending on these costs, each arriving document is either detected as a new topic or joins an existing cluster.
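
The abstract does not spell out the assignment rule, so the following is only a minimal sketch of how a facility cost might drive the "new topic or existing cluster" decision, not the authors' algorithm. The class name `StreamingSphericalKMeans`, the `facility_cost` parameter, the deterministic threshold on cosine distance, and the running-mean centroid update are all assumptions made for illustration.

```python
import numpy as np


def normalize(v):
    """Project a vector onto the unit hypersphere (zero vectors are returned unchanged)."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


class StreamingSphericalKMeans:
    """Hypothetical one-pass clusterer: memory grows with clusters, not documents."""

    def __init__(self, facility_cost):
        # Assumed rule: opening a new cluster "costs" facility_cost, so a
        # document opens one only if its cosine distance to every existing
        # centroid exceeds that cost.
        self.facility_cost = facility_cost
        self.centroids = []  # unit-length centroid vectors
        self.counts = []     # number of documents absorbed by each centroid

    def partial_fit(self, doc):
        """Process one document vector and return its cluster index."""
        x = normalize(doc)
        if self.centroids:
            sims = np.array([c @ x for c in self.centroids])  # cosine similarities
            j = int(np.argmax(sims))
            if 1.0 - sims[j] <= self.facility_cost:
                # Join the nearest cluster: running-mean step, then re-project
                # the centroid onto the unit sphere.
                self.counts[j] += 1
                eta = 1.0 / self.counts[j]
                self.centroids[j] = normalize(
                    self.centroids[j] + eta * (x - self.centroids[j])
                )
                return j
        # Novel topic: the document itself becomes a new centroid.
        self.centroids.append(x)
        self.counts.append(1)
        return len(self.centroids) - 1


# Toy stream of term-weight vectors (made-up data, two obvious directions).
stream = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.1], [1, 0.05, 0]]
model = StreamingSphericalKMeans(facility_cost=0.3)
labels = [model.partial_fit(d) for d in stream]
print(labels)  # [0, 0, 1, 1, 0]
```

A randomized rule, such as opening a cluster with probability proportional to distance divided by facility cost, is another common choice in online facility location; the deterministic threshold is used here only to keep the sketch easy to follow.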