Clustering Based on Context Similarity
Authors: L. Kovács, T. Repasi, E. Baksa-Varga, P. Barabas
DOI: 10.1109/CANS.2008.26 (https://doi.org/10.1109/CANS.2008.26)
Journal of Systems Science & Complexity, 42 1, pp. 157-165
Published: 2008-11-08
Citations: 3
Abstract
The discovery of word categories is an important step in statistical grammar induction systems. Word categories can be viewed as clusters of words with similar grammatical or semantic behavior. Given a metric space of words, a clustering algorithm places similar words into the same cluster and dissimilar words into different clusters. In this paper we propose an approximate word clustering method based on context similarity. The context of a word is defined here as the set of sentences containing the word. The similarity of two words is measured by the similarity of their corresponding context sets. For the calculation of the context-based distance of two words, a hierarchical agglomerative clustering algorithm has been developed and is presented here.
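The abstract does not say how two context sets are compared or which linkage the clustering uses, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each sentence containing a word is reduced to a "frame" by replacing that word with `_`, that the distance between two words is the Jaccard distance between their frame sets, and that clusters are merged by average linkage until the closest pair is farther apart than a threshold. The names `context_sets`, `jaccard_distance`, `agglomerative`, and the `threshold` parameter are all hypothetical.

```python
from itertools import combinations


def context_sets(sentences):
    """Map each word to the set of sentence frames it occurs in.

    Assumption: a frame is the sentence with the target word replaced by "_".
    """
    contexts = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            frame = " ".join(tokens[:i] + ["_"] + tokens[i + 1:])
            contexts.setdefault(word, set()).add(frame)
    return contexts


def jaccard_distance(a, b):
    """Jaccard distance between two sets (assumed context-set distance)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def agglomerative(contexts, threshold):
    """Average-linkage agglomerative clustering over words.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical stopping rule).
    """
    clusters = [[word] for word in sorted(contexts)]

    def linkage(c1, c2):
        total = sum(
            jaccard_distance(contexts[x], contexts[y]) for x in c1 for y in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage.
        (i, j), dist = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)


# Toy corpus (made up for illustration).
sentences = [
    "the cat sleeps",
    "the dog sleeps",
    "the cat eats",
    "the dog eats",
    "a bird sings",
]
clusters = agglomerative(context_sets(sentences), threshold=0.5)
print(clusters)
```

On this toy corpus, `cat` and `dog` occur in exactly the same frames (`the _ sleeps`, `the _ eats`), so they merge at distance 0, as do `sleeps` and `eats`; every other word shares no frame with any other and stays in its own cluster. The pairwise search is quadratic in the number of clusters at each step, which is acceptable for a sketch but would likely need an approximate or indexed variant on a real vocabulary.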
About the journal:
The Journal of Systems Science and Complexity is dedicated to publishing high quality papers on mathematical theories, methodologies, and applications of systems science and complexity science. It encourages fundamental research into complex systems and complexity and fosters cross-disciplinary approaches to elucidate the common mathematical methods that arise in natural, artificial, and social systems. Topics covered are:
complex systems,
systems control,
operations research for complex systems,
economic and financial systems analysis,
statistics and data science,
computer mathematics,
systems security, coding theory and crypto-systems,
other topics related to systems science.