Constructing concept clouds from company websites
Rosa Tsegaye Aga, Christian Wartena
Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business, 2015-10-21
DOI: 10.1145/2809563.2809615 (https://doi.org/10.1145/2809563.2809615)
Citations: 4
Abstract
Word clouds are used for the visual representation of texts. The font size and color of a word show its importance, and the position of a word in the cloud can be arbitrary or reflect its relation to other words. In this paper, we present a tool that generates concept clouds from German company websites. The main idea of the visualization is to show the overall work and main interests of a company in a detailed information cloud based solely on its own web pages. The concepts are taken from the STW Thesaurus for Economics. The color of a concept shows its category in the thesaurus, while the cloud layout is organized by the semantic proximity of the concepts. To compute the similarity between concepts, we use a semantic representation generated from the DeWaC corpus. This distributional similarity is fundamentally different from the co-occurrence statistics that are often used to generate word clouds.
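The contrast between co-occurrence statistics and distributional similarity can be illustrated with a minimal sketch. It is not the tool described in the paper: the toy corpus, the sentence-level context window, and the function names `cooccurrence`, `context_vector`, and `distributional_similarity` are all assumptions made for illustration, and the cosine over raw context counts stands in for whatever semantic representation is actually derived from DeWaC.

```python
from collections import Counter
from math import sqrt

# Toy corpus, invented for illustration (not taken from DeWaC).
sentences = [
    ["bank", "grants", "loan"],
    ["bank", "grants", "credit"],
    ["customer", "repays", "loan"],
    ["customer", "repays", "credit"],
]


def cooccurrence(w1, w2):
    """Count the sentences in which both words appear together."""
    return sum(1 for s in sentences if w1 in s and w2 in s)


def context_vector(word):
    """Count the other words that share a sentence with `word`."""
    vec = Counter()
    for s in sentences:
        if word in s:
            vec.update(t for t in s if t != word)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def distributional_similarity(w1, w2):
    """Compare two words by the contexts they occur in."""
    return cosine(context_vector(w1), context_vector(w2))


if __name__ == "__main__":
    for pair in [("loan", "credit"), ("bank", "loan")]:
        print(pair, cooccurrence(*pair), round(distributional_similarity(*pair), 3))
```

In this toy corpus "loan" and "credit" never appear in the same sentence, so their co-occurrence count is zero, yet they occur in identical contexts and therefore receive the maximal distributional similarity. "bank" and "loan" do co-occur, but their contexts overlap only partly, so their distributional similarity is lower. A layout driven by the second measure would place "loan" next to "credit", which a co-occurrence-based layout would not.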