Clustering of Business Organisations based on Textual Data - An LDA Topic Modeling Approach

Ferenc Tolner, M. Takács, G. Eigner, Balázs Barta
Published in: 2021 IEEE 21st International Symposium on Computational Intelligence and Informatics (CINTI), 2021-11-18
DOI: 10.1109/CINTI53070.2021.9668337 (https://doi.org/10.1109/CINTI53070.2021.9668337)
Citations: 1

Abstract

Textual data offers a new perspective and considerable potential as an additional source of information for analysing and segmenting business organisations. Statistical “hard data” is often too general or even misleading and may be affected by several exogenous and endogenous factors, while questionnaire- or survey-based “soft data” is rarely available or can be biased by the interviewee's position in the organisation or by their personal orientation. Beyond these sources, however, business organisations, educational and research institutions and similar bodies often publish textual descriptions of themselves, which can further contribute to the understanding of the investigated population. In this paper, topic modeling of 51 Central European business, educational and research organisations was performed using Latent Dirichlet Allocation (LDA). The investigated organisations took part in an online survey in which their textual organisational descriptions were collected together with basic geographical and industry-related data. Based on the results, the stakeholders were grouped, and an LDA-based methodology was tested in order to further support the cluster-forming efforts of business and other types of organisations within the Central European region.
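
The general idea of grouping organisations by the topics of their self-descriptions can be illustrated with a minimal sketch. This is not the authors' implementation: the four descriptions are invented placeholders, the stopword list, the number of topics and the hyperparameters `alpha` and `beta` are assumptions, and LDA is fitted here with a small collapsed Gibbs sampler written in plain Python, whereas the paper does not state which inference method or library was used. Each organisation is assigned to the topic with the highest weight in its document-topic distribution.

```python
import random

# Hypothetical stopword list; a real study would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "we", "our", "with"}


def tokenize(text):
    """Lowercase, strip simple punctuation, drop stopwords."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    Returns one topic distribution (list of floats summing to 1) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed document-topic distributions.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Invented example descriptions, not data from the survey.
descriptions = [
    "We manufacture automotive components and precision metal parts for car makers.",
    "Automotive supplier producing metal parts and engine components.",
    "University research institute for machine learning and data science education.",
    "Research and education centre teaching data science and machine learning.",
]

docs = [tokenize(text) for text in descriptions]
theta = fit_lda(docs, n_topics=2)
# Group each organisation by its dominant topic.
clusters = [max(range(len(row)), key=row.__getitem__) for row in theta]

for text, row, cluster in zip(descriptions, theta, clusters):
    print(cluster, [round(p, 2) for p in row], text)
```

With a corpus as small as the 51 descriptions mentioned above, the resulting grouping can be sensitive to the random seed and to the chosen number of topics, so several runs and topic counts would normally be compared. Established libraries such as scikit-learn (`LatentDirichletAllocation`) provide tested LDA implementations that would be preferable to this sketch in practice.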