Open Stylometric System WebSty: Integrated Language Processing, Analysis and Visualisation

Maciej Piasecki, T. Walkowiak, Maciej Eder
Computational Methods in Science and Technology, pp. 43-58. Journal article, published 2018-03-31.
DOI: 10.12921/CMST.2018.0000007 (https://doi.org/10.12921/CMST.2018.0000007)
Citations: 8

Abstract

The paper presents WebSty, an open, web-based system for stylometric analysis that is part of the CLARIN-PL research infrastructure. WebSty requires no local installation, can be used from any web browser, offers a rich set of configuration options, and runs on a computing cluster. We discuss the underlying ideas of the system, its architecture, a pipeline of language tools for processing Polish, and its integration with systems for clustering, for visualising the clustering results, and for identifying the features with the strongest discriminative power. The techniques used for feature weighting and for measuring text similarity are also briefly overviewed. In the conclusions, we present a preliminary evaluation of WebSty on a corpus of 1,000 literary works and report the results of the first research applications of WebSty. Although the system was initially focused on Polish texts, we also briefly discuss its development towards a multilingual system, which already supports English, German and Hungarian.
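The abstract mentions feature weighting and text-similarity measurement without naming the specific techniques. Purely as an illustration, the sketch below shows one common stylometric combination: relative frequencies of the most frequent words as features, z-score weighting, and Burrows's Delta as the distance between texts. This is an assumption, not necessarily what WebSty implements; the function names are hypothetical, and naive whitespace tokenisation stands in for the Polish language-tool pipeline the paper describes.

```python
from collections import Counter
from statistics import mean, pstdev


def mfw_profiles(texts, n_features):
    """Hypothetical helper: relative frequencies of the corpus-wide most frequent words.

    Assumption: lower-cased whitespace tokens stand in for real linguistic processing.
    """
    tokenized = [t.lower().split() for t in texts]
    corpus_counts = Counter(w for toks in tokenized for w in toks)
    features = [w for w, _ in corpus_counts.most_common(n_features)]
    profiles = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty texts
        profiles.append([counts[w] / total for w in features])
    return features, profiles


def z_scores(profiles):
    """Weight each feature by standardising its frequency across all texts."""
    columns = list(zip(*profiles))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # constant features get sigma 1
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in profiles]


def burrows_delta(a, b):
    """Burrows's Delta: mean absolute difference of two z-score vectors."""
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distance_matrix(texts, n_features=100):
    """Pairwise Delta distances, e.g. as input to a clustering step."""
    _, profiles = mfw_profiles(texts, n_features)
    z = z_scores(profiles)
    return [[burrows_delta(a, b) for b in z] for a in z]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the cat sat on the hat", "a dog ran in a park"]
    for row in distance_matrix(docs, n_features=5):
        print([round(d, 3) for d in row])
```

A matrix like this is what a clustering and visualisation stage would consume; swapping in another weighting scheme or a different distance, such as cosine distance, only changes `z_scores` or `burrows_delta`.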