A corpus-based real-time text classification and tagging approach for social data

Frontiers in Computer Science (IF 2.4, Q3, Computer Science, Interdisciplinary Applications). Pub Date: 2024-03-13. DOI: 10.3389/fcomp.2024.1294985
A. Memon, Dileep Kumar Sootahar, K. K. Luhana, Kyrill Meyer
Citations: 0

Abstract

With the rapid accumulation of user-generated content on social media, the reuse and integration of social data have recently gained increasing attention. As a result, it has become almost obsolete for software applications to collect, store, and work with their own data on local servers. Although the Application Programming Interfaces (APIs) provided by the leading social networking sites have made data acquisition and integration possible, meaningful use of such unstructured, non-uniform, and incoherent data collections requires dedicated procedures for data summarization, understanding, and visualization. One aspect that needs particular attention is the categorization and concept tagging of data (text snippets in the form of social media posts), so that the most relevant and suitable data can be selected for a particular audience and purpose. To this end, we propose a corpus-based approach for searching social data and then categorizing and tagging it with relevant concepts in real time. The proposed approach addresses semantic and morphological similarities, as well as the domain-specific vocabularies of query strings and tagged concepts. We demonstrate the feasibility and application of the approach in a web-based tool that searches Facebook posts and presents the search results together with a concept map for further navigation, filtering, and refinement. The tool was evaluated by running multiple search queries and analyzing the resulting concept maps and annotated texts in terms of their precision. The approach was thereby found to be effective in achieving its stated goal of classifying text snippets in real time.
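The abstract does not give implementation details, so the following is only a minimal sketch of what corpus-based concept tagging with a precision check could look like. It is not the authors' method. The concept vocabulary `CONCEPT_CORPUS`, the crude suffix stripping in `normalize` (a stand-in for handling morphological similarity), and all function names are hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter

# Hypothetical concept corpus: each concept maps to a small set of related terms.
CONCEPT_CORPUS = {
    "education": {"school", "student", "teacher", "learning"},
    "health": {"hospital", "doctor", "vaccine", "health"},
}

# Assumption: a very crude suffix stripper stands in for real morphological analysis.
SUFFIXES = ("ing", "ed", "s")


def normalize(token):
    """Lowercase a token and strip one common suffix if the stem stays long enough."""
    token = token.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Split text into alphabetic tokens and normalize each one."""
    return [normalize(t) for t in re.findall(r"[a-z]+", text.lower())]


def build_index(corpus):
    """Map each normalized corpus term to the set of concepts it belongs to."""
    index = {}
    for concept, terms in corpus.items():
        for term in terms:
            index.setdefault(normalize(term), set()).add(concept)
    return index


def tag_snippet(text, index):
    """Return concepts matched in the text, most frequent first, ties by name."""
    counts = Counter()
    for token in tokenize(text):
        for concept in index.get(token, ()):
            counts[concept] += 1
    return sorted(counts, key=lambda c: (-counts[c], c))


def precision(predicted, relevant):
    """Fraction of predicted tags that are in the set of relevant tags."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & set(relevant)) / len(predicted)


if __name__ == "__main__":
    index = build_index(CONCEPT_CORPUS)
    tags = tag_snippet("The doctor visited our school", index)
    print(tags)
    print(precision(tags, {"health"}))
```

Because the index is a plain dictionary lookup per token, tagging a single snippet is cheap, which is one plausible way to keep tagging fast enough for interactive use. A real system would likely replace the suffix stripper with a proper stemmer or lemmatizer and draw the vocabulary from a domain corpus.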
Source journal: Frontiers in Computer Science (Computer Science, Interdisciplinary Applications)
CiteScore: 4.30
Self-citation rate: 0.00%
Articles published: 152
Review time: 13 weeks