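Methodologies of Internet portals users' short messages texts authorship identification based on the methods of mathematical linguistics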

Sukhoparov Milhail, Lebedev Ilya
Published in: 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT)
Publication date: 2014-10-01
DOI: 10.1109/ICAICT.2014.7035939 (https://doi.org/10.1109/ICAICT.2014.7035939)
Citations: 4

Abstract

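The article examines the particularities of determining the authorship of short text messages on Internet portals, blogs, and websites. It focuses on the possibility of finding people who hold several different accounts and send messages from them. The dependence of sentences on the number of words in portal users' comments is presented. A model of an Internet portal text message is provided. A method for identifying the authorship of portal users' short messages, based on the naive Bayesian classifier, is presented. The distinguishing feature of the proposed method is that it identifies users not only through frequency-dictionary analysis of selected messages, but also through their use of rules and connections derived from syntactic information about the language. Part-of-speech frequencies and the frequencies of connections between parts of speech are given. A graph of part-of-speech connections for the limited natural language used in comments is presented. The linguistic characteristics used to identify a portal user are given. Structures used to identify text authorship are distinguished on the basis of the part-of-speech connection graph, with respect to the prepositional-case forms of nouns in the limited natural language. An experiment shows the achievable probability of identifying an Internet portal user depending on the training sample. Probability diagrams of authorship identification based on the selected characteristics are presented.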
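To make the idea in the abstract concrete, the following is a minimal sketch of a naive Bayesian classifier that scores a short message using both word frequencies and the frequencies of connections between adjacent parts of speech. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact feature set, the smoothing, or the tagger, so the toy lexicon `POS`, the helper `features`, the class `NaiveBayesAuthor`, and the Laplace smoothing parameter `alpha` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy part-of-speech lexicon; a real system would use a tagger.
POS = {
    "the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "park": "NOUN", "sat": "VERB", "runs": "VERB", "think": "VERB",
    "on": "PREP", "in": "PREP", "really": "ADV", "fast": "ADV", "i": "PRON",
}


def features(message):
    """Return word features plus part-of-speech bigram ("connection") features."""
    words = message.lower().split()
    tags = [POS.get(w, "X") for w in words]  # "X" marks an unknown word
    feats = ["w=" + w for w in words]
    feats += ["pos=" + a + ">" + b for a, b in zip(tags, tags[1:])]
    return feats


class NaiveBayesAuthor:
    """Multinomial naive Bayes over the features above (assumed model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                   # Laplace smoothing (assumption)
        self.counts = defaultdict(Counter)   # author -> feature counts
        self.docs = Counter()                # author -> number of messages
        self.vocab = set()

    def fit(self, messages, authors):
        for msg, author in zip(messages, authors):
            feats = features(msg)
            self.counts[author].update(feats)
            self.docs[author] += 1
            self.vocab.update(feats)
        return self

    def log_scores(self, message):
        total_docs = sum(self.docs.values())
        v = len(self.vocab)
        scores = {}
        for author in self.docs:
            total = sum(self.counts[author].values())
            score = math.log(self.docs[author] / total_docs)  # log prior
            for f in features(message):
                num = self.counts[author][f] + self.alpha
                score += math.log(num / (total + self.alpha * v))
            scores[author] = score
        return scores

    def predict(self, message):
        scores = self.log_scores(message)
        return max(scores, key=scores.get)


# Toy training data: two accounts with different habits.
model = NaiveBayesAuthor().fit(
    ["the cat sat on the mat", "the cat sat in the park",
     "i think a dog runs really fast", "i think the dog runs fast"],
    ["A", "A", "B", "B"],
)
print(model.predict("the cat sat on the park"))
print(model.predict("i think the dog runs"))
```

In this sketch the part-of-speech bigrams stand in for the "connections between parts of speech" mentioned in the abstract; with very short messages they add evidence about an author's habitual sentence structure even when the individual words were never seen in training.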