Entity and attribute extraction of terrorism events based on a text corpus

曹文斌, 武卓峰, 杨涛, 凡友荣
DOI: 10.13374/J.ISSN2095-9389.2019.09.13.003 (https://doi.org/10.13374/J.ISSN2095-9389.2019.09.13.003)
Journal: 工程设计学报 (JCR: Engineering, Q4)
Volume: 60 1, pages 500-508
Publication date: 2020-04-01
Publication type: Journal Article
Open access: no
Indexed via: Semanticscholar
Citations: 0

Abstract

In recent years, driven by complex international factors, terrorism events have become increasingly rampant in many countries, posing a great threat to the global community. With the widespread use of emerging technologies in military and commercial fields, terrorist organizations have also begun to use these technologies for destructive activities. As the Internet and information technology develop, terrorism has been spreading rapidly in cyberspace. Terrorist organizations have created terrorism websites, established multinational networks, released recruitment information, and even conducted training activities through mainstream websites with a worldwide reach. Compared with traditional terrorist activities, cyber terrorist activities are more destructive, and cybercrime and cyber terrorism have become the most serious challenges for societies. Terrorist organizations take advantage of the Internet to disseminate extremist ideas rapidly and have developed a large number of terrorists and supporters around the world, especially in developed Western countries. They even use the Internet and "dark net" networks to conduct terrorist training, and their activities are concealed. As a result, "lone wolf" terrorist attacks have emerged in an endless stream in various countries and are difficult to prevent. This study proposes a method for extracting the entities and attributes of terrorist events based on semantic role analysis, providing technical support for monitoring and predicting terrorism activities in cyberspace.

First, a naive Bayes text classification algorithm is used to identify terrorism events in the cleaned text corpus collected from the Anti-Terrorism Information Site of the Northwest University of Political Science and Law. The TF-IDF keyword extraction algorithm, combined with natural language processing techniques, is then used to construct terrorism vocabularies from the classified corpus. Next, semantic role and syntactic dependency analyses are conducted to mine the postposed attributive relationship, the person name/place name/organization relationship, and the mediator-like relationship. Finally, regular expressions and the constructed terrorism-specific vocabularies are used to extract six entities and attributes of a terrorism event (occurrence time, occurrence location, casualties, attack methods, weapon types, and terrorist organizations) from the four types of short-text triples. On experimental data of 4221 collected articles, the F1 values for all six types of entity and attribute extraction exceeded 80%. The proposed method therefore has practical significance for maintaining public safety, given its positive effect on monitoring and predicting terrorism events in cyberspace.
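The classification step can be illustrated with a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. This is not the authors' implementation: the toy documents, the labels "terrorism" and "other", and the function names are assumptions made for illustration, and the input is assumed to be already tokenized (Chinese text would first need word segmentation).

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data (already tokenized).
train_docs = [
    ["bomb", "attack", "killed", "militants"],
    ["gunmen", "attack", "hostages", "explosion"],
    ["football", "match", "won", "league"],
    ["market", "shares", "rose", "profit"],
]
train_labels = ["terrorism", "terrorism", "other", "other"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["explosion", "killed", "attack"]))
print(predict(model, ["football", "league"]))
```

On a real corpus the documents labeled as terrorism events by such a classifier would be passed on to the vocabulary-building and extraction steps.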
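To make the TF-IDF keyword step concrete, the following sketch scores each term by its relative frequency in a document multiplied by the logarithm of the inverse document frequency, then keeps the top-ranked terms. The abstract does not state which TF-IDF variant or cutoff was used, so the unsmoothed formula, the `top_terms` helper, and the toy documents are assumptions.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document using tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


def top_terms(doc_scores, k):
    """Highest-scoring k terms; ties are broken alphabetically."""
    return sorted(doc_scores, key=lambda t: (-doc_scores[t], t))[:k]


# Hypothetical toy corpus (already tokenized).
docs = [
    ["the", "group", "used", "a", "car", "bomb", "in", "the", "attack"],
    ["the", "police", "said", "the", "suspect", "used", "a", "knife"],
    ["the", "match", "ended", "in", "a", "draw"],
]

scores = tfidf(docs)
print(top_terms(scores[0], 3))
print(scores[0]["the"])  # a term present in every document scores zero
```

Terms that occur in every document receive a score of zero, which is why function words drop out and domain-specific words rise to the top of the ranking.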
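The final extraction step, regular expressions combined with domain vocabularies, might look roughly like the sketch below. The patterns, the mini-lexicons, the placeholder place and organization names, and the English sample sentence are all hypothetical; the study works on Chinese text and applies its rules to triples produced by semantic role and dependency analysis, which this sketch does not attempt to reproduce.

```python
import re

# Hypothetical mini-lexicons standing in for the constructed vocabularies.
ATTACK_METHODS = ["suicide bombing", "bombing", "shooting", "kidnapping"]
WEAPON_TYPES = ["car bomb", "rifle", "grenade", "knife"]
ORGANIZATIONS = ["Group A", "Group B"]   # placeholder names
PLACES = ["Northtown", "Eastport"]       # placeholder gazetteer

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
CASUALTY_RE = re.compile(r"\b(killed|injured|wounded)\s+(\d+)\b")


def first_match(text, lexicon):
    """Return the longest lexicon term found in the text, or None."""
    lowered = text.lower()
    for term in sorted(lexicon, key=len, reverse=True):
        if term.lower() in lowered:
            return term
    return None


def extract_event(text):
    """Extract the six attributes named in the abstract from one short text."""
    date = DATE_RE.search(text)
    casualties = {verb: int(n) for verb, n in CASUALTY_RE.findall(text)}
    return {
        "time": date.group(0) if date else None,
        "location": first_match(text, PLACES),
        "casualties": casualties,
        "attack_method": first_match(text, ATTACK_METHODS),
        "weapon_type": first_match(text, WEAPON_TYPES),
        "organization": first_match(text, ORGANIZATIONS),
    }


sample = ("On 12 March 2019, a suicide bombing with a car bomb in Northtown "
          "killed 15 people and injured 40 others. Group A claimed responsibility.")
event = extract_event(sample)
for key, value in event.items():
    print(f"{key}: {value}")
```

Checking longer lexicon terms first keeps a specific entry such as "suicide bombing" from being shadowed by the shorter "bombing".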
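The reported F1 value is the harmonic mean of precision and recall. As a small worked example, the sketch below computes all three for one attribute type by comparing predicted (document, value) pairs against gold annotations. The pairs are invented for illustration and are not the study's data; exact matching of pairs is also an assumption, since the abstract does not describe the matching criterion.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 for two sets of extracted items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (document id, extracted location) pairs.
predicted = {("doc1", "Northtown"), ("doc2", "Eastport"),
             ("doc3", "Westfield"), ("doc4", "Southbay")}
gold = {("doc1", "Northtown"), ("doc2", "Eastport"),
        ("doc3", "Westfield"), ("doc5", "Lakeside")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here three of four predictions are correct and three of four gold items are found, so precision, recall, and F1 are all 0.75.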
Source journal: 工程设计学报 (Engineering: Engineering (miscellaneous))
CiteScore: 0.60
Self-citation rate: 0.00%
Articles published: 2447
Review time: 14 weeks
Journal description: Chinese Journal of Engineering Design is published by Zhejiang University Press Co., Ltd. It was founded in December 1994 as the first internationally cooperative journal in the area of engineering design research. Administered by the Ministry of Education of China, it is sponsored by both Zhejiang University and the Chinese Society of Mechanical Engineering. Zhejiang University Press Co., Ltd. is fully responsible for its bimonthly domestic and overseas publication. Its pages are A4 size. The journal is devoted to reporting the most up-to-date achievements in engineering design research and thereby promoting communication between academic research and its industrial applications. Achievements of great creativity and practicability are especially welcome. Aiming to supply designers, developers, and researchers of diverse technical artifacts with valuable references, its content covers all aspects of design theory and methodology, as well as its enabling environment, for instance creative design, concurrent design, conceptual design, intelligent design, web-based design, reverse engineering design, industrial design, design optimization, tribology, design by biological analogy, virtual reality in design, structural analysis and design, design knowledge representation, design knowledge management, and design decision-making systems.