Relation recognition among named entities from a crime corpus using a web-based semantic similarity measurement

Priyanka Das, A. Das
Published in: 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), 2017-11-01
DOI: 10.1109/ICRCICN.2017.8234525 (https://doi.org/10.1109/ICRCICN.2017.8234525)
Citations: 1

Abstract

This work proposes an unsupervised approach for recognising relations between named entities in a large corpus on crime in Indian states and union territories. First, named entities are identified in the extracted crime corpus, and certain pairs of entities that facilitate crime analysis are selected. Each entity pair, together with its intermediate context words, is then represented as a shallow parse tree that forms a relation instance. From the parse trees, only the head words (in each entity pair) that reflect the main meaning of the phrases are considered when measuring semantic similarity: a web search engine retrieves the page counts of those words and of their conjunctions. The page counts are used to compute the Simpson coefficient between the pairs, and on the basis of this similarity score an agglomerative hierarchical clustering technique groups entity pairs that share the same relationship into clusters. Each resulting cluster is also characterised by the most frequent head word in the group. The proposed method is a simple similarity measurement technique for relation extraction from crime data and is reported to provide better accuracy than other existing methods.
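The similarity and clustering steps described in the abstract can be illustrated with a minimal sketch. It assumes the standard overlap (Simpson) coefficient with page counts standing in for set sizes, that is, H(P AND Q) / min(H(P), H(Q)). The page counts, the head words, the average-linkage rule and the stopping threshold below are all hypothetical values chosen for illustration; the abstract does not state which search engine, linkage criterion or cut-off the authors used, and each head word here simply stands in for one entity pair.

```python
from itertools import combinations

# Hypothetical page counts (stand-ins for search-engine hit counts).
PAGE_COUNTS = {"arrested": 1000, "detained": 800, "murdered": 600, "killed": 900}

# Hypothetical page counts for conjunctive queries such as "arrested AND detained".
JOINT_COUNTS = {
    frozenset(("arrested", "detained")): 400,
    frozenset(("murdered", "killed")): 450,
    frozenset(("arrested", "murdered")): 30,
    frozenset(("arrested", "killed")): 45,
    frozenset(("detained", "murdered")): 20,
    frozenset(("detained", "killed")): 40,
}


def simpson(p, q):
    """Simpson (overlap) coefficient from page counts: H(p AND q) / min(H(p), H(q))."""
    if p == q:
        return 1.0
    smaller = min(PAGE_COUNTS[p], PAGE_COUNTS[q])
    if smaller == 0:
        return 0.0  # avoid division by zero when a word has no hits
    return JOINT_COUNTS.get(frozenset((p, q)), 0) / smaller


def average_similarity(a, b):
    """Average-linkage similarity between two clusters (an assumed linkage rule)."""
    total = sum(simpson(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def agglomerate(words, threshold):
    """Bottom-up clustering: repeatedly merge the two most similar clusters
    until the best available similarity falls below the threshold."""
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_similarity(clusters[i], clusters[j]) < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# The threshold of 0.3 is an illustrative assumption, not a value from the paper.
clusters = agglomerate(list(PAGE_COUNTS), threshold=0.3)
print(clusters)
```

With these made-up counts, the head words split into one group around arrest and one around killing, which mirrors the idea of clustering entity pairs that express the same relationship. In practice the counts would come from live search queries, which are noisy and change over time.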