Relation recognition among named entities from a crime corpus using a web-based semantic similarity measurement

2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) Pub Date : 2017-11-01 DOI:10.1109/ICRCICN.2017.8234525

Priyanka Das, A. Das


Citations: 1

Abstract



This work proposes an unsupervised approach for recognising relations between named entities in a large corpus on crime in Indian states and union territories. First, named entities are identified in the extracted crime corpus, and the entity pairs that facilitate crime analysis are selected. Each entity pair, together with its intermediate context words, is then represented as a shallow parse tree forming a relation instance. From the parse trees, only the head words (in each entity pair) that reflect the main meaning of the phrases are used to measure semantic similarity: a web search engine retrieves the page counts of the individual words and of their conjunctions. The page counts are used to compute the Simpson coefficient between the pairs, and, based on this similarity score, agglomerative hierarchical clustering groups entity pairs that share the same relationship. Each resulting cluster is characterised by the most frequent head word in the group. The proposed method offers a simple similarity measure for relation extraction from crime data and is reported to give better accuracy than other existing methods.
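The abstract does not state the exact formula, linkage criterion, or stopping rule, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions: the standard overlap (Simpson) coefficient computed from page counts as hits(P AND Q) / min(hits(P), hits(Q)), average linkage, and a similarity threshold of 0.5. The names `PAGE_COUNTS`, `JOINT_COUNTS`, `INSTANCES`, and all functions are hypothetical; the page counts and entity names are invented toy values, not data from the paper, and no real search engine is queried.

```python
from collections import Counter
from itertools import combinations

# Toy page counts (invented for illustration, not real search results).
PAGE_COUNTS = {"arrested": 1000, "detained": 400, "murdered": 500, "killed": 2000}
JOINT_COUNTS = {
    frozenset({"arrested", "detained"}): 300,
    frozenset({"murdered", "killed"}): 400,
}

# Relation instances: (entity 1, entity 2, head word). All values are placeholders.
INSTANCES = [
    ("person A", "place X", "arrested"),
    ("person B", "place Y", "arrested"),
    ("person C", "place Z", "detained"),
    ("person D", "person E", "murdered"),
    ("person F", "person G", "killed"),
]


def simpson_coefficient(count_p, count_q, count_pq):
    """Overlap (Simpson) coefficient: joint count over the smaller single count."""
    smaller = min(count_p, count_q)
    if smaller == 0:
        return 0.0  # avoid division by zero when a word has no hits
    return count_pq / smaller


def instance_similarity(a, b):
    """Similarity of two relation instances, based only on their head words."""
    head_a, head_b = a[2], b[2]
    if head_a == head_b:
        return 1.0
    joint = JOINT_COUNTS.get(frozenset({head_a, head_b}), 0)
    return simpson_coefficient(PAGE_COUNTS[head_a], PAGE_COUNTS[head_b], joint)


def agglomerative_clusters(items, similarity, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once the best
    available similarity drops below `threshold` (an assumed stopping rule).
    """
    clusters = [[item] for item in items]

    def linkage(c1, c2):
        total = sum(similarity(x, y) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def cluster_label(cluster):
    """Characterise a cluster by its most frequent head word."""
    return Counter(instance[2] for instance in cluster).most_common(1)[0][0]


if __name__ == "__main__":
    for cluster in agglomerative_clusters(INSTANCES, instance_similarity, 0.5):
        pairs = [(e1, e2) for e1, e2, _ in cluster]
        print(cluster_label(cluster), pairs)
```

With these toy counts, the arrest-related instances fall into one cluster and the killing-related instances into another. A production version would replace the lookup tables with real page-count queries and would likely use a library implementation of hierarchical clustering.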
