Title: Relation recognition among named entities from a crime corpus using a web-based semantic similarity measurement
Authors: Priyanka Das, A. Das
Published in: 2017 Third International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN)
Publication date: 2017-11-01
DOI: 10.1109/ICRCICN.2017.8234525
Citations: 1
Abstract
The present work proposes an unsupervised approach for recognising relations between named entities in a large corpus on crime in Indian states and union territories. First, named entities are identified in the extracted crime corpus, and certain pairs of entities that facilitate crime analysis are selected. Each entity pair, together with its intermediate context words, is then represented as a shallow parse tree that serves as a relation instance. From the parse trees, only the head words (in each entity pair), which reflect the main meaning of the phrases, are considered for measuring semantic similarity: a web search engine retrieves the page counts of those words individually and in conjunction. The page counts are used to compute the Simpson coefficient between the pairs, and, based on this similarity score, an agglomerative hierarchical clustering technique groups entity pairs that share the same relationship into clusters. Each resulting cluster is also characterised by the most frequent head word in the group. The proposed method offers a simple similarity measure for relation extraction from crime data and provides better accuracy than other existing methods.
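The similarity and clustering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the page counts are made-up numbers hard-coded in place of real search-engine queries, each entity pair is represented only by its head word, average linkage is used because the abstract does not name a linkage criterion, and the stopping threshold is arbitrary. The names `PAGE_COUNTS`, `JOINT_COUNTS`, `simpson`, `average_link`, and `agglomerate` are hypothetical. The Simpson (overlap) coefficient is taken here as the page count of the conjunctive query divided by the smaller of the two individual page counts.

```python
from itertools import combinations

# Hypothetical page counts for single head words (illustrative numbers only).
PAGE_COUNTS = {"murder": 1000, "killed": 800, "arrested": 500, "detained": 400}

# Hypothetical page counts for conjunctive queries such as "murder AND killed".
JOINT_COUNTS = {
    frozenset(("murder", "killed")): 600,
    frozenset(("arrested", "detained")): 300,
    frozenset(("murder", "arrested")): 50,
    frozenset(("murder", "detained")): 20,
    frozenset(("killed", "arrested")): 40,
    frozenset(("killed", "detained")): 20,
}


def simpson(a, b):
    """Simpson coefficient from page counts: count(a AND b) / min(count(a), count(b))."""
    if a == b:
        return 1.0
    smaller = min(PAGE_COUNTS[a], PAGE_COUNTS[b])
    if smaller == 0:
        return 0.0
    return JOINT_COUNTS.get(frozenset((a, b)), 0) / smaller


def average_link(c1, c2):
    """Mean pairwise similarity between two clusters (assumed linkage criterion)."""
    scores = [simpson(a, b) for a in c1 for b in c2]
    return sum(scores) / len(scores)


def agglomerate(words, threshold):
    """Bottom-up clustering: merge the most similar clusters until none reach the threshold."""
    clusters = [[w] for w in words]
    while len(clusters) > 1:
        # Pick the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_link(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_link(clusters[i], clusters[j]) < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    head_words = ["murder", "killed", "arrested", "detained"]
    for cluster in agglomerate(head_words, threshold=0.5):
        print(cluster)
```

With these illustrative counts, head words that co-occur often on the web end up in the same cluster, while pairs with a low coefficient stay apart. A real pipeline would replace the two dictionaries with live page-count lookups and would label each cluster with its most frequent head word.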