Luca Mazzola, A. Tsois, T. Dimitrova, E. Camossi
2013 European Intelligence and Security Informatics Conference (EISIC), published 2013-08-12
DOI: 10.1109/EISIC.2013.33
Contextualisation of Geographical Scraped Data to Support Human Judgment and Classification
When dealing with information extraction or data mining for security, one of the prerequisites is data cleaning, a process that deeply influences the final result. This is particularly true for data scraped automatically from online sources (web pages) that contain geographical or geo-referenced information. In this paper we present a model, and a first partial implementation, for resolving the locations referred to by free-text string descriptions. The application domain is the monitoring and analysis of maritime container traffic, based on the status messages generated by container carriers. The model combines three data dimensions: string similarity, trajectory similarity and most frequent patterns. The implemented interface integrates the three dimensions in a map-based view. This functionality supports human experts in associating a location with the string description provided in the raw record, in order to increase the number of messages usable for route-based analysis.
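The abstract names the three dimensions but not how each one is scored or how they are combined. The following is a minimal Python sketch under our own assumptions, not the authors' implementation. difflib's SequenceMatcher ratio stands in for string similarity; the extra distance a candidate adds between the previous and next known ports of a container stands in for trajectory similarity; and the share of earlier expert resolutions of the same raw string stands in for most frequent patterns. The names Location and rank_candidates, the weights, the 500 km scale and the approximate port coordinates are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Location:
    """Hypothetical gazetteer entry: a place name with coordinates in degrees."""
    name: str
    lat: float
    lon: float


def haversine_km(a: Location, b: Location) -> float:
    """Great-circle distance between two locations in kilometres."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(h)))  # clamp guards float rounding


def normalise(raw: str) -> str:
    """Upper-case and trim a raw carrier string before comparison."""
    return raw.strip().upper()


def string_score(raw: str, cand: Location) -> float:
    """Dimension 1 (assumed form): difflib similarity ratio in [0, 1]."""
    return SequenceMatcher(None, normalise(raw), normalise(cand.name)).ratio()


def trajectory_score(cand: Location, prev: Optional[Location], nxt: Optional[Location],
                     scale_km: float = 500.0) -> float:
    """Dimension 2 (assumed form): penalise the detour cand adds to the prev -> next leg."""
    if prev is None or nxt is None:
        return 0.5  # neutral value when the surrounding route is unknown
    detour = haversine_km(prev, cand) + haversine_km(cand, nxt) - haversine_km(prev, nxt)
    return 1.0 / (1.0 + max(detour, 0.0) / scale_km)


def frequency_score(raw: str, cand: Location, history: Counter) -> float:
    """Dimension 3 (assumed form): share of past expert resolutions of raw that chose cand."""
    key = normalise(raw)
    total = sum(n for (s, _), n in history.items() if s == key)
    return history[(key, cand.name)] / total if total else 0.0


def rank_candidates(raw: str, candidates: List[Location],
                    prev: Optional[Location] = None, nxt: Optional[Location] = None,
                    history: Optional[Counter] = None,
                    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)
                    ) -> List[Tuple[float, str]]:
    """Return (score, name) pairs sorted best first, as suggestions for a human expert."""
    history = history if history is not None else Counter()
    w_str, w_traj, w_freq = weights
    scored = []
    for cand in candidates:
        score = (w_str * string_score(raw, cand)
                 + w_traj * trajectory_score(cand, prev, nxt)
                 + w_freq * frequency_score(raw, cand, history))
        scored.append((round(score, 3), cand.name))
    return sorted(scored, reverse=True)


# Illustrative usage with approximate (hypothetical) port coordinates.
rotterdam = Location("ROTTERDAM", 51.92, 4.48)
le_havre = Location("LE HAVRE", 49.49, 0.11)
candidates = [Location("AMSTERDAM", 52.37, 4.90), Location("ANTWERP", 51.22, 4.40)]
print(rank_candidates("Antwerpen ", candidates, prev=rotterdam, nxt=le_havre))
```

With these assumed inputs, ANTWERP ranks above AMSTERDAM because it scores higher on both string similarity and route detour. The function only orders suggestions; the final assignment is left to the analyst, in line with the human-in-the-loop role the abstract describes.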