Contextualisation of Geographical Scraped Data to Support Human Judgment and Classification

Luca Mazzola, A. Tsois, T. Dimitrova, E. Camossi
2013 European Intelligence and Security Informatics Conference, published 2013-08-12
DOI: 10.1109/EISIC.2013.33 (https://doi.org/10.1109/EISIC.2013.33)
Citations: 2

Abstract

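Data cleaning is a prerequisite for information extraction and data mining in the security domain, and it deeply influences the final result. This is particularly true for data scraped automatically from online sources (web pages) that contain geographical or geo-referenced information. In this paper we present a model, and a first partial implementation, for resolving string descriptions to locations. The domain is the monitoring and analysis of maritime container traffic, which relies on the status messages generated by container carriers. The model uses three data dimensions: string similarity, trajectory similarity, and most frequent patterns. The implemented interface integrates the three dimensions in a map-based view. This functionality supports human experts in associating a location with the string description provided in the raw record, in order to increase the number of messages usable for route-based analysis.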
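The abstract does not say how the three dimensions are computed or combined, so the following is only a minimal sketch of the general idea and not the authors' implementation. It ranks candidate locations for a raw string description by mixing a string-similarity score with the relative frequency of each location in previously resolved records. The function names, the `gazetteer` and `resolved_history` inputs, and the `freq_weight` mixing parameter are all assumptions made for illustration. Trajectory similarity is left out entirely.

```python
from collections import Counter
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (illustrative choice of metric)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def rank_candidates(description, gazetteer, resolved_history, top_n=3, freq_weight=0.2):
    """Rank gazetteer entries as candidate locations for a raw description.

    Hypothetical scoring: a weighted sum of string similarity and the share of
    past resolved records that point to each location. The weighting scheme is
    an assumption, not something stated in the abstract.
    """
    counts = Counter(resolved_history)
    total = sum(counts.values()) or 1  # avoid division by zero on empty history
    scored = []
    for name in gazetteer:
        similarity = string_similarity(description, name)
        frequency = counts[name] / total
        score = (1 - freq_weight) * similarity + freq_weight * frequency
        scored.append((name, score))
    # Highest score first; the human expert makes the final choice.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    ports = ["Rotterdam", "Antwerp", "Hamburg"]
    history = ["Hamburg", "Hamburg", "Rotterdam"]
    for name, score in rank_candidates("Roterdam", ports, history):
        print(f"{name}: {score:.3f}")
```

A ranked short list rather than a single answer fits the stated goal of supporting human judgment: the expert sees the most plausible candidates and confirms one of them.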