A system for efficient cleaning and transformation of geospatial data attributes

Yao-Yi Chiang, Bo Wu, Akshay Anand, Ketan Akade, Craig A. Knoblock
Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2014-11-04
DOI: 10.1145/2666310.2666373
Citations: 8

Abstract

A significant challenge in handling geographic datasets is that they can come from heterogeneous sources with varying data quality and formats. Before these datasets can be used in a Geographic Information System (GIS) for spatial analysis or map making, a typical task is to clean the attribute data and transform it into a uniform format. However, conventional GIS products focus on manipulating the spatial component of geographic features and offer only basic tools for editing attribute data (e.g., one row at a time). This limits a GIS's ability to handle large datasets, since manually editing attribute data and transforming it between formats is impractical for thousands of geographic features. In this demo, we present ArcKarma, a system that builds on our previous work on data transformation to efficiently clean and transform data attributes in a GIS. ArcKarma generates transformation programs from a few user-provided examples and applies these programs to transform individual attribute columns into the desired formats. We show that ArcKarma produces accurate results and eliminates the need for laborious manual data cleaning and scripting.
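The abstract does not describe how ArcKarma synthesizes its transformation programs, so the following is only a hypothetical illustration of the general idea of learning a column transformation from a few examples. It is a deliberately tiny enumerative search over an invented DSL: a program picks input tokens by position, applies one case function, and joins them with a separator. The DSL and the names `tokenize`, `Program`, `synthesize`, and `transform_column` are assumptions made for this sketch. They are not ArcKarma's algorithm or API.

```python
from __future__ import annotations

import itertools
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy DSL (not ArcKarma's): a program selects input tokens by
# position, applies one case function to each, and joins them with a separator.
CASES: dict[str, Callable[[str], str]] = {
    "same": lambda s: s,
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
}
SEPARATORS = ["", " ", "-", "/", ", ", "_"]
MAX_PARTS = 3  # longest token sequence the search will try


def tokenize(value: str) -> list[str]:
    """Split an attribute value into maximal runs of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", value)


@dataclass(frozen=True)
class Program:
    indices: tuple[int, ...]  # token positions; negative values count from the end
    sep: str
    casing: str

    def run(self, value: str) -> Optional[str]:
        tokens = tokenize(value)
        try:
            parts = [tokens[i] for i in self.indices]
        except IndexError:
            return None  # the value has too few tokens for this program
        return self.sep.join(CASES[self.casing](p) for p in parts)


def synthesize(examples: list[tuple[str, str]]) -> Optional[Program]:
    """Enumerate programs by number of parts (fewest first) and return the first
    one that reproduces every (input, output) example, or None if none does."""
    n = min(len(tokenize(inp)) for inp, _ in examples)
    positions = list(range(n)) + list(range(-n, 0))
    for k in range(1, MAX_PARTS + 1):
        for indices in itertools.permutations(positions, k):
            for sep in SEPARATORS:
                for casing in CASES:
                    prog = Program(indices, sep, casing)
                    if all(prog.run(inp) == out for inp, out in examples):
                        return prog
    return None


def transform_column(column: list[str], program: Program) -> list[Optional[str]]:
    """Apply a learned program to every value of one attribute column."""
    return [program.run(v) for v in column]


if __name__ == "__main__":
    # Two examples of the desired date format are enough for this toy search.
    date_prog = synthesize([("11/04/2014", "2014-11-04"), ("03/15/2013", "2013-03-15")])
    print(date_prog)
    print(transform_column(["12/31/2012", "01/01/2000"], date_prog))
    # ['2012-12-31', '2000-01-01']
```

The search returns the first program consistent with all examples. That program is not guaranteed to generalize: a value with fewer tokens than the examples yields `None`. A practical example-driven system would need a richer language and some way to rank competing consistent programs. This sketch omits both.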