A data preprocessing framework of geoscience data sharing portal for user behavior mining

Mo Wang, Juanle Wang
Published in: 2015 23rd International Conference on Geoinformatics, 2015-06-19. DOI: 10.1109/GEOINFORMATICS.2015.7378637

Abstract

Science data sharing has many advantages for both scientific research and education. Understanding the behavior of participants in science data sharing supports informed decision making on data sharing policy and on the design of data sharing websites. Data sharing is now carried out mainly over the Internet, and web usage mining provides an ideal approach to uncovering the behavior of data sharing users. This paper presents a data preprocessing framework for subsequent user behavior mining on a geoscience data sharing portal (geodata.cn). The preprocessing steps comprised data cleaning, user identification, session identification, and data modeling. Web server logs served as the major data source of this study. Heuristic algorithms were employed for data cleaning and user identification. Different session identification methods were applied for comparison. Users' geolocations were identified using an online Geo-IP lookup tool, which provides the geographical coordinates of an IP address. On the basis of these preprocessing procedures, a web usage data model of the science data sharing portal was proposed for further user behavior mining, such as user classification and spatial association rule mining.
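The abstract does not spell out the heuristics it uses, so the following is only a minimal sketch of how the first three preprocessing steps could look on web server logs. Everything specific here is an assumption rather than the authors' method: the Combined Log Format input, the static-file and crawler filters, the use of an (IP address, user agent) pair as the user key, and the 30-minute inactivity timeout for sessions. All function and constant names are hypothetical. Geolocation lookup and data modeling are not covered.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Combined Log Format, e.g.
# ip - - [19/Jun/2015:10:00:00 +0800] "GET /x HTTP/1.1" 200 512 "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")  # assumed
BOT_HINTS = ("bot", "spider", "crawler")                           # assumed
SESSION_TIMEOUT = timedelta(minutes=30)                            # assumed


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def is_relevant(record):
    """Data cleaning: keep successful GET page requests from non-crawlers."""
    path = record["url"].split("?", 1)[0].lower()
    agent = record["agent"].lower()
    return (
        record["method"] == "GET"
        and 200 <= record["status"] < 400
        and not path.endswith(STATIC_SUFFIXES)
        and not any(hint in agent for hint in BOT_HINTS)
    )


def sessionize(records):
    """Group records by (ip, agent) and split on gaps longer than the timeout."""
    by_user = defaultdict(list)
    for record in records:
        by_user[(record["ip"], record["agent"])].append(record)

    sessions = []
    for user, user_records in by_user.items():
        user_records.sort(key=lambda r: r["time"])
        current = [user_records[0]]
        for prev, cur in zip(user_records, user_records[1:]):
            if cur["time"] - prev["time"] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(cur)
        sessions.append((user, current))
    return sessions


def preprocess(lines):
    """Run parsing, cleaning, user identification, and session identification."""
    parsed = (parse_line(line) for line in lines)
    cleaned = [r for r in parsed if r is not None and is_relevant(r)]
    return sessionize(cleaned)
```

A time-gap threshold is only one of several common session identification heuristics; a total-duration cap or referrer-based splitting could be swapped into `sessionize` to compare methods, which is the kind of comparison the abstract mentions.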