A Crawling Method with No Parameters for Geo-social Data based on Road Maps

Sou Ijima, Masaharu Hirota, Shohei Yokoyama
Published in: Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services, 2019-12-02
DOI: 10.1145/3366030.3366094 (https://doi.org/10.1145/3366030.3366094)
Citations: 0

Abstract

Researchers must crawl geo-social data before they can analyze and visualize it. The conventional method for exhaustively crawling geo-social data is grid-based: the crawler divides a specified area into a grid and uses the center coordinates of each cell to query databases through their APIs. The difficulty with the grid-based method is that researchers cannot estimate the optimal grid size in advance, because it depends on data density, which varies with the geographical characteristics of the area. We focus on the fact that geo-social data are dense along roads, and we therefore propose a method based on road maps to exhaustively crawl geo-social data. We demonstrate that our method crawls geo-social data using almost the same number of queries as a grid-based crawler with an optimized grid size.
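
The grid-based baseline described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name grid_cell_centers, the bounding-box arguments, and the rows/cols parameters are assumptions introduced here, and no real API is called. It shows how the query points (cell centers) are derived from a bounding box and a chosen grid resolution.

```python
from typing import Iterator, Tuple


def grid_cell_centers(
    south: float,
    west: float,
    north: float,
    east: float,
    rows: int,
    cols: int,
) -> Iterator[Tuple[float, float]]:
    """Yield the (lat, lon) center of every cell in a rows x cols grid.

    Hypothetical helper: the bounding box is given in decimal degrees and
    is assumed not to cross the antimeridian.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    if not (south < north and west < east):
        raise ValueError("bounding box must have south < north and west < east")

    cell_height = (north - south) / rows  # latitude span of one cell
    cell_width = (east - west) / cols     # longitude span of one cell

    for r in range(rows):
        for c in range(cols):
            # The center sits half a cell in from the cell's south-west corner.
            lat = south + (r + 0.5) * cell_height
            lon = west + (c + 0.5) * cell_width
            yield (lat, lon)


# Each center would become one query to a geo-social API (not shown here).
centers = list(grid_cell_centers(0.0, 0.0, 2.0, 4.0, rows=2, cols=2))
print(centers)  # [(0.5, 1.0), (0.5, 3.0), (1.5, 1.0), (1.5, 3.0)]
```

In this sketch the number of queries equals rows * cols, so the grid resolution directly sets the query budget. That resolution is the parameter the abstract says cannot be chosen well in advance, since the right value depends on how dense the data are in the area.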