IP2vec: an IP node representation model for IP geolocation
Fan Zhang, Meijuan Yin, Fenlin Liu, Xiangyang Luo, Shuodi Zu
Frontiers of Computer Science (journal article), published 28 December 2023. DOI: 10.1007/s11704-023-2616-9
Abstract
IP geolocation is essential for the territorial analysis of sensitive network entities, location-based services (LBS), and network fraud detection, and it has both theoretical significance and practical value. Measurement-based IP geolocation is an active research topic. However, existing IP geolocation algorithms cannot effectively exploit the distance characteristics of delay or the connection relations between nodes, which results in high geolocation error. Obtaining the mapping between delay, node connection relations, and geographical location is challenging. Building on the idea of network representation learning, we propose a representation learning model for IP nodes (IP2vec for short) and apply it to street-level IP geolocation. The IP2vec model vectorizes nodes according to the connection relations and delays between them, so that the resulting IP vectors reflect the distance and topological proximity between IP nodes. The street-level IP geolocation algorithm based on IP2vec proceeds as follows. First, we measure the landmarks and the target IP to obtain delay and path information and construct the network topology. Second, we use the IP2vec model to obtain IP vectors from the network topology. Third, we train a neural network to fit the mapping between the vectors and the locations of the landmarks. Finally, we feed the vector of the target IP into the neural network to obtain the target IP's geographical location. The algorithm can accurately infer the geographical locations of target IPs from the delay and topological proximity embedded in the IP vectors. Cross-validation experiments on 10,023 target IPs in New York, Beijing, Hong Kong, and Zhengzhou show that the proposed algorithm achieves street-level geolocation. Compared with existing algorithms such as Hop-Hot, IP-geolocater, and SLG, it reduces the mean geolocation error by 33%, 39%, and 51%, respectively.
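The four steps above can be illustrated with a small, self-contained sketch. It is not the authors' IP2vec model: the abstract does not specify IP2vec's training objective or the neural network's architecture, so this sketch swaps in two simpler techniques under stated assumptions. Each node's vector is its shortest-path delay to every landmark (in place of the learned IP2vec embedding), and an inverse-distance-weighted k-nearest-neighbour regressor stands in for the neural network. The function names (`build_topology`, `embed`, `locate`, `leave_one_out_error_km`), all IP addresses (documentation ranges), delays, and coordinates are hypothetical; measuring error as great-circle (haversine) distance and evaluating by leave-one-out over landmarks are also assumptions.

```python
import heapq
import math
from collections import defaultdict


def build_topology(paths):
    """Step 1: merge measured paths into an undirected, delay-weighted graph.
    A link's delay is the difference of consecutive cumulative hop delays,
    clamped to a small positive value; the smallest observation is kept."""
    graph = defaultdict(dict)
    for path in paths:
        for (u, du), (v, dv) in zip(path, path[1:]):
            w = max(dv - du, 0.01)
            if w < graph[u].get(v, math.inf):
                graph[u][v] = graph[v][u] = w
    return graph


def shortest_delays(graph, source):
    """Dijkstra: minimum accumulated link delay from source to every node."""
    dist, heap = {source: 0.0}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def embed(graph, anchors, unreachable=1e6):
    """Step 2 (stand-in for IP2vec): one coordinate per anchor, holding the
    node's shortest-path delay to that anchor."""
    tables = [shortest_delays(graph, a) for a in anchors]
    return {n: [t.get(n, unreachable) for t in tables] for n in list(graph)}


def locate(vectors, landmark_locs, target, k=2):
    """Steps 3-4 (stand-in for the neural network): inverse-distance-weighted
    mean of the k landmarks nearest to the target in vector space. Averaging
    raw lat/lon is only reasonable over a small (city-scale) area."""
    tv = vectors[target]
    nearest = sorted((math.dist(vectors[ip], tv), ip) for ip in landmark_locs)[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    total = sum(weights)
    return tuple(
        sum(w * landmark_locs[ip][i] for w, (_, ip) in zip(weights, nearest)) / total
        for i in (0, 1)
    )


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def leave_one_out_error_km(graph, landmark_locs, k=2):
    """Locate each landmark from the others; return the mean great-circle error."""
    errors = []
    for held_out, truth in landmark_locs.items():
        rest = {ip: loc for ip, loc in landmark_locs.items() if ip != held_out}
        vecs = embed(graph, sorted(rest))
        errors.append(haversine_km(locate(vecs, rest, held_out, k), truth))
    return sum(errors) / len(errors)


# Hypothetical traceroute-style measurements: (hop_ip, cumulative_delay_ms).
VP, R1, R2, R3 = "203.0.113.1", "198.51.100.1", "198.51.100.2", "198.51.100.3"
PATHS = [
    [(VP, 0.0), (R1, 1.0), (R2, 3.0), ("192.0.2.11", 3.5)],
    [(VP, 0.0), (R1, 1.0), (R2, 3.0), ("192.0.2.12", 4.0)],
    [(VP, 0.0), (R1, 1.0), (R3, 6.0), ("192.0.2.13", 6.5)],
    [(VP, 0.0), (R1, 1.0), (R2, 3.0), ("192.0.2.99", 3.6)],  # target
]
LANDMARKS = {  # hypothetical (lat, lon) of landmarks with known locations
    "192.0.2.11": (40.7580, -73.9855),
    "192.0.2.12": (40.7484, -73.9857),
    "192.0.2.13": (40.6892, -74.0445),
}
TARGET = "192.0.2.99"

graph = build_topology(PATHS)
vectors = embed(graph, sorted(LANDMARKS))
estimate = locate(vectors, LANDMARKS, TARGET)
print("estimated (lat, lon):", estimate)
print("leave-one-out mean error (km):", leave_one_out_error_km(graph, LANDMARKS))
```

With these made-up numbers the target shares router 198.51.100.2 with landmarks 192.0.2.11 and 192.0.2.12, so its delay vector sits closest to theirs and the estimate falls between those two landmarks, weighted toward 192.0.2.11. Replacing `embed` with a learned embedding and `locate` with a trained regressor would keep the same four-step structure.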
About the journal:
Frontiers of Computer Science aims to provide a forum for publishing peer-reviewed papers and to promote rapid communication and exchange among computer scientists. The journal publishes research papers and review articles on a wide range of topics, including architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, and interdisciplinary research. It especially encourages papers from emerging and multidisciplinary areas, papers reflecting international research and development trends, and papers on special topics reporting progress made by Chinese computer scientists.