{"title":"每秒解析数百万个 URL","authors":"Yagiz Nizipli, Daniel Lemire","doi":"10.1002/spe.3296","DOIUrl":null,"url":null,"abstract":"URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e.g., Servo's rust-url) and up to eight times fewer instructions than the popular curl parser. The Node.js environment adopted our C++ library. In our tests on realistic data, a recent Node.js version (20.0) with our parser is four to five times faster than the last version with the legacy URL parser.","PeriodicalId":21899,"journal":{"name":"Software: Practice and Experience","volume":"49 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parsing millions of URLs per second\",\"authors\":\"Yagiz Nizipli, Daniel Lemire\",\"doi\":\"10.1002/spe.3296\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e.g., Servo's rust-url) and up to eight times fewer instructions than the popular curl parser. The Node.js environment adopted our C++ library. In our tests on realistic data, a recent Node.js version (20.0) with our parser is four to five times faster than the last version with the legacy URL parser.\",\"PeriodicalId\":21899,\"journal\":{\"name\":\"Software: Practice and Experience\",\"volume\":\"49 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Software: Practice and Experience\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/spe.3296\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Software: Practice and Experience","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/spe.3296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
URL 是网络应用程序的基本要素。通过应用向量算法,我们构建了一个符合标准的快速 C++ 实现。我们的解析器比遵循 WHATWG 标准的同类解析器(如 Servo 的 rust-url)少用了三倍的指令,比流行的 curl 解析器少用了多达八倍的指令。Node.js 环境采用了我们的 C++ 库。在我们对现实数据的测试中,使用我们的解析器的最新 Node.js 版本(20.0)比使用传统 URL 解析器的上一版本快四到五倍。
URLs are fundamental elements of web applications. By applying vector algorithms, we built a fast standard-compliant C++ implementation. Our parser uses three times fewer instructions than competing parsers following the WHATWG standard (e.g., Servo's rust-url) and up to eight times fewer instructions than the popular curl parser. The Node.js environment adopted our C++ library. In our tests on realistic data, a recent Node.js version (20.0) with our parser is four to five times faster than the last version with the legacy URL parser.