T. Silveira, F. Soares, Wladmir Cardoso Brandão, H. Freitas
Anais do Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD). Published 2019-11-08. DOI: 10.5753/wscad.2019.8664 (https://doi.org/10.5753/wscad.2019.8664)
Heterogeneous Parallel Architecture for Inverted Index Generation
The amount of data generated on the Web has increased dramatically, and so has the need for computational power to process it. In particular, indexers process these data to extract terms and their occurrences, storing them in an inverted file, a compact data structure that supports fast search. However, this task involves processing a large amount of data and therefore demands high computational power. In this article, we present a heterogeneous parallel architecture that uses CPUs and GPUs in a cluster to accelerate inverted index generation. Experimental results show that the proposed architecture reduces execution time, with speedups of up to 60 times in classification and 23 times in compression for 1 million elements.
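To make the data structure in the abstract concrete, here is a minimal sequential sketch of an inverted index in Python. It is not the authors' CPU/GPU cluster implementation. The function names `build_inverted_index` and `gap_encode`, the whitespace tokenizer, and the sample documents are all assumptions made for illustration. The abstract does not say which compression scheme the paper uses, so gap (delta) encoding of document ids appears here only as one common way to shrink posting lists.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    counts = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    # Sort terms and, within each term, postings by document id.
    return {term: sorted(postings.items())
            for term, postings in sorted(counts.items())}


def gap_encode(doc_ids):
    """Replace sorted document ids by the differences between neighbours.

    Assumption: shown as a generic compression step, not the paper's method.
    """
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


# Made-up sample documents, indexed by their position in the list.
docs = ["the web grows", "the index maps terms", "terms and the web"]
index = build_inverted_index(docs)
web_doc_ids = [doc_id for doc_id, _ in index["web"]]
print(index["web"], gap_encode(web_doc_ids))
```

In this sketch the sorting step and the gap encoding are independent per term, which is the kind of work that can be spread across many processing units. How the paper actually partitions it between CPU and GPU is not described in the abstract.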