Jingyong Wan, Beizhan Wang, W. Guo, Kang Chen, Jiajun Wang
{"title":"一种基于重新排序算法模型的分布式搜索引擎","authors":"Jingyong Wan, Beizhan Wang, W. Guo, Kang Chen, Jiajun Wang","doi":"10.1109/ICCSE.2015.7250325","DOIUrl":null,"url":null,"abstract":"With the rapid increase of websites and the explosive growth of Internet information, the centralized search engine will face great challenge in mass data processing and mass data storage. However, the distributed search engine can solve the problem effectively. In this paper, we describe the design and implementation of a distributed search engine that is based on Apache Nutch, Solr and Hadoop. Considering users click logs, we propose a re-ranking algorithm based on Lucene scoring. Our experimental results show that our approaches significantly satisfy users' massive data searching demand while maintaining high reliability and scalability.","PeriodicalId":311451,"journal":{"name":"2015 10th International Conference on Computer Science & Education (ICCSE)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A distributed search engine based on a re-ranking algorithm model\",\"authors\":\"Jingyong Wan, Beizhan Wang, W. Guo, Kang Chen, Jiajun Wang\",\"doi\":\"10.1109/ICCSE.2015.7250325\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the rapid increase of websites and the explosive growth of Internet information, the centralized search engine will face great challenge in mass data processing and mass data storage. However, the distributed search engine can solve the problem effectively. In this paper, we describe the design and implementation of a distributed search engine that is based on Apache Nutch, Solr and Hadoop. Considering users click logs, we propose a re-ranking algorithm based on Lucene scoring. Our experimental results show that our approaches significantly satisfy users' massive data searching demand while maintaining high reliability and scalability.\",\"PeriodicalId\":311451,\"journal\":{\"name\":\"2015 10th International Conference on Computer Science & Education (ICCSE)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-07-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 10th International Conference on Computer Science & Education (ICCSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCSE.2015.7250325\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 10th International Conference on Computer Science & Education (ICCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCSE.2015.7250325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A distributed search engine based on a re-ranking algorithm model
With the rapid increase of websites and the explosive growth of Internet information, the centralized search engine will face great challenge in mass data processing and mass data storage. However, the distributed search engine can solve the problem effectively. In this paper, we describe the design and implementation of a distributed search engine that is based on Apache Nutch, Solr and Hadoop. Considering users click logs, we propose a re-ranking algorithm based on Lucene scoring. Our experimental results show that our approaches significantly satisfy users' massive data searching demand while maintaining high reliability and scalability.