Research and implementation of efficient retrieval algorithm in big data environment
Authors: Pan Gao, Shuhua Shao
Published in: International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023), pp. 129690H - 129690H-4
Publication date: 2024-01-09
DOI: 10.1117/12.3014436
Citations: 0
Abstract
In the digital information age, data volume and complexity keep growing, and the limitations of traditional centralized retrieval services are becoming increasingly apparent. There is an urgent need to improve the scalability of data structures, the control of incremental updates, and the efficiency of retrieval operations. This paper takes efficient retrieval algorithms and techniques for massive data as its research object and proposes a scheme for building a big data storage and retrieval system for unstructured data. The scheme integrates distributed computing with full-text retrieval and optimizes fast retrieval processing for large-scale data. The system is built on the Hadoop framework, uses HBase as its data storage module, and combines the Elasticsearch search engine, the IKAnalyzer Chinese word segmenter, and a Redis cache to provide real-time, efficient data retrieval. Finally, a Java web application is built so that users can operate the system online. In practice, the system has resolved many problems in collecting, storing, and retrieving massive unstructured text data. It also improves the efficiency of data sharing and transmission and the system's ability to control concurrent access, and it offers a new service model for big data retrieval.
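The abstract names the components but not how a query flows through them. The following Java sketch is a hypothetical illustration of that flow under stated assumptions, not the authors' code: in-memory maps stand in for HBase (row storage), Elasticsearch (the inverted index), and Redis (a query cache), and a simple regex tokenizer stands in for IKAnalyzer's Chinese word segmentation. The class and method names (`RetrievalSketch`, `addDocument`, `search`, `getRow`) are invented for illustration.

```java
import java.util.*;

/**
 * Hypothetical in-memory sketch (assumptions, not the paper's implementation):
 *   rowStore ~ HBase table, invertedIndex ~ Elasticsearch index,
 *   queryCache ~ Redis cache, tokenize() ~ IKAnalyzer segmentation.
 */
final class RetrievalSketch {
    private final Map<String, String> rowStore = new HashMap<>();            // row key -> text
    private final Map<String, Set<String>> invertedIndex = new HashMap<>();  // term -> row keys
    private final Map<String, List<String>> queryCache;                      // query -> hits
    private int cacheHits = 0;

    RetrievalSketch(int cacheCapacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry
        // once more than cacheCapacity queries are cached.
        this.queryCache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > cacheCapacity;
            }
        };
    }

    // Naive tokenizer: lowercase, then split on runs of non-letter/non-digit
    // characters. IKAnalyzer instead performs dictionary-based Chinese segmentation.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Incremental update: store the row, (re)index its terms, and invalidate the
    // cache so that stale results are never served.
    void addDocument(String rowKey, String text) {
        String previous = rowStore.put(rowKey, text);
        if (previous != null) {
            // Remove postings that belonged to the old version of this row.
            for (String term : tokenize(previous)) {
                Set<String> postings = invertedIndex.get(term);
                if (postings != null) {
                    postings.remove(rowKey);
                }
            }
        }
        for (String term : tokenize(text)) {
            invertedIndex.computeIfAbsent(term, k -> new TreeSet<>()).add(rowKey);
        }
        queryCache.clear();
    }

    // Cache-aside lookup: serve a cached result if present; otherwise intersect
    // the posting sets of all query terms (AND semantics) and cache the result.
    List<String> search(String query) {
        List<String> terms = tokenize(query);
        String key = String.join(" ", terms);
        List<String> cached = queryCache.get(key);
        if (cached != null) {
            cacheHits++;
            return cached;
        }
        Set<String> result = null;
        for (String term : terms) {
            Set<String> postings = invertedIndex.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        List<String> hits = (result == null) ? List.of() : List.copyOf(result);
        queryCache.put(key, hits);
        return hits;
    }

    // Fetch the full document by row key, as a client would after an index hit.
    String getRow(String rowKey) { return rowStore.get(rowKey); }

    int getCacheHits() { return cacheHits; }

    public static void main(String[] args) {
        RetrievalSketch engine = new RetrievalSketch(100);
        engine.addDocument("doc1", "Hadoop stores massive unstructured text");
        engine.addDocument("doc2", "Elasticsearch indexes unstructured text for retrieval");
        System.out.println(engine.search("unstructured text")); // prints [doc1, doc2]
        System.out.println(engine.search("Hadoop"));            // prints [doc1]
        engine.search("unstructured text");                     // served from the cache
        System.out.println(engine.getCacheHits());              // prints 1
    }
}
```

Clearing the whole cache on every write is the simplest invalidation policy that never serves stale results; a production system would likely invalidate more selectively. The sketch also omits relevance ranking (Elasticsearch scores matches rather than returning a plain set) and distribution across cluster nodes.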