Research and implementation of efficient retrieval algorithm in big data environment

Pan Gao, Shuhua Shao
Published in: International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023), pages 129690H to 129690H-4
Publication date: 2024-01-09
DOI: https://doi.org/10.1117/12.3014436
Citations: 0

Abstract

As data grows in scale and complexity, the limitations of traditional centralized retrieval services are becoming increasingly apparent, and there is an urgent need to improve data-structure extensibility, incremental update control, and retrieval efficiency. This paper studies efficient retrieval algorithms and techniques for massive data and proposes a scheme for building a big data storage and retrieval system for unstructured data. The scheme combines distributed technology with full-text retrieval technology to optimize fast retrieval over large-scale data. The system is built on the Hadoop framework, uses HBase as the data storage module, and combines the Elasticsearch engine, the IKAnalyzer word segmenter, and a Redis cache to provide real-time, efficient data retrieval. A web application built with Java web technology lets users operate the system online. The authors report that, in practice, the system resolves many problems in collecting, storing, and retrieving massive unstructured text data, that it improves the efficiency of data sharing and transmission and the control of concurrent access, and that it offers a new model for big data retrieval services.
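The abstract names the components but does not describe how a query moves through them. The sketch below shows one plausible flow, assumed rather than taken from the paper: look in a cache first, fall back to an inverted index, then fetch the matching rows from a row store. The class `RetrievalSketch` and its methods are hypothetical, and plain in-memory Python structures stand in for Redis, Elasticsearch, and HBase, so none of those libraries' APIs are used. A whitespace split stands in for IKAnalyzer.

```python
from collections import defaultdict


class RetrievalSketch:
    """Hypothetical in-memory model of a cache -> index -> row store lookup."""

    def __init__(self):
        self.row_store = {}                     # stands in for HBase: row key -> text
        self.inverted_index = defaultdict(set)  # stands in for the full-text index
        self.cache = {}                         # stands in for Redis: query -> results
        self.cache_hits = 0

    @staticmethod
    def tokenize(text):
        # Lowercased whitespace split; an assumption replacing a real analyzer.
        return text.lower().split()

    def put(self, row_key, text):
        # Remove postings of a previous version so the index stays consistent.
        old_text = self.row_store.get(row_key)
        if old_text is not None:
            for term in self.tokenize(old_text):
                self.inverted_index[term].discard(row_key)
        self.row_store[row_key] = text
        for term in self.tokenize(text):
            self.inverted_index[term].add(row_key)
        # Simplest possible invalidation: drop all cached results on any write.
        self.cache.clear()

    def search(self, query):
        terms = self.tokenize(query)
        cache_key = " ".join(terms)
        if cache_key in self.cache:
            self.cache_hits += 1
            return self.cache[cache_key]
        if terms:
            # AND semantics: a row must contain every query term.
            postings = [self.inverted_index.get(term, set()) for term in terms]
            matches = set.intersection(*postings)
        else:
            matches = set()
        results = [(row_key, self.row_store[row_key]) for row_key in sorted(matches)]
        self.cache[cache_key] = results
        return results


store = RetrievalSketch()
store.put("doc1", "distributed full-text retrieval on Hadoop")
store.put("doc2", "Redis cache for retrieval results")
first = store.search("retrieval")   # computed from the index, then cached
second = store.search("Retrieval")  # same normalized query, served from the cache
```

Clearing the whole cache on every write keeps the sketch obviously correct but would be too coarse for a write-heavy workload; a real deployment would more likely use expiry times or per-key invalidation.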