Data farm: Information system for collecting, storing and processing unstructured data from heterogeneous sources

Sergey Pavlovich Levashkin, Konstantin Nikolaevich Ivanov, Sergey Vladimirovich Kushukov
Trudy Instituta sistemnogo programmirovaniia RAN, published 2023-01-01
DOI: 10.15514/ispras-2023-35(2)-5 (https://doi.org/10.15514/ispras-2023-35(2)-5)

Abstract

We present an original information system, the «data farm». Today, the successful application of artificial intelligence algorithms, primarily deep learning based on artificial neural networks, depends almost entirely on the availability of data. The larger the amount of data (big data), the better the results the algorithms produce. Well-known examples of such algorithms come from Facebook, Google, Microsoft, Yandex, and others. The data must include both a training sample and a test sample. Moreover, the data must be of good quality and have a certain structure; ideally, they should be labeled so that the learning algorithms work adequately. Obtaining such data is a serious problem requiring huge computational and human resources, and this paper is dedicated to solving it. Today the data farm is a rather complex information system built on a modular basis, similar to a Lego construction set. The individual modules of the system are various modern algorithms, technologies, and entire libraries of artificial intelligence; together they are designed to automate the process of obtaining and structuring high-quality big data in various subject domains. The system has been tested on COVID-19 data for the regions of Russia and for countries around the world. In addition, a user-friendly interface was developed for visualizing the data collected and processed on the farm. This makes it possible to conduct visual numerical experiments in computer simulation and compare them with real data, turning the farm into an intelligent decision support information system.