
2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT): Latest Papers

Automatic generation of questions based on semantic text analysis
Buldin Ilya, Murov Vadim, D. Silnov
Recently, scientists have shown great interest in natural language processing. This article presents a template-based approach to the automatic generation of test questions that takes into account the semantic structure of the text. The main idea is to automatically compose test questions from the affirmative sentences that make up the text. The teacher then selects from the generated questions. Results of experiments with text fragments from literary sources on various topics are given. The main problems to be solved in further work, and for successful testing of the software product in the classrooms of educational institutions, are identified.
DOI: https://doi.org/10.1109/AICT50176.2020.9368686
Published: 2020-10-07
Citations: 0
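The template-based idea in the abstract above can be illustrated with a minimal sketch. The two regular-expression templates, the `lower_first` helper, and the function name `generate_questions` are hypothetical and chosen only for illustration; the abstract does not describe the authors' actual templates or their semantic analysis, which this sketch does not attempt.

```python
import re

# Hypothetical templates: each pairs a regex over an affirmative sentence
# with a question pattern and the name of the group that holds the answer.
TEMPLATES = [
    (re.compile(r"^(?P<subject>.+?) was (?P<verb>\w+) by (?P<agent>.+)\.$"),
     "Who {verb} {subject_lc}?", "agent"),
    (re.compile(r"^(?P<subject>.+?) is (?P<predicate>.+)\.$"),
     "What is {predicate}?", "subject"),
]


def lower_first(text):
    """Lower-case the first character so the subject reads well mid-question."""
    return text[:1].lower() + text[1:]


def generate_questions(sentences):
    """Return (question, answer) candidates for a teacher to review."""
    candidates = []
    for sentence in sentences:
        for pattern, question_format, answer_group in TEMPLATES:
            match = pattern.match(sentence.strip())
            if match is None:
                continue
            groups = match.groupdict()
            groups["subject_lc"] = lower_first(groups["subject"])
            candidates.append(
                (question_format.format(**groups), groups[answer_group])
            )
            break  # use only the first template that matches
    return candidates
```

Sentences that match no template are skipped, which mirrors the workflow in the abstract: the program only proposes candidates, and the teacher makes the final selection.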
An Approach for Arranging the Learning Process in Terms of Digital Transformations with the Use of the Object-Oriented Approach and Reusable Abstractions
A. Dukhanov, Leonid Gorokhovatsky, Nikita Dobrovolskii, A. Lutsenko
This paper proposes a way to adapt the learning process to contemporary digital and educational trends. We propose designing and applying entities named "Basic learning resources" and "Individual learning resources" in the educational process. They are based on two reusable abstractions: the research object and the reusable learning object. Thanks to these abstractions and the object-oriented approach, we obtain flexible means to construct learning resources from existing digital objects and to individualize and personalize them for different purposes and educational conditions. Our approach shifts the writing of textbooks and learning manuals toward a creative process of building learning resources. In addition, it extends the possibilities for evaluating the quality of learning resources and for addressing "auto-actualization", copyright protection, and digital footprint accumulation.
DOI: https://doi.org/10.1109/AICT50176.2020.9368583
Published: 2020-10-07
Citations: 0
Models Of Integration Of Information Systems In Higher Education Institutions
Muminov B. B., Karimov U.U., Bekmurodov U.B.
At present, many automated systems are being developed and implemented to support educational and research processes in universities. These systems often duplicate functions and databases, and they also suffer from compatibility problems. The most common educational systems are those for creating electronic libraries, accessing scientific and educational information, detecting plagiarism, testing knowledge, etc. This article examines models and solutions for integrating educational automated systems such as the information library system (ILS) and the anti-plagiarism system. Integration is based on the compatibility of the databases or, more precisely, of the metadata of the different information models. Cloud technologies are also used: a data processing technology in which computer resources are provided to the user of the integrated system as an online service. The ILS creates an e-library of graduation papers and dissertations on the main server. The MARC21 communication format is used when creating the electronic catalog. Database development is distributed across departments. The anti-plagiarism subsystem analyzes the full-text database for similarity between texts (dissertations, diploma works, and others). It also identifies the percentage of coincidence and creates a table of statistical information on the coincidence of texts for each author and division, indicating similar fields. The integrated system was developed and tested at the Tashkent University of Information Technologies to work in the corporate mode of its various units (faculties, departments, TUIT branches).
DOI: https://doi.org/10.1109/AICT50176.2020.9368789
Published: 2020-10-07
Citations: 0
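The "percentage of coincidence" mentioned in the abstract above can be illustrated with a small sketch. The abstract does not say which similarity measure the anti-plagiarism subsystem uses; word-shingle overlap is an assumption made here purely for illustration, and the names `shingles` and `overlap_percent` are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of consecutive word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_percent(candidate, reference, size=3):
    """Percentage of the candidate's shingles that also occur in the reference."""
    candidate_shingles = shingles(candidate, size)
    if not candidate_shingles:
        return 0.0  # text shorter than one shingle: nothing to compare
    reference_shingles = shingles(reference, size)
    shared = candidate_shingles & reference_shingles
    return 100.0 * len(shared) / len(candidate_shingles)
```

The measure is asymmetric by design: it reports how much of the candidate text is covered by the reference, which is the quantity a per-author table of coincidences would need.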
The Support of aircraft pipeline production with precedent-oriented skeleton models
P. Pavlov
The paper deals with attribute extensions represented in skeleton models of the parts and assembly units of aviation pipelines. Aviation pipelines are an important part of aircraft systems such as the fuel system, control system, hydraulic system, and many others. Their main function is the transfer of energy and matter. The skeleton model makes it possible to build associative links between conceptual, geometrical, and physical spaces. These associative links allow the precedent model, which is used to accumulate professional experience, to be expanded. In addition, skeleton models are used as digital shadows that save production process data for use at life cycle stages.
DOI: https://doi.org/10.1109/AICT50176.2020.9368715
Published: 2020-10-07
Citations: 0
SAP Analytics Cloud: intellectual analysis of small and medium-sized business activities in Russia in the context of COVID-19
D. Nazarov, D. Kovtun, T. Reichert
The global trend of transition to a digital economy is pushing the scientific community to thoroughly research intellectual analytics models since the quality of models directly affects the choice of an effective decision-making strategy. The article discusses the possibilities and technologies for constructing data mining models in the digital service SAP Analytics Cloud, based on open data on the registration of legal entities and individual entrepreneurs in the Russian Federation. The impact of government support measures on the business activity of small and medium-sized businesses in the context of the spread of COVID-19 is assessed. Predictive analytics models are being implemented in the SAP Analytics Cloud, which allows us to assess the future development trends of small and medium-sized businesses in Russia.
DOI: https://doi.org/10.1109/AICT50176.2020.9368635
Published: 2020-10-07
Citations: 1
Information Extraction from Arabic Law Documents
Samah Abu Shamma, Aseel Ayasa, Wala’ Sleem, A. Yahya
Information hidden in unstructured or semi-structured law documents can be very useful but may not be readily accessible. To get this information, an information extraction (IE) system is needed. Making extracted information available in structured form enables answering complex queries that may go well beyond simple keyword search and thus may be of interest to law professionals. In this paper we address the issue of Arabic information extraction from law documents. We describe a system we developed to extract important information that may be of interest to potential users of these documents, with minimal human intervention. We employ a hybrid approach that utilizes machine learning, rule-based methods, and Arabic NLP to facilitate the extraction of the needed information. The approach was applied to a limited class of Arabic law documents, and we are working on extending it to other document types and to other fields.
DOI: https://doi.org/10.1109/AICT50176.2020.9368577
Published: 2020-10-07
Citations: 2
Modelling of multi-path transmission system of various priority multimodal information
A. Ryndin, E. Pakulova, O. Basov, Gennady Veselov
Today, the multimodal approach is relevant in every sphere of our digital life. These technologies aim to offer the best and most comfortable means of human-human and human-machine interaction. One way to achieve this is to use multimodal interaction. We consider that modalities may differ in priority, meaning that some modalities are more significant than others in a particular task. In this paper, we propose a mathematical model, based on queueing theory, of a multi-path transmission system for multimodal information of various priorities. We present the multipath multimodal data transmission system as a multi-phase queueing system (QS) consisting of two typical nodes. The first node (QS1) defines the queue of packets of modalities of different priority; the second node (QS2) distributes the packets over the available access networks. We build a simulation model in the AnyLogic tool. We show that the packet loss rate depends significantly on the priority level.
DOI: https://doi.org/10.1109/AICT50176.2020.9368802
Published: 2020-10-07
Citations: 0
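The two-node structure described in the abstract above can be sketched as a toy discrete-time simulation. This is not the authors' queueing-theoretic model or their AnyLogic simulation: the finite buffer, the drop-lowest-priority policy, the fixed service interval, and the round-robin choice of path are all assumptions introduced here, and `simulate` is a hypothetical name.

```python
import heapq
from collections import Counter
from itertools import cycle


def simulate(arrivals, buffer_size, service_every, paths):
    """Toy two-phase system: a priority buffer (QS1) feeding a dispatcher (QS2).

    arrivals: one (modality, priority) pair per tick; 0 is the highest priority.
    Returns the dispatched (modality, path) pairs and losses per priority level.
    """
    queue = []                # heap of (priority, arrival_tick, modality)
    lost = Counter()          # dropped packets, keyed by priority level
    sent = []
    next_path = cycle(paths)  # assumed round-robin over access networks
    for tick, (modality, priority) in enumerate(arrivals):
        heapq.heappush(queue, (priority, tick, modality))
        if len(queue) > buffer_size:
            # Assumed policy: drop the lowest-priority, most recent packet.
            worst = max(queue)
            queue.remove(worst)
            heapq.heapify(queue)
            lost[worst[0]] += 1
        if (tick + 1) % service_every == 0 and queue:
            _, _, served = heapq.heappop(queue)
            sent.append((served, next(next_path)))
    return sent, lost
```

With alternating audio (priority 0) and video (priority 1) arrivals, a buffer of two packets, and one packet served every second tick, all losses in this toy setup fall on the lower-priority modality, which is the qualitative effect the abstract reports.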
Anomaly Detection Between Judicial Text-Based Documents
Mukhsimbayev Bobur, Kuralbayev Aibek, Bekbaganbetov Abay, Fuad Hajiyev
The problem of searching for anomalies or outliers is extremely important in various fields, with applications such as fraud detection, crime research, network reliability analysis, and medical diagnostics. What is an anomaly in the judicial system? A court case is considered an anomaly if the judge’s decision differs significantly from existing decisions in similar cases. In most cases, existing outlier search methods use high-dimensional domains in which data can contain hundreds of dimensions. Such an approach requires a lot of resources and is clearly not efficient.
Objectives: In this article, the authors:
• present two methods (or two models) for searching for anomalies in judicial practice;
• give a comparative analysis of the effectiveness of both methods.
Methodology: The first method for searching for anomalies combines two kinds of models: classification and similarity algorithms. Algorithms such as logistic regression, Extreme Gradient Boosting (XGBoost), and TensorFlow are used for classification, and Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) are used to find similar documents. The second method uses the Bidirectional Encoder Representations from Transformers (BERT) embedding model and the Annoy indexing model.
Findings: The second method gives better and faster results when searching for outliers.
Data source: The authors used the set of acts provided by the Supreme Court of the Republic of Kazakhstan. The dataset contains 1 million text documents and metadata.
DOI: https://doi.org/10.1109/AICT50176.2020.9368621
Published: 2020-10-07
Citations: 1
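The definition of an anomaly given in the abstract above (a decision that differs from the decisions in similar cases) can be sketched in a few lines. To keep the example self-contained, bag-of-words vectors and a brute-force nearest-neighbour search stand in for the BERT embeddings and the Annoy index named in the abstract; that substitution, the majority-vote rule, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (not BERT)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def is_anomaly(case_text, decision, corpus, k=3):
    """Flag a case whose decision differs from the majority of its k most
    similar cases. corpus is a list of (text, decision) pairs."""
    query = embed(case_text)
    neighbours = sorted(
        corpus,
        key=lambda item: cosine(query, embed(item[0])),
        reverse=True,
    )[:k]  # brute-force search; an index such as Annoy would replace this
    if not neighbours:
        return False
    majority, _ = Counter(d for _, d in neighbours).most_common(1)[0]
    return decision != majority
```

A brute-force scan is linear in the corpus size, which is why an approximate nearest-neighbour index becomes attractive for a dataset of the scale mentioned in the abstract.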
The approach to building a graph knowledge base using social media data
V. Moshkin
The aim of the work was to develop a model of a knowledge base of an information system that collects information from various social networks. The model should improve search efficiency and facilitate the unification of data from heterogeneous sources. The work presents an ontological model for the unification of data profiles of different social networks. This model avoids data redundancy by including contextual information in annotations to ontology relations. In addition, an approach to information retrieval using syntagmatic patterns in the formation of a database tree of posts of social network users is proposed. The article also presents the results of experiments with data from the social network Facebook confirming the effectiveness of the proposed models and algorithms.
DOI: https://doi.org/10.1109/AICT50176.2020.9368794
Published: 2020-10-07
Citations: 0
Computation Model of Data Intensive Computing with MapReduce
A. Adamov
It has become obvious that traditional platforms and processing paradigms cannot store and process huge amounts of data. The only solution is to use a specially designed ad-hoc platform/architecture based on parallelization that distributes data across a large cluster of physical machines. Data Intensive Computing is a subclass of the general parallel computing concept, based on dividing large amounts of data into independent parts and processing them in parallel. In this paper, alternative parallelization architectures are reviewed. The MapReduce programming model, associated with distributed massively parallel processing of large amounts of data, is examined. The main objective of this study is to investigate the conceptual foundations behind the very popular data-driven computation model MapReduce.
DOI: https://doi.org/10.1109/AICT50176.2020.9368841
Published: 2020-10-07
Citations: 1
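The MapReduce model examined in the abstract above splits work into a map phase over independent chunks, a shuffle that groups intermediate values by key, and a reduce phase per key. The word-count sketch below runs the three phases sequentially in a single process to show the data flow only; a real deployment would distribute the chunks across a cluster, and the function names are illustrative rather than taken from any framework.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one independent chunk."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` at one key at a time, both phases can run in parallel without coordination, which is the property the abstract attributes to data-intensive computing.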