2022 20th International Conference on Language Engineering (ESOLEC): Latest Publications

Deep Learning in Arabic Text Summarization: Approaches, Datasets, and Evaluation Metrics

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009528

Yasmin Einieh, Amal Almansour

A massive amount of data is now available on the internet, which makes it difficult for users to go through all of it and produce a precise summary manually. Automatic Text Summarization (ATS) systems address this problem by producing a shorter, more manageable version of the input text that keeps its most important information. Deep learning has achieved good results in Natural Language Processing (NLP) tasks, and its use in ATS has grown for English. However, studies evaluating these techniques for Arabic remain scarce. In this work, we review articles that apply deep learning to Arabic. Specifically, we study the available models, datasets, and evaluation metrics for extractive and abstractive Arabic text summarization. We reviewed 12 research papers and found that most of the studies applied deep learning to abstractive summarization.
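
The abstract does not name the evaluation metrics it surveys. ROUGE, which scores a system summary by its n-gram overlap with a reference summary, is widely used for summarization, so a minimal ROUGE-N sketch follows as an illustration. It assumes plain whitespace tokenization. Real Arabic evaluation would normally need proper tokenization and normalization first. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Dict[str, float]:
    """ROUGE-N precision, recall and F1 using whitespace tokenization (an assumption)."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram, i.e. clipped overlap.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, `rouge_n("the cat sat", "the cat sat on the mat")` gives a recall of 0.5 (3 of the 6 reference unigrams are matched) and a precision of 1.0.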

Citations: 0

Towards a Psycholinguistic Database of Arabic

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009144

N. Fathy, S. Alansary

Psycholinguistic databases are indispensable resources for psycholinguistic and computational research. Many languages, such as English, Croatian, Dutch, French, and Chinese, have such resources. Arabic, unfortunately, does not. This research introduces guidelines for building a psycholinguistic database of Arabic. The database will be released in two phases. The first is a psycholinguistic phase, in which subjective ratings are collected for several variables, such as concreteness, imageability, subjective frequency, and number of meanings. The second is a computational phase, in which the ratings are combined with other linguistic information obtained from corpora, such as root, stem, objective frequency, number of syllables, and word length. This phase is meant to provide an online, searchable release that psycholinguists and computational linguists can use to build cognitively based artificial intelligence models. This survey describes the building process of the psycholinguistic phase in detail.
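
To make the two phases concrete, the sketch below shows one possible shape for a database entry that merges phase-one subjective ratings with phase-two corpus measures. The abstract does not specify a schema, so the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class WordEntry:
    """Hypothetical record combining both phases of the database."""
    word: str
    # Phase 2: corpus-derived information (hypothetical field names).
    root: str
    stem: str
    objective_frequency: int
    syllables: int
    # Phase 1: raw subjective ratings per variable, one value per participant.
    ratings: Dict[str, List[float]] = field(default_factory=dict)

    @property
    def word_length(self) -> int:
        # Length in Unicode code points; diacritics, if present, would also be counted.
        return len(self.word)

    def mean_rating(self, variable: str) -> float:
        # Average participant rating for one variable, e.g. "concreteness".
        return mean(self.ratings[variable])
```

For example, an entry for كتاب ("book", root كتب) with concreteness ratings `[6.0, 7.0, 5.0]` would report a word length of 4 and a mean concreteness of 6.0.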

Citations: 0

Arabic Sentences Semantic Similarity Based on Word Embedding

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009099

Badrya Dahy, M. Farouk, Khaled Fathy

Semantic textual similarity receives significant attention in natural language processing. It is useful in a variety of NLP applications, including information retrieval, plagiarism detection, data extraction, and machine translation. Sentence similarity in Arabic has not been investigated deeply because of the lack of Arabic language resources, yet calculating the degree of similarity between Arabic sentences accurately is critical. This research proposes a method for determining the semantic similarity of Arabic sentences. The proposed strategy uses word embeddings to measure the similarity between words and combines more than one similarity measure to calculate the final similarity. Furthermore, because Arabic resources are lacking, a new dataset for evaluating similarity techniques has been constructed and made available for public use. Experiments were conducted to show the efficiency of the proposed strategy, using two datasets to compare it with other approaches. They show that the proposed methods outperform alternative approaches to measuring sentence similarity in Arabic.
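
The abstract combines word-embedding similarity with other measures but does not say which ones. The following is a minimal sketch built on several assumptions. A sentence vector is the average of its word vectors, and these vectors are compared by cosine similarity. Jaccard token overlap serves as the second measure. A hypothetical weight `alpha` mixes the two. The toy two-dimensional embeddings stand in for real pre-trained Arabic vectors.

```python
import math
from typing import Dict, List, Optional, Sequence

Embeddings = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sentence_vector(tokens: List[str], emb: Embeddings) -> Optional[List[float]]:
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def jaccard(a: List[str], b: List[str]) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def combined_similarity(s1: str, s2: str, emb: Embeddings, alpha: float = 0.7) -> float:
    """Weighted mix of embedding cosine and token overlap (alpha is hypothetical)."""
    t1, t2 = s1.split(), s2.split()
    v1, v2 = sentence_vector(t1, emb), sentence_vector(t2, emb)
    emb_sim = cosine(v1, v2) if v1 is not None and v2 is not None else 0.0
    return alpha * emb_sim + (1 - alpha) * jaccard(t1, t2)
```

With toy vectors for قط (cat), كلب (dog), and يجري (runs), identical sentences score 1.0. "قط يجري" and "كلب يجري" score high but below 1, because the two sentences share only one of three distinct tokens.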

Citations: 0

A new framework for an eKYC system

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009253

Abdallah Gomaa, Omar Rashed, Abdelkarim Refaey, Abdel-rahman Mohamed, M. Sayed, M. Rashwan

Identity verification has long been a crucial problem in automating financial operations that require user authentication and fraud detection. Until recently, this task was nearly impossible to perform with considerable accuracy. Advances in machine learning over the past few years have made it achievable. This paper discusses a proposed solution for a high-accuracy, high-performance eKYC system. In an eKYC system, we need to verify a client's identity against their identity documents. The client must also pass a liveness detection test, which ensures they are performing the financial operation in person. In our proposed system, verification is done in three main stages: face detection, face verification, and face anti-spoofing detection. We employed an AI model for each task: MTCNN [1] for face detection and FaceNet [12] for face verification. For face anti-spoofing, we implemented the state-of-the-art model PatchNet [15].
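
The three stages can be chained into a simple accept/reject pipeline. The sketch below is hypothetical. The callables `detect_face`, `embed_face`, and `is_live` are placeholders standing in for MTCNN, FaceNet, and PatchNet. The distance threshold is an arbitrary value that would have to be tuned on validation data. Faces are compared by Euclidean distance between embeddings, which is the comparison FaceNet-style embeddings are trained for.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class VerificationResult:
    accepted: bool
    reason: str


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify_client(
    selfie: Any,
    id_photo: Any,
    detect_face: Callable[[Any], Optional[Any]],   # placeholder for a detector such as MTCNN
    embed_face: Callable[[Any], Sequence[float]],  # placeholder for an embedder such as FaceNet
    is_live: Callable[[Any], bool],                # placeholder for an anti-spoofing model
    threshold: float = 1.0,                        # hypothetical; must be tuned
) -> VerificationResult:
    # Stage 1: face detection on both the live capture and the ID document.
    live_face = detect_face(selfie)
    doc_face = detect_face(id_photo)
    if live_face is None or doc_face is None:
        return VerificationResult(False, "no face detected")
    # Stage 2: face verification by embedding distance.
    if l2_distance(embed_face(live_face), embed_face(doc_face)) > threshold:
        return VerificationResult(False, "face mismatch")
    # Stage 3: anti-spoofing (liveness) check on the live capture only.
    if not is_live(live_face):
        return VerificationResult(False, "spoof detected")
    return VerificationResult(True, "verified")
```

Passing the models in as callables keeps the orchestration testable with stubs and lets each stage's model be swapped independently.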

Citations: 0

Recommender Diagnosis System with Fuzzy Logic in Cloud Environment

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009214

Maie Aboghazalah, Rasha Elnemr, Nedaa Elsayed, Ayman El-Sayed, Passant El-Kafrawy

Recommendation systems are now widely used in many fields. In the medical field, their accurate predictions make them highly valuable to both doctors and patients, and they can reduce the time and effort that doctors and patients spend. The present work introduces a simple and effective methodology for a medical recommendation system based on fuzzy logic, an important method for working with fuzzy input data. Because the input data differ from patient to patient, the recommendations can differ as well. This work aims to develop techniques for handling patient data to provide accurate lifestyle recommendations to the patient. Fuzzy logic is used to form different recommendations for the patient, such as lifestyle, medicine, and sports recommendations, based on patient factors such as age, gender, and diseases. When evaluated, the system reached an efficiency of 94%. This experiment is the final module of a four-module recommendation system. The first module diagnoses chest diseases using ECG signals. The second makes diagnoses using X-ray images. The third secures the whole system by encrypting user data sent over the cloud.
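
The sketch below shows how fuzzy rules could map patient factors to a recommendation. It makes several assumptions. Age uses triangular membership functions, and heart disease is a crisp 0/1 flag. AND is min, OR is max, and NOT is 1 minus the membership. The output is the recommendation whose rule fires most strongly, with no defuzzification step. The age breakpoints, rules, and labels are hypothetical illustrations, not the paper's rule base.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def recommend_exercise(age: float, has_heart_disease: bool) -> str:
    # Fuzzy sets for age in years (hypothetical breakpoints).
    young = triangular(age, 0, 20, 45)
    middle = triangular(age, 30, 45, 60)
    old = triangular(age, 45, 70, 120)
    heart = 1.0 if has_heart_disease else 0.0  # crisp input treated as a 0/1 membership
    # Rule firing strengths: AND = min, OR = max, NOT = 1 - membership.
    rules = {
        "high-intensity exercise": min(young, 1 - heart),
        "moderate exercise": min(middle, 1 - heart),
        "low-intensity exercise": max(old, heart),
    }
    # Pick the recommendation whose rule fires most strongly.
    return max(rules, key=rules.get)
```

A 20-year-old without heart disease gets "high-intensity exercise". The same patient with heart disease gets "low-intensity exercise", because the disease rule overrides the age rules.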

Citations: 0

Gold Price Prediction using Sentiment Analysis

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009529

Mariam Abdou, Menna Shaltout, Alaa Godah, Karim Sobh, Yomna Eid, Walaa Medhat

Gold is a valuable material used to fund trading purchases. Today, more investors are interested in gold investments because of the sudden increase in gold prices. However, gold transactions are risky: the price of gold fluctuates wildly because the gold market is unpredictable. Hence, a gold price prediction scheme is needed to help investors, marketers, and financial institutions make effective economic and monetary decisions. This paper analyzes the correlation between gold price movements and the sentiment of Arabic tweets in Egypt. After sentiment analysis was performed on these tweets, three supervised machine learning algorithms were used to predict the gold price: multiple linear regression, Ridge regression, and Lasso regression. The results show that the Lasso regression model performs better than the other two models. However, the correlation between gold prices and Twitter data is weak, so gold prices cannot be accurately predicted from Twitter data alone.
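
The three models differ only in the penalty added to the least-squares objective. In the formulation below, $x_i$ is assumed to be a vector of sentiment features for observation $i$ (for example, aggregated tweet sentiment scores for a day). The symbol $y_i$ is the gold price, and $\lambda \ge 0$ is a regularization strength that would be chosen by validation. The feature definition is an assumption, since the abstract does not describe it.

```latex
\begin{aligned}
\hat{\beta}_{\text{OLS}}   &= \arg\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 \\
\hat{\beta}_{\text{Ridge}} &= \arg\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda \lVert \beta \rVert_2^2 \\
\hat{\beta}_{\text{Lasso}} &= \arg\min_{\beta_0,\,\beta}\; \sum_{i=1}^{n} \bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda \lVert \beta \rVert_1,
\qquad \lVert \beta \rVert_1 = \sum_{j} \lvert \beta_j \rvert
\end{aligned}
```

The intercept $\beta_0$ is conventionally left unpenalized. The $\ell_1$ penalty can shrink some coefficients exactly to zero, so Lasso also performs feature selection. The squared $\ell_2$ penalty of Ridge only shrinks coefficients toward zero.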

Citations: 0

Design and Implementation of a Dockerized, Cross Platform, Multi-Purpose Cryptography as a Service Framework Featuring Scalability, Extendibility and Ease of Integration

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009317

A. Merdan, H. Aslan, Nashwa Abdelbaki

Following cybersecurity standards is becoming one of the highest priorities for digital specialists. With the global push toward digital transformation, data security is a major concern. It is crucial to ensure data confidentiality, integrity, and availability whether the data is in transit, at rest, or being processed. Organizations face the challenge of applying the needed security measures. They must also implement and maintain cryptographic algorithms that keep data encryption sound. A crypto library or server that fits multiple use cases is either too costly to implement or expensive to buy (including licensing per user, per server, per year, etc.). The goal of our work is to identify data protection challenges by implementing a solution that matches the theoretical hypothesis of a cryptography-as-a-service framework. The term "as a service" has gained popularity lately because it lets vendors provide ready-made solutions that satisfy their customer base. In this paper, we propose a framework that works across platforms with ease. It is a scalable, extendible solution with multiple hosting options, from on-premises hosting to cloud hosting. The proposed framework was implemented and evaluated. The results show that it can efficiently process enormous amounts of data. It can also be easily accessed through standard HTTPS requests using the JSON format. To validate the deployment technique used, we evaluated the framework on-premises and in the cloud with the same allocated resources and obtained matching results.
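
The framework is accessed through standard HTTPS requests with JSON bodies, so a client can be written with the standard library alone. The sketch below builds such a request but does not send it. The abstract does not document the framework's API, so the endpoint path `/encrypt`, the field names, and the algorithm label are all hypothetical. Binary data is Base64-encoded because JSON has no binary type.

```python
import base64
import json
import urllib.request


def build_encrypt_request(base_url: str, plaintext: bytes, key_id: str,
                          algorithm: str = "AES-256-GCM") -> urllib.request.Request:
    """Build a POST request for a hypothetical /encrypt endpoint."""
    body = {
        "operation": "encrypt",  # hypothetical field names throughout
        "algorithm": algorithm,
        "key_id": key_id,
        # JSON cannot carry raw bytes, so the payload travels as Base64 text.
        "data": base64.b64encode(plaintext).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/encrypt",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a real deployment, the request would be sent with `urllib.request.urlopen(req)`, and the response body parsed with `json.loads`.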

Citations: 0

Using Knowledge Graph Embeddings in Embedding Based Recommender Systems

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009491

Ahmed Hussein Ragab, Passant El-Kafrawy

This paper proposes using entity2rec [1], which uses knowledge-graph-based embeddings (node2vec), in place of the traditional embedding layers in embedding-based recommender systems. This approach could increase the accuracy of some of the most widely deployed recommender systems running in production at many companies. It only requires replacing the traditional embedding layer with node2vec graph embeddings. There is no need to migrate completely to newer state-of-the-art (SOTA) systems and risk unexpected performance issues. Graph embeddings can also incorporate user and item features, which can help solve the well-known cold-start problem in recommender systems. Both embedding methods are compared on the MovieLens 100K dataset in an item-item collaborative filtering recommender. We show that the suggested replacement improves the representation learning of the embedding layer by adding a semantic layer that can increase the overall performance of standard embedding-based recommenders. First, standard recommender systems are introduced, and traditional and graph-based embeddings are briefly explained. Then, the proposed approach is presented along with related work. Finally, the results are presented along with future work.
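
The drop-in argument rests on the fact that an item-item recommender consumes only one vector per item. It therefore does not matter whether those vectors come from a trained embedding layer or from node2vec run on a knowledge graph. The minimal sketch below works under that assumption. Unseen items are scored by cosine similarity to the items a user rated, weighted by the rating. The titles and two-dimensional vectors are toy values, not MovieLens data, and `recommend` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(user_ratings: Dict[str, float],
              item_vectors: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Rank unseen items by rating-weighted similarity to the user's rated items.

    `item_vectors` may come from any embedding source (a learned embedding
    layer or node2vec); the ranking logic does not change.
    """
    scores = {}
    for item, vec in item_vectors.items():
        if item in user_ratings:
            continue  # skip items the user has already rated
        scores[item] = sum(rating * cosine(item_vectors[rated], vec)
                           for rated, rating in user_ratings.items())
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Swapping embedding sources means passing a different `item_vectors` dictionary. Nothing else in the recommender changes, which is the low-risk migration path the abstract describes.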

Citations: 2

Fine-tuning Arabic Pre-Trained Transformer Models for Egyptian-Arabic Dialect Offensive Language and Hate Speech Detection and Classification

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009167

Ibrahim Ahmed, Mostafa Abbas, Rany Hatem, Andrew Ihab, Mohamed Waleed Fahkr

Offensive language and hate speech have been rampant on social media platforms (Facebook, Twitter, etc.) in Egypt for quite a while, appearing in tweets, Facebook posts, comments, and so on. This growing problem needs immediate attention. This paper focuses on detecting and classifying both offensive language and hate speech using state-of-the-art text classification techniques. Pre-trained transformer models have gained a reputation for remarkable general language understanding and can be fine-tuned for specific tasks such as text classification. We collected a custom Egyptian-Arabic dialect dataset of about 8,000 text samples, manually labelled into 5 distinct classes: Neutral, Offensive, Sexism, Religious Discrimination, and Racism. We used it to fine-tune and evaluate multiple Arabic pre-trained transformer models. These models are based on different transformer architectures and pre-training approaches, and were evaluated on the downstream NLP task of text classification. We achieved an average accuracy of about 96% across all fine-tuned transformer models.
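
When fine-tuning with the Hugging Face `transformers` library, a label mapping like the one below is typically passed to `AutoModelForSequenceClassification.from_pretrained`. It goes in as the `id2label` and `label2id` arguments, together with `num_labels=5`, so that predictions decode to class names. The sketch shows that mapping and one way to compute the reported average accuracy. It assumes "average accuracy" means the unweighted mean of per-model test accuracies. The label order and the model names in the usage are hypothetical.

```python
from typing import Dict, Sequence

# The five classes named in the abstract; the index order is an assumption.
LABELS = ["Neutral", "Offensive", "Sexism", "Religious Discrimination", "Racism"]
ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))
LABEL2ID: Dict[str, int] = {label: i for i, label in ID2LABEL.items()}


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of samples whose predicted class id matches the gold id."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def average_accuracy(per_model: Dict[str, Sequence[int]], gold: Sequence[int]) -> float:
    """Unweighted mean of per-model accuracies on the same test set."""
    scores = [accuracy(preds, gold) for preds in per_model.values()]
    return sum(scores) / len(scores)
```

For example, `average_accuracy({"model_a": [0, 1], "model_b": [0, 0]}, [0, 1])` averages accuracies of 1.0 and 0.5 to give 0.75.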

Citations: 0

Neural Networks for Bilingual Machine Translation Model

2022 20th International Conference on Language Engineering (ESOLEC)

Pub Date: 2022-10-12; DOI: 10.1109/ESOLEC54569.2022.10009266

Hassanin M. Al-Barhamtoshy, Ashraf Said Qutb Metwalli

Besides linguistic systems, machine translation systems can be statistical, corpus-based, or dataset-based. This paper aims to develop a bilingual English-to-Arabic translation model whose quality can be continuously improved and which can be flexibly extended to other language pairs. It also aims to create an integrated translation environment with computer-assisted facilities. These facilities are meant to enhance the quality of automatically produced texts, increase translators' productivity, and support their professional capabilities. To this end, a machine translation model based on neural networks, using encoder and decoder models, will be developed. Bilingual dictionaries will be used after linguistic modification tasks clean them and remove non-alphanumeric text. Finally, the trained model is used for inference on new input to be translated, and the proposed machine translation model is evaluated in a testing phase.
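
Below is a minimal sketch of the cleaning step. It assumes that "removing non-alphanumeric text" means stripping punctuation and symbols from each English-Arabic pair and dropping pairs that end up empty. Lower-casing the English side is an extra assumption. In Python 3, `\w` in a `str` pattern matches Unicode word characters, so Arabic letters and digits are kept. It also keeps the underscore, and it does not match combining marks such as Arabic diacritics, so those are removed too. The function names are hypothetical.

```python
import re
from typing import List, Tuple

_NON_WORD = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace
_SPACES = re.compile(r"\s+")


def clean(text: str) -> str:
    """Remove punctuation and symbols, then collapse runs of whitespace."""
    return _SPACES.sub(" ", _NON_WORD.sub("", text)).strip()


def clean_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Clean English-Arabic sentence pairs, lower-casing English, and drop empty pairs."""
    cleaned = []
    for en, ar in pairs:
        en_clean, ar_clean = clean(en.lower()), clean(ar)
        if en_clean and ar_clean:  # a pair is only useful if both sides survive
            cleaned.append((en_clean, ar_clean))
    return cleaned
```

For example, the pair ("Hello, world!", "مرحبا، بالعالم!") becomes ("hello world", "مرحبا بالعالم"). The Arabic comma and the exclamation marks are removed.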

Citations: 0