2022 20th International Conference on Language Engineering (ESOLEC): Latest Papers

A Novel Dataset for Known and Unknown Ancient Arabic Manuscripts
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009168
Lutfieh S. Al-homed, K. M. Jambi, Hassanin M. Al-Barhamtoshy
This paper presents a new dataset of ancient Arabic-Islamic manuscripts for detecting unknown manuscripts and distinguishing them from known ones. Unknown manuscripts are those that have been badly affected by human or natural forces, such as humidity, temperature, and air pollution, which degraded their quality and caused the loss of identifying information such as the title, author, and date. Known manuscripts, by contrast, have a known title, author, etc. Recognizing unknown manuscripts is essential for further analysis: it facilitates information extraction from such degraded manuscripts, enables their indexing, and makes them easier to access and retrieve. The objectives of the constructed dataset are as follows: 1) Collect a set of known and unknown manuscripts of similar forms and highlight the characteristics of the unknown manuscripts. 2) Promote the automatic detection and recognition of unknown manuscripts. 3) Formulate the recognition of unknown manuscripts as a supervised machine-learning problem, and boost this recognition with advances in machine learning and deep learning techniques. A total of 108 manuscripts were collected, split equally between the known and unknown categories. Preliminary results showed that a decision tree classifier achieved an accuracy of 88% in classifying unknown manuscripts.
Citations: 1
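The supervised framing in the manuscript-dataset abstract above can be illustrated with a one-split decision tree (a "stump") chosen by Gini impurity. This is only a sketch in plain Python: the two features and all toy values are hypothetical and are not taken from the dataset, and the paper does not say which features or tree settings it used.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(rows, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                left_label = Counter(left).most_common(1)[0][0]
                right_label = Counter(right).most_common(1)[0][0]
                best = (score, f, t, left_label, right_label)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]

def predict(stump, row):
    """Classify one row with the fitted stump."""
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

# Hypothetical features: [fraction of page degraded, title page legible (1/0)]
X = [[0.8, 0], [0.7, 0], [0.9, 1], [0.2, 1], [0.1, 1], [0.3, 0]]
y = ["unknown", "unknown", "unknown", "known", "known", "known"]
stump = fit_stump(X, y)
# Training accuracy on the toy data only; it says nothing about the reported 88%.
accuracy = sum(predict(stump, r) == label for r, label in zip(X, y)) / len(y)
```

A full decision tree applies the same split search recursively to each side, and a realistic evaluation would measure accuracy on held-out manuscripts rather than on the training rows.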
Contrastive Analysis of Color Representations Using Semantic Corpus Annotation “POS Tagging”: The Holy Quran - A Case Study
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009594
Ahmed H. Kassem, S. Alansary
This paper addresses the challenging task of identifying semantic features in the Quran, namely color terms, from both a corpus-based and a computational perspective. The study attempts to identify, locate, and report the frequencies, occurrences, and concordances of the colors in the Quran using AntConc and The Simple Corpus Tool. The results are compared with earlier manual work and with the information available at corpus.quran.com, a University of Leeds corpus project on the Holy Quran. The research semantically annotates lexical items related to colors and examines them in concordance and corpus software tools. The results are compared with special attention to the colors' co-occurrences, in an endeavor to better understand the connotations of colors in the Quran. The paper identifies a gap in the Leeds corpus work on the Quran and recommends filling it with the work undertaken in this study.
Citations: 0
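The frequency, concordance, and co-occurrence measures mentioned in the color-terms abstract above can be sketched in a few lines of plain Python. This is not how AntConc or The Simple Corpus Tool work internally; the tokenizer is a simplification that ignores Arabic-specific normalization, and the sample sentence is made up purely to exercise the functions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (a simplification; no Arabic normalization)."""
    return re.findall(r"\w+", text.lower())

def term_frequencies(tokens, terms):
    """Count how often each search term occurs."""
    counts = Counter(tokens)
    return {t: counts[t] for t in terms}

def concordance(tokens, term, width=3):
    """Keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def cooccurrences(tokens, term, width=3):
    """Count tokens appearing within `width` positions of the term."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            counts.update(window)
    return counts

# Made-up English sample text, used only for illustration.
sample = "The green garden lay under a white cloud. A green leaf fell on white sand."
tokens = tokenize(sample)
freqs = term_frequencies(tokens, ["green", "white", "yellow"])
kwic = concordance(tokens, "green", width=2)
near_green = cooccurrences(tokens, "green", width=2)
```

Comparing such counts against a manually annotated list of color terms is one way to surface the kind of gap the abstract describes.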
Sentiments and Cognition Interdependence: An Exploratory Study of Sentiment Analysis and Image Schema
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009506
Mai Magdy M. Sleim
Cognition is mostly seen as the motivation for certain emotive decisions and/or actions. Thus, in studies of emotion it is restricted to evaluation and appraisal. Long-term emotions, i.e. sentiments, are hardly ever represented in relation to cognitive aspects. The aim of this study is to provide an in-depth understanding of cognition and affection through an adaptation of Johnson's (1987) and Kimmel's (2005) image schema theory and Plutchik's (1980, 1988) sentiment analysis, applied to quotes from Goodreads. The data selected for this study cover three Goodreads themes: life, death, and inspiration. From each theme, six hundred quotes were selected and analyzed. The study focuses on the correlation between sentiments and image schemas, detecting aspects of control and/or responsibility of the self and the other. Therefore, a third tool of analysis, lexico-syntactic analysis, is used to identify sentiments; it assists in identifying self/other control and responsibility through knowledge of the agent/doer and experiencer. A mixed methodology, incorporating both qualitative and quantitative analyses, is adopted to highlight relationships between variables. The analysis is framed by Johnson's image schema theory and Plutchik's theories (Theory of Emotions and Theory of Cognition-Emotion Relations). The statistical analyses show a significant relationship between cognition and emotions, except for the LOCOMOTION and SPACE schemas. This might be due to the nature of those schemas, which have several subschemas in addition to their linear nature. This study adds texture to human knowledge, as it detects different aspects of human experience. Furthermore, the amount of manually analyzed data contributes to the fields of cognitive semantics and psychology, especially image schema modeling.
Citations: 0
Morphological Analysis of Egyptian Children Corpus by KIDEVAL Program
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009437
H. Salama, S. Alansary, Amany Elshazly
The aim of this study is to provide a morphological analysis of the Egyptian children corpus, which is morphologically tagged and disambiguated in CHILDES. This allows the KIDEVAL program to be readily used on the corpus to address questions regarding the acquisition of Egyptian Arabic. KIDEVAL is one of the tools in the CLAN program, a toolset that has been particularly useful in the study of language acquisition in many languages. However, corpus-based analyses have not yet been applied to Egyptian children's language. This study describes how to use the KIDEVAL program to analyze Egyptian children's language and to study the development of word frequency patterns of parts of speech and the order of development of grammatical morphemes in Egyptian Arabic. The output of the morphological analysis enables researchers to study and answer many questions regarding the development of grammatical morphemes in Egyptian Arabic, as well as many other questions that can readily be probed with KIDEVAL. The Egyptian Arabic corpus was downloaded from the Arabic part of the CHILDES database. It comprises 10 transcripts from Egyptian-speaking children aged 1;7 to 3;8 (years;months), with a total of 25,645 words. The KIDEVAL analysis profile for the corpus is extensive: for each child, by age, it displays the number of occurrences of each part of speech across 54 categories and subcategories. Using the KIDEVAL tool is efficient because it reduces the time needed to label the corpus manually.
Citations: 0
Automatic POS tagging of Arabic words using the YAMCHA machine learning tool
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009473
Alaa Elnily, Ahmed Abdelghany
Automatic POS tagging is the process of automatically assigning the proper POS tag to each word in a text based on its context. Most NLP applications require it as a crucial step. This study proposes a machine learning-based Arabic POS tagger. The machine learning system employed is the YAMCHA tool, which uses Support Vector Machines (SVMs) as its learning algorithm. SVM classifies data with high accuracy because it makes use of part of the data in the training process. As a result, training the system requires a substantial amount of annotated data evaluated at the POS level. A corpus of 100,039 words is used in this study. It was divided into training and testing parts of 64,608 and 35,431 words, respectively. A tag set of 48 morphological tags was used in training and testing. To reach the best result, the system was trained multiple times while varying the range of linguistic information used in training, and new texts were then tested and evaluated. The lowest error rate achieved was 11.4%. This rate was reached when the word preceding the target word was considered in training without considering its POS tag (F:-1..0:0..).
Citations: 0
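The best configuration reported in the YAMCHA abstract above uses the target word plus the preceding word, without the preceding word's POS tag. The sketch below shows what such a context window looks like as features, together with the token-level error rate. It is plain Python, not YAMCHA's interface; the feature names, the `<BOS>` placeholder, and the transliterated toy sentence with its tags are assumptions for illustration, and the SVM itself is omitted.

```python
def window_features(words, i):
    """Features for token i: the token itself and the preceding word, no POS tags."""
    prev_word = words[i - 1] if i > 0 else "<BOS>"  # sentence-start placeholder
    return {"w0": words[i], "w-1": prev_word}

def error_rate(predicted, gold):
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have equal length")
    wrong = sum(p != g for p, g in zip(predicted, gold))
    return wrong / len(gold)

# Transliterated toy sentence and tags, invented for illustration.
words = ["kataba", "al-walad", "al-dars"]
features = [window_features(words, i) for i in range(len(words))]
rate = error_rate(["VERB", "NOUN", "VERB"], ["VERB", "NOUN", "NOUN"])

# Split sizes reported in the abstract; they add up to the stated corpus size.
train_words, test_words = 64_608, 35_431
total_words = train_words + test_words
```

Widening the window (more preceding or following words, or previously assigned tags) is the kind of variation in "linguistic information" that the abstract says was tried across training runs.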
A Revised Survey of Paraphrasing Generation Approaches and Tools for Arabic
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009462
Mahinaz Hegazy, S. Alansary
Owing to technological advances in NLP and text editing tools, there is an increasing demand for paraphrasing. This demand has motivated researchers because numerous NLP applications are associated with it, including information retrieval, query answering, essay authenticity, and text summarization. This paper surveys several computational approaches to paraphrase generation, since generating or identifying semantic equivalence for different linguistic elements is an essential part of NLP. It gives a revised account that includes the most recent approaches to paraphrase generation, up to Universal Networking Language systems. The research specifically examines paraphrase generation for Arabic using transformational rules. This is done by testing an online paraphrasing tool for Arabic and analyzing and evaluating its paraphrasing practice. The free online tool selected is Paraphrase Tool.com, which is used to paraphrase more than one hundred languages, including Arabic. The resulting output is evaluated using BLEU to determine the accuracy of the rendered paraphrase and its semantic equivalence to the original source text.
Citations: 0
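Since the paraphrasing survey above scores tool output with BLEU, a minimal sentence-level version may help show what the metric measures: clipped n-gram precision combined with a brevity penalty. The sketch assumes a single reference, uniform weights over 1- to 4-grams, and no smoothing; the abstract does not state which BLEU variant or tokenization the authors used, and the English sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, uniform weights, no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1 - r / c)
    return brevity * math.exp(sum(log_precisions) / max_n)

# Invented example sentences.
reference = "the cat sat on the mat".split()
identical_score = bleu(reference, reference)
partial_score = bleu("the cat sat on a mat".split(), reference)
disjoint_score = bleu("a dog ran".split(), reference)
```

Because BLEU rewards surface overlap, a good paraphrase that rewords heavily can score low against the source text, which is worth keeping in mind when BLEU is read as a measure of semantic equivalence.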
Sentiment Analysis For Arabic Low Resource Data Using BERT-CNN
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009633
Mohamed Fawzy, M. Fakhr, M. A. Rizka
Users share opinions and discussions on the internet through social media platforms. Nowadays, a significant number of internet users speak Arabic, and they tend to express their opinions in different dialects. Understanding people's opinions and emotions has therefore become an urgent matter. Arabic sentiment analysis is challenging because of linguistic complexity, data availability, data quality, and the multiplicity of dialects. Research on low-resource sentiment analysis has therefore become necessary. This study proposes a Bidirectional Encoder Representations from Transformers (BERT) model that uses a Convolutional Neural Network (CNN) as a classification head for sentiment analysis of low-resource Arabic data. The classification head includes a CNN layer, a dropout layer, and a ReLU activation function. The proposed approach was evaluated on three datasets collected from Twitter containing different dialects. The last four BERT layers were fine-tuned while the other layers were frozen. The suggested model outperforms the accuracy of current state-of-the-art models with a 50% smaller batch size, fewer training layers, and ∼20% fewer epochs.
Citations: 2
Arabic Machine Translation (ArMT) based on LSTM with Attention Mechanism Architecture
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009530
Dalal Abdullah Aljohany, Hassanin M. Al-Barhamtoshy, Felwa A. Abukhodair
Arabic is considered a low-resource and morphologically rich language. As a result, it is considered one of the most challenging languages in Machine Translation (MT). While much translation research has concentrated on Indo-European languages, much less has been done on Arabic. Therefore, the quality of Arabic Machine Translation (ArMT) continues to require improvement. Neural Machine Translation (NMT) is now the state of the art in MT approaches. In this paper, we propose a model for two-way translation between Arabic and English. The proposed model is based on NMT and uses a Long Short-Term Memory (LSTM) encoder-decoder with an attention mechanism. In the basic encoder-decoder, performance is linked to the length of the input sentence: as the latter increases, performance diminishes swiftly. Attention mechanisms (AMs) are used to overcome this issue. By combining LSTM and an attention mechanism, the proposed model is able to improve translation accuracy. The experimental results show that the proposed model improves translation accuracy and reduces the loss.
Citations: 1
Dynamic Modeling and Identification of the COVID-19 Stochastic Dispersion
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009467
M. Taher, M. Hedaya, B. Bakeer, Passant El Kafrawy, Mahmoud Zakaria
In this work, the stochastic dispersion of novel coronavirus disease 2019 (COVID-19) at the border between France and Italy is studied using a multi-input multi-output stochastic model. The physical effects of wind, temperature, and altitude are investigated, as these factors and their physical relationships are stochastic in nature. Stochastic terms are also included to account for the turbulence effect and the random nature of these physical parameters. A method is then proposed to identify the order and parameters of the developed model. Actual data are used as a reference in the identification and prediction process. These data are divided into two parts: the first is used to calculate the stochastic parameters of the model, which are then used to predict the COVID-19 level, while the second is used as check data. The predicted results are in good agreement with the check data.
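The identify-then-check procedure described above can be sketched in a heavily simplified form. The example below is not the paper's multi-input multi-output model: it swaps in a single-output first-order autoregressive model, uses synthetic data instead of real measurements, and the names `fit_ar1` and `one_step_errors` are hypothetical. It only illustrates estimating parameters on the first part of a series and checking predictions on the second.

```python
import random

def fit_ar1(series):
    """Least-squares estimate of a in x[t+1] = a * x[t] + noise."""
    num = sum(x0 * x1 for x0, x1 in zip(series, series[1:]))
    den = sum(x0 * x0 for x0 in series[:-1])
    return num / den

def one_step_errors(series, a):
    """Absolute one-step-ahead prediction errors on a series."""
    return [abs(x1 - a * x0) for x0, x1 in zip(series, series[1:])]

# Synthetic data standing in for real measurements (not from the paper).
rng = random.Random(0)
true_a = 0.9
data = [10.0]
for _ in range(199):
    data.append(true_a * data[-1] + rng.gauss(0.0, 0.1))

# First half identifies the parameter; second half is held out as check data.
split = len(data) // 2
identification, check = data[:split], data[split:]
a_hat = fit_ar1(identification)
mean_check_error = sum(one_step_errors(check, a_hat)) / (len(check) - 1)
```

Keeping the check data out of the fitting step is what makes agreement between predictions and check data meaningful as evidence that the identified parameters generalize.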
Citations: 0
Learner Corpus in Teaching Greek to Arabic Natives: A Computational Linguistic Study of the Play Oedipus the King of Sophocles
Pub Date : 2022-10-12 DOI: 10.1109/ESOLEC54569.2022.10009068
Fatma G. Rizk
Interdisciplinary studies have become a necessity: they link a specific specialization with other sciences so that both fields benefit as much as possible. Corpora, in their different types, are one form of such interdisciplinary work, and through them the researcher tries to link Greek studies with computer studies and show how they can be applied. This research therefore applies a learner corpus, built with a specific methodology and clear objectives in computerized form, to university students. The aim is to detect linguistic errors by computer when students translate Greek literary texts into Arabic, which results in correct Arabic translations of those texts, and, in addition, to define this type of corpus, its most important characteristics, and how much Arab students of the ancient Greek language benefit from it. The research is an applied study on sixth-semester students of Greek and Latin texts at the Faculty of Archeology, Fayoum University. The students rely on the Greek source through the TLG in addition to the Perseus website, with grammar books and dictionaries provided. This enables the teacher to identify the most important strengths and weaknesses that students may encounter during their study, to translate these texts, and to store them in the corpus. The study is applied to the play Oedipus by the Greek poet Sophocles.
Citations: 0