2016 7th International Conference on Computer Science and Information Technology (CSIT): latest papers

An ontology for Juz' Amma based on expert knowledge
Noor Siti Husnah Ab Rahim Periamalai, A. Mustapha, Ahmad Alqurneh
This paper reports the development of an ontology for Juz' Amma in the Quran manuscript, designed around contextual information sourced from expert knowledge. The development follows an existing methodology, Methontology, which covers identifying motivation scenarios, formulating competency questions, development, and evaluation. The ontology was evaluated against the competency questions determined at the beginning of the development life cycle, and the results were promising. The ontology is intended to serve as domain knowledge for other applications such as question-answering, dialogue, or expert systems.
DOI: https://doi.org/10.1109/CSIT.2016.7549480 | Published: 2016-08-25
Citations: 5
Arabic OCR evaluation tool
Mansoor Alghamdi, Ibrahim Alkhazi, W. Teahan
Performance evaluation of Optical Character Recognition (OCR) systems is an essential task in OCR system development. However, studies in Arabic OCR suffer from a lack of proper performance evaluation metrics and of available evaluation tools. Although the literature provides typical performance metrics for OCR evaluation, such as character accuracy and word accuracy, these metrics are not sufficient for evaluating Arabic OCR. This paper presents an open source automated software tool with various metrics for evaluating Arabic OCR performance. The tool is available to OCR researchers and can be used to rank different OCR algorithms.
DOI: https://doi.org/10.1109/CSIT.2016.7549460 | Published: 2016-07-13
Citations: 15
Evaluating SentiStrength for Arabic Sentiment Analysis
Abdullateef Rabab'ah, M. Al-Ayyoub, Y. Jararweh, M. Al-Kabi
Social networking websites are used today as platforms that let their users write almost anything about everything. Social media users express their opinions and feelings about many events in their daily lives, and many studies examine the sentiments these users express on different topics. Sentiment Analysis (SA) is a new field concerned with measuring the sentiment presented in a given text. Because SA has a wide set of applications, several SA tools are available. Most of them are designed for English text. For other languages such as Arabic, the case is different, since only a few tools are available. In fact, many of these tools were originally designed for English and were later adapted to deal with Arabic. SentiStrength is an example of a tool that is successful for English and was later adapted to Arabic. However, the adaptation was done in a crude manner, and no in-depth studies are available that measure the effectiveness of such tools on Arabic text. In this paper, we perform a comprehensive evaluation of SentiStrength using 11 Arabic datasets consisting of tens of thousands of reviews/comments from different domains and in different dialects. We perform the evaluation in terms of positive and negative sentiments. The evaluation results show that, overall, SentiStrength achieves 62% accuracy, 83.7% precision, 64% recall (positive correct), 68% F1 measure, and 55% negative correct.
DOI: https://doi.org/10.1109/CSIT.2016.7549458 | Published: 2016-07-13
Citations: 28
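The measures named in the SentiStrength abstract above (accuracy, precision, recall, F1, and "negative correct") can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. It assumes a binary positive/negative setting, reads "positive correct" as recall on the positive class and "negative correct" as the share of negative items classified correctly, and uses made-up confusion counts purely for illustration.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common evaluation measures from binary confusion counts.

    tp/fn: positive items classified correctly/incorrectly.
    tn/fp: negative items classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall on the positive class (assumed to match "positive correct").
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Share of negatives classified correctly (assumed "negative correct").
    negative_correct = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "negative_correct": negative_correct,
        "f1": f1,
    }


# Hypothetical counts, not taken from the paper.
example = binary_metrics(tp=60, fp=20, tn=80, fn=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

When results are aggregated over several datasets, as in the paper's 11 corpora, averaged per-dataset scores need not satisfy the single-dataset F1 formula exactly.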
Towards data driven decision support for financial institutions: Predicting small companies business volume in Switzerland
Daniel Müller, Funk Te, Flavien Meyer, Irena Pletikosa Cvijikj
In Switzerland, small and medium-sized enterprises represent more than 99% of all businesses. Predicting their micro- and macroeconomic business development is therefore important. In this paper, we propose a novel approach for predicting business volume using characteristics of the company and of the county it operates in. We investigate which data sources can be combined to achieve this goal for small and medium-sized enterprises in Switzerland, building a model that is independent of industry. We build our model on a dataset obtained from an insurance company, combined with census data. We present two quantitative models, which predict business volume in Swiss francs (CHF) and classify customers by size. Our results show that operational data from the customer relationship management (CRM) systems of financial institutions (FI), linked with census data, are valuable for predicting customer business volume.
DOI: https://doi.org/10.1109/CSIT.2016.7549449 | Published: 2016-07-13
Citations: 2
Empirical insight into the context of design patterns: Modularity analysis
Mawal A. Mohammed, Mahmoud O. Elish, A. Qusef
Design patterns are common solutions to specific design problems. There are many claimed benefits of the application of design patterns on design quality. This paper empirically evaluates and compares the modularity of design patterns in object-oriented software. Coupling and cohesion of classes that participate in design patterns were compared with those that do not participate. We used CBO and LCOM metrics as proxy measures for coupling and cohesion respectively. Data were collected from five open source systems, and analyses were conducted at both the design and pattern levels. At the design level, we compared the modularity of participant versus non-participant classes in design patterns, whereas at the pattern level, we compared the modularity of the classes in each pattern. The results indicate that the classes that participate in design patterns are more coupled and less cohesive than the non-participant classes at both levels.
DOI: https://doi.org/10.1109/CSIT.2016.7549474 | Published: 2016-07-13
Citations: 5
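The design patterns abstract above uses CBO and LCOM as proxies for coupling and cohesion without saying which variants were measured. As a rough illustration only, the Python sketch below assumes the original Chidamber and Kemerer definitions: LCOM is the number of method pairs sharing no instance attributes minus the number of pairs sharing at least one (floored at zero), and CBO is the number of other classes a class is coupled to in either direction. The class, method, and attribute names are hypothetical.

```python
from itertools import combinations


def lcom(method_attrs):
    """LCOM for one class.

    method_attrs maps each method name to the set of instance
    attributes that the method reads or writes.
    """
    disjoint = shared = 0
    for a, b in combinations(method_attrs.values(), 2):
        if a & b:
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)


def cbo(class_name, references):
    """CBO for one class.

    references maps each class name to the set of classes it uses.
    A class is coupled to another if either one uses the other.
    """
    coupled = set(references.get(class_name, set()))
    for other, used in references.items():
        if other != class_name and class_name in used:
            coupled.add(other)
    coupled.discard(class_name)
    return len(coupled)


# Hypothetical class: open/read share "handle", log shares nothing.
reader_methods = {
    "open": {"path", "handle"},
    "read": {"handle"},
    "log": {"logger"},
}
print(lcom(reader_methods))  # 2 disjoint pairs - 1 sharing pair = 1

# Hypothetical dependency graph between classes.
uses = {
    "Reader": {"File", "Logger"},
    "App": {"Reader"},
    "File": set(),
    "Logger": set(),
}
print(cbo("Reader", uses))  # coupled to File, Logger and App = 3
```

Higher CBO means more coupling and higher LCOM means less cohesion, which is the direction in which the abstract reports pattern-participant classes to differ from non-participants.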
Social Media in project communications management
A. Qusef, Khaled Ismail
Many experts agree that the greatest threat to the success of any project, especially IT projects, is a failure to communicate. Project managers say they spend as much as 90 percent of their time communicating. Just as it is difficult to understand people and their motivations, it is also difficult to communicate with people effectively. Communications software and collaboration tools like e-mail, blogs, Web sites, Google Docs, and tweets can aid stakeholder communications and promote stakeholder engagement in projects. A very popular software category today, Social Media, can also help engage stakeholders. This paper highlights keys to good communications, provides suggestions for improving communications using Social Media (SM), and describes how SM can assist in project communications management.
DOI: https://doi.org/10.1109/CSIT.2016.7549448 | Published: 2016-07-13
Citations: 7
Unsupervised feature selection technique based on genetic algorithm for improving the Text Clustering
L. Abualigah, A. Khader, M. Al-Betar
The increasing number of text documents in digital form affects text analysis techniques. Text clustering (TC) is one of the important techniques used for grouping a massive number of text documents into clusters. The main problem affecting text clustering is the presence of sparse and uninformative features in the text documents. Feature selection (FS) is an essential unsupervised learning technique, used to select informative features in order to improve the performance of the text clustering algorithm. Recently, meta-heuristic algorithms have been applied successfully to several hard optimization problems. In this paper, we propose a genetic algorithm (GA) for the unsupervised feature selection problem, named FSGATC. The method creates a new subset of informative features in order to obtain more accurate clusters. Experiments were conducted on four benchmark text datasets with varying characteristics. The results show that the proposed FSGATC improves the performance of the text clustering algorithm and gives better results than standalone k-means clustering. Finally, FSGATC is evaluated using F-measure and Accuracy, which are common measures in the text clustering domain.
DOI: https://doi.org/10.1109/CSIT.2016.7549453 | Published: 2016-07-13
Citations: 55
Are emoticons good enough to train emotion classifiers of Arabic tweets?
Wegdan A. Hussien, Yahya M. Tashtoush, M. Al-Ayyoub, M. Al-Kabi
Nowadays, the automatic detection of emotions is employed by many applications across different fields such as security informatics, e-learning, humor detection, and targeted advertising. Many of these applications focus on social media. In this study, we address the problem of emotion detection in Arabic tweets. We focus on the supervised approach to this problem, where a classifier is trained on an already labeled dataset. Typically, such a training set is manually annotated, which is expensive and time-consuming. We propose an automatic approach that annotates the training data based on emojis, which are a new generation of emoticons. We show that such an approach produces classifiers that are more accurate than the ones trained on a manually annotated dataset. To achieve our goal, a dataset of emotional Arabic tweets is constructed, where the emotion classes under consideration are anger, disgust, joy, and sadness. Moreover, we consider two classifiers: Support Vector Machine (SVM) and Multinomial Naive Bayes (MNB). The results of the tests show that the automatic labeling approaches using SVM and MNB outperform manual labeling approaches.
DOI: https://doi.org/10.1109/CSIT.2016.7549459 | Published: 2016-07-13
Citations: 48
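The emoji-based annotation idea in the abstract above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the authors' pipeline: the emoji-to-emotion table is a hypothetical four-entry mapping, and the choices to skip tweets with zero or conflicting emotion emojis and to strip the emojis from the training text are assumptions made here.

```python
# Hypothetical mapping from emoji to the four emotion classes named
# in the abstract; a real lexicon would be much larger.
EMOJI_TO_EMOTION = {
    "\U0001F621": "anger",    # pouting face
    "\U0001F922": "disgust",  # nauseated face
    "\U0001F602": "joy",      # face with tears of joy
    "\U0001F622": "sadness",  # crying face
}


def label_tweet(text):
    """Return (cleaned_text, emotion) or None if no single label applies.

    A tweet is labeled only when all mapped emojis in it agree on one
    emotion. The emojis are removed from the returned text so that a
    classifier trained on it cannot simply memorize the label symbol.
    """
    found = {EMOJI_TO_EMOTION[ch] for ch in text if ch in EMOJI_TO_EMOTION}
    if len(found) != 1:
        return None
    cleaned = "".join(ch for ch in text if ch not in EMOJI_TO_EMOTION).strip()
    return cleaned, found.pop()


tweets = [
    "يوم رائع \U0001F602",
    "خبر محزن \U0001F622\U0001F622",
    "لا أعرف \U0001F602\U0001F621",  # conflicting emojis, skipped
    "بدون رموز",                      # no emoji, skipped
]
training_set = [pair for pair in map(label_tweet, tweets) if pair is not None]
print(training_set)
```

The resulting (text, label) pairs would then feed a supervised classifier such as the SVM or Multinomial Naive Bayes models mentioned in the abstract.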
Unsupervised feature selection technique based on harmony search algorithm for improving the text clustering
L. Abualigah, A. Khader, M. Al-Betar
The increasing amount of text information on Internet web pages affects clustering analysis. Text clustering is a useful analysis technique for partitioning a massive amount of information into clusters. The major problem affecting text clustering is the presence of uninformative and sparse features in text documents. Feature selection (FS) is an important unsupervised technique used to eliminate uninformative features and so support text clustering. Recently, meta-heuristic algorithms have been applied successfully to several optimization problems. In this paper, we propose the harmony search (HS) algorithm to solve the feature selection problem (FSHSTC). The proposed method enhances the text clustering (TC) technique by obtaining a new subset of informative or useful features. Experiments were carried out on four benchmark text datasets. The results show that the proposed FSHSTC improves the performance of the k-means clustering algorithm as measured by F-measure and Accuracy.
DOI: https://doi.org/10.1109/CSIT.2016.7549456 | Published: 2016-07-13
Citations: 17
Multi-objectives-based text clustering technique using K-mean algorithm
L. Abualigah, A. Khader, M. Al-Betar
Text document clustering is a popular unsupervised text mining tool. It partitions a collection of text documents into clusters of similar documents, based on a distance or similarity measure chosen through an objective function. Text clustering algorithms often make prior assumptions to satisfy the objective function, which is optimized through either traditional or meta-heuristic techniques. In text clustering, the decision on how documents are distributed among clusters is made using the objective function. Clustering algorithms normally perform poorly when the objective function is not sound and complete. Therefore, we propose a multi-objective method that combines distance and similarity measures to improve text clustering. The multi-objective text clustering method combines two evaluation criteria, which emerges as a robust alternative in several situations. In particular, the multi-objective function is not popular in the text clustering domain, yet it is a core issue that affects the performance of the text clustering technique. The performance of the multi-objective function is investigated using the k-means text clustering technique. The experiments were conducted on seven standard text datasets. The results show that the proposed multi-objective method outperforms the other measures in terms of text clustering performance, evaluated using two common clustering measures, Accuracy and F-measure.
DOI: 10.1109/CSIT.2016.7549464 (https://doi.org/10.1109/CSIT.2016.7549464). Published 2016-07-13 in 2016 7th International Conference on Computer Science and Information Technology (CSIT).
Citations: 43