
Latest publications: 2018 Thirteenth International Conference on Digital Information Management (ICDIM)

The Development and Analysis of TWISH: A Lightweight-Block-Cipher-TWINE-Based Hash Function
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847056
Deden Irfan Afryansyah, Magfirawaty, K. Ramli
Security is one of the most important aspects of the Internet of Things (IoT) due to the increasing trend of attacks in IoT environments. Cryptographic techniques can be used to improve the security of IoT implementations. Due to the limited computing resources of most devices connected to the IoT, reliable and efficient hash functions are required. In this paper, TWISH, a new hash design that can be implemented efficiently without compromising security, is proposed. TWISH is based on the lightweight block cipher TWINE using the Davies-Meyer (DM) scheme. We analyze the security and randomness of the resulting output using the Cryptographic Randomness Test package. These tests include the Strict Avalanche Criterion (SAC), Collision, Coverage, and Linear Span tests. The results show that TWISH passes the applied tests. The tests also indirectly demonstrate that TWISH is quite resistant to near-collision, preimage, and differential/linear attacks.
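The Davies-Meyer mode the abstract refers to can be sketched in a few lines. TWINE itself is not reproduced here, so a SHA-256-based toy keyed permutation stands in for the block cipher; the block size, IV, and padding are invented for illustration only:

```python
import hashlib

BLOCK_BITS = 64
MASK = (1 << BLOCK_BITS) - 1

def toy_cipher(key: int, block: int) -> int:
    """Stand-in for TWINE: a keyed function built from SHA-256.
    Not a real lightweight cipher -- it only illustrates the mode."""
    material = key.to_bytes(8, "big") + block.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "big")

def davies_meyer_hash(message: bytes, iv: int = 0x0123456789ABCDEF) -> int:
    """Davies-Meyer chaining: H_i = E(m_i, H_{i-1}) XOR H_{i-1},
    with each 8-byte message block used as the cipher key."""
    # Pad with 0x80 then zeros up to a multiple of 8 bytes (simplified padding).
    padded = message + b"\x80" + b"\x00" * ((-len(message) - 1) % 8)
    h = iv
    for i in range(0, len(padded), 8):
        m_i = int.from_bytes(padded[i:i + 8], "big")
        h = (toy_cipher(m_i, h) ^ h) & MASK
    return h

print(f"{davies_meyer_hash(b'IoT sensor reading'):016x}")
```

The XOR feed-forward is what makes the compression function one-way even when the underlying cipher is invertible, which is why the DM scheme is a standard way to build a hash from a block cipher.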
Citations: 1
User Profile Feature-Based Approach to Address the Cold Start Problem in Collaborative Filtering for Personalized Movie Recommendation
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847002
Lasitha Uyangoda, S. Ahangama, Tharindu Ranasinghe
A huge amount of user-generated content related to movies has been created with the popularization of Web 2.0. With this continuing exponential growth of data, there is an inevitable need for recommender systems, as people find it difficult to make informed and timely decisions. Movie recommendation systems assist users in finding their next interest or the best recommendation. In the proposed approach, the authors apply the relationship of user feature-scores derived from user-item interaction via ratings to optimize the prediction algorithm's input parameters used in the recommender system, improving the accuracy of predictions when fewer past user records are available. This addresses a major drawback of collaborative filtering, the cold start problem, showing an improvement of 8.4% over the base collaborative filtering algorithm. The user-feature generation and evaluation of the system are carried out using the 'MovieLens 100k dataset'. The proposed system can be generalized to other domains as well.
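The abstract does not give the exact feature-score formula, but the general idea of deriving user feature-scores from ratings can be illustrated with a hypothetical sketch: average each user's ratings per movie genre, producing a compact profile that can stand in for history when a user or item has few ratings (the cold-start case). The rating and genre data below are invented:

```python
from collections import defaultdict

# Toy user-item ratings and item genres (hypothetical stand-ins for
# MovieLens fields; the paper's actual feature derivation may differ).
ratings = {("u1", "m1"): 5, ("u1", "m2"): 3, ("u2", "m1"): 4, ("u2", "m3"): 2}
genres = {"m1": ["action"], "m2": ["comedy"], "m3": ["action", "comedy"]}

def user_feature_scores(ratings, genres):
    """Average each user's ratings per genre: a simple user profile
    usable as a prediction input when few ratings exist."""
    per_genre = defaultdict(lambda: defaultdict(list))
    for (user, item), r in ratings.items():
        for g in genres[item]:
            per_genre[user][g].append(r)
    return {u: {g: sum(v) / len(v) for g, v in gs.items()}
            for u, gs in per_genre.items()}

profiles = user_feature_scores(ratings, genres)
print(profiles["u1"])  # u1 rated one action movie 5 and one comedy 3
```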
Citations: 8
Compliance at Velocity within a DevOps Environment
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847007
Muhammad Zaid Abrahams, J. Langerman
DevOps has become an emerging force within the Information Technology field in today's development/operations climate. Information security within a DevOps environment has become a focal point for most organizations that have implemented the DevOps methodology and its principles. In most cases, the ability to secure a DevOps environment, and the organization's ability to adhere to and comply with industry-specific standards, frameworks and best practice, is an integral part of information security within a DevOps environment. This investigation aims to address the issues that may arise when an organization seeks to adhere to and comply with industry standards, frameworks and best practice in a manner that does not limit the velocity of the organization's automated delivery/deployment pipeline. The study investigates this by collecting and analyzing industry and academic literature, and, through a prototype demonstration using existing industry tools and solutions, by examining technical compliance and its requirements within a DevOps environment.
Citations: 3
Grammatical Error Checking Systems: A Review of Approaches and Emerging Directions
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847020
Nora Madi, Hend Suliman Al-Khalifa
Grammatical error checking is the process of detecting, and sometimes correcting, erroneous words in a text. Various approaches have been used for detecting and correcting text in numerous languages; the techniques employed include rule-based, syntax-based, statistical, classification-based, and neural-network methods. This paper presents previous work on grammatical error correction and detection systems, the challenges associated with these systems, and, finally, suggested future directions.
Citations: 7
Machine Learning for Predicting the Damaged Parts of a Low Speed Vehicle Crash
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8846974
M. Koch, Hao Wang, Thomas Bäck
Using time series of on-board car data, this research focuses on predicting the damaged parts of a vehicle in a low speed crash using machine learning techniques. Based on a relatively small and class-imbalanced dataset, we present our automatic method, optimized for small datasets, for using time series in machine learning. From 3982 extracted features, we use feature selection algorithms to find the most significant ones for each component. We train random forest models per part, each with its most relevant set of features, and optimize the hyper-parameters with different techniques. This so-called part-wise approach provides good insight into the model performance for each part and offers opportunities for optimizing the models. The final F1 prediction scores (reaching up to 94%) show the large potential of predicting damaged parts with on-board data only. Furthermore, for the worse-performing parts of this small and imbalanced dataset, it indicates the potential for reaching good prediction scores when more training data is added. The utilization of such a method offers great possibilities, e.g., in vehicle insurance processing for automated settling of low speed crash damages.
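The part-wise idea, selecting the most relevant features separately for each damaged part, can be sketched with the standard library. Correlation ranking stands in for the paper's feature selection algorithms, and the crash features and per-part damage labels below are invented:

```python
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def top_features_per_part(X, labels_per_part, k=2):
    """For each part, rank features by |correlation| with that part's
    damage label and keep the k most relevant feature indices."""
    n_features = len(X[0])
    ranking = {}
    for part, y in labels_per_part.items():
        scores = [(abs(pearson([row[j] for row in X], y)), j)
                  for j in range(n_features)]
        ranking[part] = [j for _, j in sorted(scores, reverse=True)[:k]]
    return ranking

# Toy crash features (rows = crashes) and per-part damage labels.
X = [[0.1, 5.0, 1.0], [0.9, 5.1, 0.0], [0.2, 4.9, 1.0], [0.8, 5.0, 0.0]]
labels = {"bumper": [0, 1, 0, 1], "hood": [1, 0, 1, 0]}
print(top_features_per_part(X, labels))
```

In the paper a random forest is then trained per part on its selected features; any classifier could be slotted in after this selection step.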
Citations: 9
SVM-RBM based Predictive Maintenance Scheme for IoT-enabled Smart Factory
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847132
Soonsung Hwang, Jongpil Jeong, Youngbin Kang
Fault diagnosis for facility maintenance is very important: unexpected equipment failures during the process lead to significant losses for the plant. In this paper, in order to detect defects and fault patterns, a Support Vector Machine (SVM), one of the machine learning algorithms, classifies the data received from the equipment as normal or abnormal. After learning only normal data using a Restricted Boltzmann Machine (RBM), we propose a model to identify the data, and then we analyze the faults of facilities in real time.
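The key idea, learn from normal data only, then flag deviations as abnormal, can be illustrated without the actual RBM+SVM pipeline. In this simplified stand-in, a mean/standard-deviation profile replaces the RBM's learned representation, and a z-score threshold replaces the SVM decision; the sensor readings are invented:

```python
import statistics

def fit_normal_profile(normal_readings):
    """Learn a profile from normal data only (stand-in for the RBM stage)."""
    mu = statistics.fmean(normal_readings)
    sigma = statistics.pstdev(normal_readings)
    return mu, sigma

def classify(reading, profile, z_threshold=3.0):
    """Flag readings far from the learned normal profile as abnormal
    (stand-in for the SVM normal/abnormal decision)."""
    mu, sigma = profile
    z = abs(reading - mu) / sigma if sigma else 0.0
    return "abnormal" if z > z_threshold else "normal"

# Toy sensor stream from healthy equipment.
normal = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8]
profile = fit_normal_profile(normal)
print(classify(10.1, profile), classify(14.0, profile))
```

The benefit of this one-class setup, in the paper as in the sketch, is that no labeled failure examples are needed at training time.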
Citations: 18
Attention Based Neural Architecture for Rumor Detection with Author Context Awareness
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847052
Sansiri Tarnpradab, K. Hua
The prevalence of social media has made information sharing possible across the globe. The downside, unfortunately, is the wide spread of misinformation. Methods applied in most previous rumor classifiers give equal weight, or attention, to all words in a microblog and do not take context beyond the microblog contents into account; therefore, accuracy plateaus. In this research, we propose an ensemble neural architecture to detect rumors on Twitter. The architecture incorporates word attention and context from the author to enhance classification performance. In particular, the word-level attention mechanism enables the architecture to put more emphasis on important words when constructing the text representation. To derive further context, microblog posts composed by individual authors are exploited, since they can reflect style and characteristics in spreading information, which are significant cues for classifying whether the shared content is rumor or legitimate news. The experiment on a real-world Twitter dataset collected from two well-known rumor-tracking websites demonstrates promising results.
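The word-level attention step, scoring each word, softmaxing the scores, and pooling a weighted text representation, can be sketched in plain Python. The 2-d embeddings and query vector below are toy stand-ins for the trained parameters of the paper's architecture:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(word_vectors, query):
    """Word-level attention: score each word vector against a query
    vector, softmax the scores, and return the weighted sum."""
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    weights = softmax(scores)
    pooled = [sum(a * vec[d] for a, vec in zip(weights, word_vectors))
              for d in range(len(word_vectors[0]))]
    return pooled, weights

# Toy 2-d embeddings for three words in a microblog.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(vectors, query=[2.0, 0.0])
print([round(w, 3) for w in weights])
```

Words whose vectors align with the query receive larger weights, which is how the mechanism lets "important" words dominate the text representation.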
Citations: 4
Lead-Lag Relationship between Investor Sentiment in Social Media, Investor Attention in Google, and Stock Return
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847094
A. Rizkiana, Hasrini Sari, P. Hardjomidjojo, B. Prihartono, I. Sunaryo, I. Prasetyo
Investor sentiment has a significant role in driving stock prices. Although many previous studies show that investor sentiment in social media can be used to predict stock price movements, two things still need further investigation. The first is the attention of investors, which affects the ability to predict stock price movements, and its interaction with investor sentiment. The second is the effect of the lead-lag relationship between investor sentiment, investor attention, and stock return. Therefore, the purpose of this research is to understand the effect of the lead-lag relationship between the three variables, as well as the interaction between investor sentiment and investor attention, in predicting the movement of stock prices. The steps taken to answer the research problem are to measure investor sentiment based on comments in the social medium Stockbit, to measure investor attention based on search volume obtained from Google Trends, and then to test the effect of the lead-lag relationship and the interaction between the variables using Granger causality analysis and vector autoregression. Test results show that investor sentiment in Indonesia is a reaction to stock returns, not a cause, so it cannot be used to predict stock price movement. Likewise, investor attention measured by search volume in Google Trends cannot be used to predict stock price movement. There are four reasons why investor sentiment has no significant effect on stock return: the speed of information diffusion to the stock price, the data source used, the size of the stock capitalization tested, and the choice of investor sentiment measurement method. Furthermore, there are two reasons why investor attention has no significant effect on stock return, related to the stock capitalization size tested and to Google search volume not reflecting investor attention. The insignificant effects of the investor sentiment and investor attention variables on stock returns mean that the interaction between the two is also not significant.
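The lead-lag testing the authors run with Granger causality and vector autoregression is more involved than this, but as a rough stdlib illustration, a lagged cross-correlation scan can show which series leads which. The toy series below is constructed so that sentiment echoes the previous day's return, mirroring the paper's finding that sentiment is a reaction to returns:

```python
import statistics

def lagged_corr(x, y, lag):
    """Correlation between x shifted by `lag` steps and y: a crude
    lead-lag check (a positive best lag means x leads y)."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    elif lag < 0:
        x, y = x[-lag:], y[:lag]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

# Toy series in which sentiment echoes yesterday's return.
returns = [1, -1, 1, 1, -1, 1, -1, -1, 1, -1]
sentiment = [0] + returns[:-1]
best = max(range(-3, 4), key=lambda k: lagged_corr(returns, sentiment, k))
print("best lag:", best)
```

A best lag of +1 here says returns lead sentiment by one step; a proper Granger test additionally checks whether the lagged predictor improves on an autoregression of the target alone.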
Citations: 5
Sensitivity Based Anonymization with Multi-dimensional Mixed Generalization
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847000
Esther Gachanga, Michael W. Kimwele, L. Nderu
Sensitive information about individuals must not be revealed when sharing data, but a data set must remain useful for research and analysis when published. Anonymization methods have been considered as a possible solution for protecting the privacy of individuals. This is achieved by transforming data in a way that guarantees a certain degree of protection from re-identification threats. In the process, it is important to ensure that the quality of data is preserved. K-anonymity is the most commonly used approach for the anonymization of published datasets. However, the approach causes a decline in data utility. The key challenge for data publishers is how to anonymize data without causing a significant decline in data utility. The paper addresses this challenge by proposing a multidimensional mixed generalization. We conduct experiments with mixed generalization. Our results show that mixed generalization preserves the quality of data for classification.
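The abstract's core notions, generalizing quasi-identifiers and checking k-anonymity, can be sketched briefly. The paper's multidimensional mixed generalization itself is not specified here, so this shows only single-attribute generalization plus the k-anonymity check, on invented records:

```python
from collections import Counter

def generalize_age(age, width=10):
    """Generalize an exact age into a range (one generalization step)."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity holds when every combination of quasi-identifier
    values appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": 23, "zip": "12345", "diagnosis": "A"},
    {"age": 27, "zip": "12344", "diagnosis": "B"},
    {"age": 25, "zip": "12347", "diagnosis": "A"},
]
# Generalize age to decades and zip codes to a 3-digit prefix.
for r in records:
    r["age"] = generalize_age(r["age"])
    r["zip"] = r["zip"][:3] + "**"
print(is_k_anonymous(records, ["age", "zip"], k=3))
```

The utility loss the abstract mentions is visible here: the coarser the generalization needed to reach k, the less information the published quasi-identifiers carry for later analysis.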
Citations: 1
Urdu Text Classification: A comparative study using machine learning techniques
Pub Date : 2018-09-01 DOI: 10.1109/ICDIM.2018.8847044
Imran Rasheed, Vivek Gupta, H. Banka, C. Kumar
In the last decade, online content has entered a stage where news-related organizations are reluctant to invest in offline operations due to excessive aberrations in content distribution. However, the proliferation of digital data in an unstructured, or rather disordered, form, particularly for languages like Urdu, has complicated easy access to information. Consequently, this paper addresses the peculiarities of classifying Urdu text of news origin. For this, the performance of three classifiers, Decision Tree (J48), Support Vector Machine (SVM), and k-nearest neighbor (KNN), was measured on the classification of Urdu text using the WEKA (Waikato Environment for Knowledge Analysis) tool. The assessment was carried out on a relatively large collection of Urdu text of over 16,678 documents, containing mainly news articles from The Daily Roshni, an Urdu newspaper. Additionally, the TF-IDF weighting scheme was used for feature selection and extraction. The Urdu text classification using the SVM classifier performed considerably better, with promising accuracy and superior efficiency, compared to the other two classifiers. For this study, the dataset was formulated as per the TREC (Text Retrieval Conference) community standard.
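The TF-IDF weighting used for feature extraction is simple enough to sketch directly. This is the common tf(t, d) x log(N / df(t)) variant; the tokenized toy documents are invented (real Urdu tokens would be used in practice), and WEKA's own implementation may differ in normalization details:

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document TF-IDF weights: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    weights = []
    for d in docs:
        counts = Counter(d)
        weights.append({t: (c / len(d)) * math.log(n / df[t])
                        for t, c in counts.items()})
    return weights

# Toy tokenized "articles".
docs = [["news", "city", "news"], ["sport", "city"], ["sport", "match"]]
w = tfidf(docs)
print(round(w[0]["news"], 3))
```

Terms that appear in every document get weight zero, while terms frequent in one document but rare in the corpus score highest; this is what makes TF-IDF a useful feature selector before training the classifiers compared in the paper.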
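The abstract mentions TF-IDF weighting for feature extraction. A minimal stdlib-only sketch of that scheme (TF as normalized term count, IDF as log of inverse document frequency), with a toy corpus rather than the paper's WEKA pipeline or Urdu dataset:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a small tokenized corpus.
    TF = term count / document length; IDF = log(N / number of docs containing the term)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

# Toy tokenized corpus standing in for preprocessed news articles.
corpus = [
    ["urdu", "news", "sports"],
    ["urdu", "news", "politics"],
    ["cricket", "sports", "news"],
]
w = tf_idf(corpus)
# "news" occurs in every document, so its IDF (and hence its weight) is 0.
print(w[0]["news"])  # 0.0
```

Terms that appear in every document get zero weight, which is why TF-IDF is useful for discarding uninformative features before training classifiers such as SVM or KNN.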
{"title":"Urdu Text Classification: A comparative study using machine learning techniques","authors":"Imran Rasheed, Vivek Gupta, H. Banka, C. Kumar","doi":"10.1109/ICDIM.2018.8847044","DOIUrl":"https://doi.org/10.1109/ICDIM.2018.8847044","url":null,"abstract":"In the last decade, online content has entered a stage where news related organizations are reluctant to invest in offline operations due to excessive aberrations in content distributions. However, the proliferation of digital data in an unstructured or rather disordered form particularly for languages like Urdu has complicated the easy access to information. Consequently, the paper addresses the peculiarities of Urdu text classification of news origin. For this, the performance of the three classifiers such as Decision Tree (J48), Support Vector Machine (SVM) and k-nearest neighbor (KNN) was measured on the classification of Urdu text using WEKA (Waikato Environment Knowledge Analysis) tool. The assessment was carried out on a relatively large collection of Urdu text having over 16,678 documents containing mainly news articles from The Daily Roshni, an Urdu newspaper. Additionally, TF-IDF weighting scheme was used for feature selection and extraction of data. The Urdu text classification using SVM classifier performed quite better with promising accuracy and superior efficiency when compared to the other two classifiers. For this study, the dataset was formulated as per TRC (Text Retrieval Conference) community standard.","PeriodicalId":120884,"journal":{"name":"2018 Thirteenth International Conference on Digital Information Management (ICDIM)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114439676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16