A Feasible Chinese Text Data Preprocessing Strategy

2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). Pub Date: 2020-10-28. DOI: 10.1109/UEMCON51285.2020.9298131

Jingang Liu, Chunhe Xia, Haihua Yan, Jie Sun

Citations: 1

Abstract

With the rapid rise of artificial intelligence technologies such as machine learning and the rapid growth of the big data industry, increasing attention is being paid to how data itself is used. This is especially true of Chinese text data, which is more complex in expression and richer in information. Raw Chinese text data must be processed before it can be used for specific application tasks. However, current data processing strategies are generally tailored to particular fields and specific application tasks. In this paper, to further improve the quality of Chinese data processing and realize the application value of Chinese data, we propose a general and feasible Chinese text preprocessing strategy, named the multi-level data preprocessing strategy (MLDPS). This strategy processes raw Chinese text data systematically in four stages ("links"). We believe that the proposed MLDPS has relatively strong practical significance and offers a better approach to preprocessing Chinese text data.
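The abstract does not say what the four links of MLDPS are, so the following is only a minimal sketch of what a staged Chinese text preprocessing pipeline could look like, not the authors' method. The four stage functions (`strip_markup`, `normalize`, `split_sentences`, `drop_duplicates`) and the `run_pipeline` helper are hypothetical names chosen for illustration, and the choice of stages is an assumption.

```python
import re
import unicodedata
from typing import Callable, List, Sequence

# A stage takes a list of texts and returns a new list of texts.
Stage = Callable[[List[str]], List[str]]

# NOTE: all four stages below are illustrative assumptions; the abstract
# does not name the four links used by MLDPS.

def strip_markup(docs: List[str]) -> List[str]:
    """Remove HTML-like tags and ASCII URLs (assumed noise in raw web text)."""
    noise = re.compile(r"<[^>]+>|https?://[\x21-\x7e]+")
    return [noise.sub("", d) for d in docs]

def normalize(docs: List[str]) -> List[str]:
    """Apply Unicode NFKC (folds full-width forms) and collapse whitespace."""
    return [
        re.sub(r"\s+", " ", unicodedata.normalize("NFKC", d)).strip()
        for d in docs
    ]

def split_sentences(docs: List[str]) -> List[str]:
    """Split on sentence-final punctuation, keeping the delimiter."""
    pattern = re.compile(r"[^。!?]+[。!?]?")
    out: List[str] = []
    for d in docs:
        out.extend(s.strip() for s in pattern.findall(d) if s.strip())
    return out

def drop_duplicates(docs: List[str]) -> List[str]:
    """Drop exact duplicates while preserving first-seen order."""
    seen = set()
    out: List[str] = []
    for d in docs:
        if d not in seen:
            seen.add(d)
            out.append(d)
    return out

DEFAULT_STAGES: Sequence[Stage] = (
    strip_markup,
    normalize,
    split_sentences,
    drop_duplicates,
)

def run_pipeline(docs: List[str], stages: Sequence[Stage] = DEFAULT_STAGES) -> List[str]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        docs = stage(docs)
    return docs

if __name__ == "__main__":
    raw = ["<p>今天天气很好！</p>  今天天气很好！", "详情见https://example.com/a。"]
    for sentence in run_pipeline(raw):
        print(sentence)
```

Keeping each stage as an independent function over a list of texts means stages can be reordered, removed, or replaced (for example, with a word segmenter) without touching the rest of the pipeline, which is one plausible reading of a "general" strategy that is not tied to a single application task.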
