2020 International Conference on Machine Learning and Cybernetics (ICMLC): Latest Papers

Imbalanced Toxicity Prediction Using Multi-Task Learning and Over-Sampling

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469546

Jincheng Li

Chemical compound toxicity prediction is a challenging learning problem because the number of active chemicals obtained in toxicity assays is far smaller than the number of inactive chemicals, i.e., the data are imbalanced. Neural networks trained on such imbalanced tasks tend to misclassify minority-class samples as majority-class samples. In this paper, we propose a novel learning method that combines multi-task deep neural network learning with over-sampling to handle both the imbalanced data and the lack of training data in toxicity prediction. Over-sampling is a re-sampling method that tackles class imbalance by replicating minority-class samples. For each toxicity prediction task, we apply over-sampling to the training set to generate synthetic minority-class samples that balance the training data. We then train the multi-task deep neural network on the tasks with balanced training sets. Multi-task learning can share common information among tasks, and the balanced data sets contain more training data, which benefits multi-task deep neural network learning. Experimental results on the Tox21 toxicity prediction data set show that our method significantly relieves the imbalanced data problem of multi-task deep neural network learning and outperforms the multi-task deep neural network without over-sampling as well as many other computational approaches, such as support vector machines and random forests.
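The per-task over-sampling step described in the abstract can be sketched as follows. This is a minimal sketch, assuming plain random over-sampling (replicating minority-class samples with replacement until each class matches the largest class). The abstract also mentions generating synthetic samples, which suggests an interpolating variant such as SMOTE; which variant the author used is not stated. The function name `random_oversample` and the toy feature vectors are hypothetical, and the multi-task network training is not shown.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Balance one task's training set by replicating minority-class samples
    (drawn with replacement) until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):  # extra copies needed for this class
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

# Hypothetical single-task example: four inactive (0) and one active (1) compound.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
Xb, yb = random_oversample(X, y)
print(dict(Counter(yb)))  # {0: 4, 1: 4}
```

In a multi-task setting such as Tox21, where each assay has its own labels, the function would be applied to each task's training split separately, as the abstract describes; how the balanced per-task sets are then fed to the shared network is not specified here.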

Citations: 2

Research on Hotspot Mining Method of Twitter News Report Based on LDA and Sentiment Analysis

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469557

Lingfei Zhang, Chunfang Li

Nowadays, media from various countries publish a large number of tweets reporting on international hot topics, and news events spread rapidly on Twitter, which has become increasingly common. For hotspot mining of news events, topic division and sentiment analysis are two indispensable factors. In this paper, we use topic segmentation and sentiment analysis to mine hotspots in social media news, using tweets from US and Chinese media about Huawei-related news in 2019. First, we apply LDA to the media tweets to divide them into topics and obtain related topic words. Then we devise improved methods for effective sentiment analysis of media tweets and influencer comments, respectively. In addition, we draw some valid conclusions about news hotspot mining in social media tweets.

Citations: 1

Hybrid Separable Convolutional Inception Residual Network for Human Facial Expression Recognition

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469558

Xinqi Fan, Rizwan Qureshi, A. Shahid, Jianfeng Cao, Luoxiao Yang, H. Yan

Facial expression recognition has been widely applied in human-machine interaction, security, and business applications. Its aim is to classify human expressions from face images. In this work, we propose a novel neural network-based pipeline for facial expression recognition, the Hybrid Separable Convolutional Inception Residual Network, which uses transfer learning with an Inception residual network and depthwise separable convolution. Specifically, our method uses a multi-task convolutional neural network for face detection, then modifies the last two blocks of the original Inception residual network with depthwise separable convolutions to reduce computation cost, and finally uses transfer learning to take advantage of transferable weights from a large face recognition dataset. Experimental results on three databases, the Radboud Faces Database, the Compound Facial Expressions of Emotion Database, and the Real-world Affective Faces Database, show superior performance compared with existing studies. Moreover, the proposed method is computationally efficient and has approximately 25% fewer trainable parameters than the original Inception residual network.
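The parameter saving that motivates replacing standard convolutions with depthwise separable ones can be checked with simple arithmetic. The sketch below counts the weights of one k×k layer in each form; the ratio is 1/C_out + 1/k². The layer sizes are hypothetical, and because the paper modifies only the last two blocks, the roughly 25% whole-network reduction it reports is much smaller than the per-layer saving shown here.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_params(k, c_in, c_out, bias=False):
    """Weights in a depthwise k x k convolution (one filter per input channel)
    followed by a pointwise 1 x 1 convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 256 input and 256 output channels.
std = conv_params(3, 256, 256)
sep = separable_params(3, 256, 256)
print(std, sep, round(1 - sep / std, 3))  # 589824 67840 0.885
```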

Citations: 6

Identification of Inverted Pendulum System Using Frequency Domain Maximum Likelihood Estimation

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469529

Si-Ting Zou, Qing Sun, Dangdang Du

This paper explores a method for establishing a dynamic model of an inverted pendulum based on system identification. The frequency-domain maximum likelihood (ML) method is applied in a novel way to identify the model parameters of the inverted pendulum system (IPS). The frequency-domain ML method is used to solve the output error (OE) model, including a transient term, for the system transfer function between the output state variables and the input variable. Finally, the parameters are identified using the frequency-domain ML method and compared with those obtained by the time-domain weighted least squares method. Using only measured data, an experiment is designed on a single inverted pendulum system excited by a random time-varying control signal. The numerical results show that the frequency-domain ML method is effective for identifying the inverted pendulum system.
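The abstract names a frequency-domain ML estimator for an output error model with a transient term but does not give its cost function. As a hedged sketch, assuming a SISO transfer function from the input to one output state variable, a noise-free input, and output noise that is independent across frequency lines and circular complex Gaussian with known variance σ_Y²(k), a standard form of the criterion (the negative log-likelihood up to a constant) is:

```latex
\begin{aligned}
Y(k) &= G(\Omega_k,\theta)\,U(k) + T_G(\Omega_k,\theta) + N_Y(k),\qquad
G = \frac{B(\Omega_k,\theta)}{A(\Omega_k,\theta)},\quad
T_G = \frac{I(\Omega_k,\theta)}{A(\Omega_k,\theta)},\\[4pt]
V_{\mathrm{ML}}(\theta) &= \sum_{k=1}^{F}
\frac{\bigl|Y(k) - G(\Omega_k,\theta)\,U(k) - T_G(\Omega_k,\theta)\bigr|^2}{\sigma_Y^2(k)},
\qquad \hat{\theta} = \arg\min_{\theta} V_{\mathrm{ML}}(\theta).
\end{aligned}
```

Here U(k) and Y(k) are the DFT coefficients of the measured input and output at line k, Ω_k = e^{-j2πk/N} in discrete time, and the polynomial I models the transient (leakage) effects of a finite record. Minimizing V_ML is a weighted nonlinear least-squares problem, usually solved with Gauss-Newton or Levenberg-Marquardt iterations; whether the paper uses exactly this noise model and weighting is an assumption.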

Citations: 1

Sentiment Analysis of Online Product Reviews Based On SenBERT-CNN

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469551

F. Wu, Zhenjie Shi, Zhaowei Dong, C. Pang, Bailing Zhang

Sentiment analysis, also known as opinion mining, is an important research area for analyzing people's opinions. In online e-commerce marketplaces such as Taobao, customers can comment on products, brands, and services using text and numerical ratings. Such product reviews are valuable for improving product quality, since they influence consumers' purchase decisions. In this paper, we introduce a novel model, SenBERT-CNN, to analyze customer reviews. To capture more sentiment information in sentences, SenBERT-CNN combines a pre-trained Bidirectional Encoder Representations from Transformers (BERT) network with a Convolutional Neural Network (CNN). Specifically, we use the BERT structure to better express sentence semantics as a text vector, and then extract deeper features of the sentence with a CNN. The effectiveness of the proposed method is validated on a collection of mobile phone product reviews from the e-commerce website JD.com.
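The abstract does not specify the CNN layout on top of BERT. A common choice for sentence classification is a TextCNN-style layer: 1-D filters slide over the sequence of token vectors, a ReLU is applied, and each filter keeps its maximum activation (max-over-time pooling). The pure-Python sketch below assumes that layout; the token vectors stand in for BERT hidden states (obtaining them is out of scope), and all names and values are hypothetical.

```python
def conv1d_maxpool(tokens, filt, bias=0.0):
    """Slide one filter (a list of `width` rows, each the size of a token
    vector) over the token sequence; return max(ReLU(score)) over positions."""
    width = len(filt)
    if len(tokens) < width:
        raise ValueError("sequence shorter than filter width")
    best = 0.0  # starting at 0.0 folds the ReLU into the max
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # dot product of the filter with the window of token vectors
        s = bias + sum(w * x
                       for w_row, x_row in zip(filt, window)
                       for w, x in zip(w_row, x_row))
        best = max(best, s)
    return best

def text_cnn_features(tokens, filters):
    """One pooled feature per filter; these would feed a softmax classifier."""
    return [conv1d_maxpool(tokens, f) for f in filters]

# Hypothetical 2-dimensional "token vectors" for a three-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],    # width 2
    [[-1.0, 0.0], [0.0, -1.0]],  # width 2, always negative -> ReLU gives 0
    [[0.5, 0.5]],                # width 1
]
print(text_cnn_features(tokens, filters))  # [2.0, 0.0, 1.0]
```

Filters of several widths play the role of n-gram detectors; in a real model they would be learned jointly with (or on top of) the BERT encoder.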

Citations: 4

Comparative Study of Speech Emotion Recognition Based On CNN and CRNN

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469540

Nan Jiang, Junwei Jia, Dongmei Shao

This paper compares and analyzes the training performance of the Convolutional Recurrent Neural Network (CRNN) and the Convolutional Neural Network (CNN) in speech emotion recognition. Because CNNs do not extract temporal information and general temporal models cannot adequately represent spatial information, CRNN is applied to speech emotion recognition. With Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) as the input features of the model, the recognition performance of CRNN and CNN in speech emotion recognition is compared and analyzed. The results show that CRNN achieves higher accuracy with both features, which effectively improves the computing power of the speech emotion model and provides a theoretical basis and an optimization direction for improving the accuracy of speech emotion recognition.

Citations: 1

A Modified Q-Learning Algorithm for Control of Two-Qubit Systems

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469044

Omar Shindi, Qi Yu, D. Dong, Jiangjun Tang

This paper investigates quantum control problems using tabular Q-learning. A modified tabular Q-learning algorithm based on a dynamic greedy method is proposed; it succeeds in finding control sequences that drive a two-qubit system to a given target state with high fidelity. The modified algorithm also performs better than traditional Q-learning on quantum control problems with a continuous state space. Moreover, the modified tabular Q-learning algorithm is compared with stochastic gradient descent and the Krotov algorithm for solving quantum control problems. Simulation results on a two-qubit system demonstrate the effectiveness of the proposed algorithm.
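The abstract does not define its dynamic greedy method. As a hedged illustration only, the sketch below runs tabular Q-learning with an ε that decays per episode (one common way to make ε-greedy exploration dynamic) on a hypothetical five-state chain that stands in for the two-qubit control environment. The function name, the environment, and all hyperparameters are assumptions.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     eps0=1.0, eps_min=0.05, decay=0.99, max_steps=100, seed=0):
    """Tabular Q-learning on a toy 1-D chain with an episode-dependent
    (decaying) epsilon-greedy policy. Actions: 0 = left, 1 = right.
    Reaching the last state gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for ep in range(episodes):
        eps = max(eps_min, eps0 * decay ** ep)  # hypothetical "dynamic" schedule
        s = 0
        for _ in range(max_steps):
            if rng.random() < eps:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # no bootstrapping from the terminal state
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if s == goal:
                break
    return q

q = q_learning_chain()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(policy)
```

Because Q starts at zero and rewards are non-negative in this deterministic chain, every update keeps Q at or below the optimal values, for example Q*(s, right) = 0.9^(3−s), so the learned values approach those limits from below.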

Citations: 0

Conservative Generalisation for Small Data Analytics – An Extended Lattice Machine Approach

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469579

Shuangshuang Kong, Hui Wang, Kaijun Wang

Small data analytics aims to tackle data analysis challenges, such as overfitting, that arise when the data set is small. There are different approaches to small data analytics, including knowledge-based learning, but most of them require experience to use. In this paper, we consider another approach: the lattice machine. The lattice machine is a learning algorithm based on conservative generalisation. It is a learning paradigm that "learns" by generalising data in a consistent, conservative, and parsimonious way. A lattice machine model built from a dataset is a set of hyper tuples that tightly "wrap around" clusters of data, each hyper tuple being a conservative generalisation of the underlying cluster. A key feature of the lattice machine, and indeed of any learning algorithm based on conservative generalisation, is that it has high precision but low recall. This limits its applications, since high recall is needed in some applications such as disease screening (e.g., COVID-19). It is thus necessary to improve the lattice machine's recall whilst retaining its high precision. In this paper, we present a study on how to achieve this for the lattice machine.
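The idea of hyper tuples that conservatively wrap clusters can be illustrated with numeric attributes, where a hyper tuple is taken to be one closed interval per attribute and the join of two hyper tuples is their per-attribute interval hull. This is a simplified greedy sketch under that assumption, not the lattice machine algorithm itself or the paper's extension: a merge is accepted only if the result covers no training point of another class, and a query outside every hyper tuple is left unclassified, which illustrates the high-precision, low-recall behaviour the abstract describes. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
HyperTuple = List[Interval]  # one closed interval per numeric attribute

def from_point(x) -> HyperTuple:
    return [(v, v) for v in x]

def join(h1: HyperTuple, h2: HyperTuple) -> HyperTuple:
    """Least general hyper tuple containing both: per-attribute interval hull."""
    return [(min(a[0], b[0]), max(a[1], b[1])) for a, b in zip(h1, h2)]

def covers(h: HyperTuple, x) -> bool:
    return all(lo <= v <= hi for (lo, hi), v in zip(h, x))

def learn_hyper_tuples(points, labels, cls):
    """Greedily merge points of class `cls` into hyper tuples, accepting a merge
    only if the merged hyper tuple covers no training point of another class."""
    others = [p for p, lab in zip(points, labels) if lab != cls]
    tuples: List[HyperTuple] = []
    for p, lab in zip(points, labels):
        if lab != cls:
            continue
        h = from_point(p)
        for i, t in enumerate(tuples):
            merged = join(t, h)
            if not any(covers(merged, o) for o in others):
                tuples[i] = merged
                break
        else:  # no conservative merge possible: start a new hyper tuple
            tuples.append(h)
    return tuples

def predict(model, x):
    """Return the single class whose hyper tuples cover x, else None (abstain)."""
    hits = {c for c, hts in model.items() if any(covers(h, x) for h in hts)}
    return hits.pop() if len(hits) == 1 else None

points = [[1], [2], [5], [3], [6], [10]]
labels = ["A", "A", "B", "A", "B", "A"]
model = {c: learn_hyper_tuples(points, labels, c) for c in sorted(set(labels))}
print(model)  # {'A': [[(1, 3)], [(10, 10)]], 'B': [[(5, 6)]]}
print(predict(model, [2.5]), predict(model, [7]))  # A None
```

The abstention on `[7]` is where a recall-oriented extension would have to act, for example by relaxing coverage near a hyper tuple's boundary.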

Citations: 0

The Realization of Cross-Media Knowledge Graph of Tang and Song Poetry

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469590

Chan-liang Wu, Chunfang Li, Wenjuan Jiang

As the two peaks of Chinese culture, the poems of the Tang and Song Dynasties attract numerous scholars to devote themselves to research. The "China Biographical Database Project" (CBDB) is a relational database of Chinese historical figures established by Harvard University, and "Chinese poetry: the most complete database of ancient Chinese poetry" is an open-source project on GitHub. Using these databases, with Tang and Song poetry as the main body, this paper combines related videos from the cultural variety show "Chinese Poetry Conference" and primary school Chinese textbooks to build the "Knowledge Graph of Tang and Song Dynasties" system, integrating videos, teaching materials, historical figures, and works across media. The system provides comprehensive search over historical figures, poems, and related videos; online generation of knowledge graphs; and display of a chronology of the top 100 figures, divided according to the emperors of past dynasties. Users can interact with the knowledge graph by clicking, typing, dragging, and so on to perform exploratory visual analysis.

Citations: 0

Digitalization of Electronic Textbook Based on OPENCV

2020 International Conference on Machine Learning and Cybernetics (ICMLC)

Pub Date : 2020-12-02 DOI: 10.1109/ICMLC51923.2020.9469536

Zhi-Ming Deng, Minyong Shi, Chunfang Li

Traditional methods for digitizing electronic textbooks are limited by the text data and illustration layout, and their data processing results are poor. To adapt to complex and variable data formats, this paper proposes an adaptive data partitioning technique. We divide all the texts and illustrations in the textbooks into independent data blocks, locate and cut them, and use OCR technology to recognize the information in each area, which makes the processing targets clearer. Experiments measuring the data recognition rate were conducted on junior middle school history textbooks. The experimental results show that the proposed method works well for the digitalization of electronic textbooks.
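The paper does not detail its adaptive partitioning technique. One common way to split a binarized page into independent blocks before OCR is a projection profile, in which runs of rows containing no ink separate blocks. The pure-Python sketch below assumes a binary image given as a list of rows (1 = ink) and returns the row range of each block. In an OpenCV pipeline, binarization would typically use `cv2.threshold`, and `cv2.findContours` with `cv2.boundingRect` is an alternative way to locate regions. The function name `split_blocks` and its `min_gap` parameter are hypothetical.

```python
def split_blocks(img, min_gap=1):
    """Return (start_row, end_row) pairs of blocks separated by at least
    `min_gap` consecutive blank rows, using the horizontal projection profile."""
    profile = [sum(row) for row in img]  # ink pixels per row
    blocks, start, end, blank = [], None, None, 0
    for y, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = y
            end, blank = y, 0
        elif start is not None:
            blank += 1
            if blank >= min_gap:  # gap wide enough: close the current block
                blocks.append((start, end))
                start, blank = None, 0
    if start is not None:
        blocks.append((start, end))
    return blocks

page = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(split_blocks(page))             # [(1, 2), (5, 5)]
print(split_blocks(page, min_gap=3))  # [(1, 5)]
```

Each returned row range could then be split horizontally in the same way (using column sums) and cropped before being passed to OCR; raising `min_gap` keeps lines of one paragraph together while still separating larger regions.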

Citations: 2
