2021 13th International Conference on Machine Learning and Computing: Latest Papers

An Improved K-means Algorithm Based on Multiple Clustering and Density
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457695
Yulong Ling, Xiao Zhang
The k-means algorithm selects its initial set of cluster centers at random, which leads to unstable clustering results. To address this shortcoming, many improved density-based k-means algorithms have been proposed, but their time complexity is too high. To improve clustering stability and reduce clustering time, this paper proposes an improved algorithm based on multiple clustering and density. The algorithm first runs k-means many times and adaptively selects a set of excellent samples according to the distance between each sample and its corresponding cluster center. The initial set of cluster centers is then selected according to the principle of farthest distance and high density. Experiments on UCI data sets show that, compared with the k-means and k-means++ algorithms, the proposed algorithm not only improves performance but also ensures the stability of the clustering results. Compared with improved density-based k-means algorithms, the proposed algorithm greatly reduces clustering time.
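The idea of choosing initial centers by farthest distance and high density can be sketched as follows. This is only an illustration, not the authors' algorithm: the abstract does not define density or say how the two criteria are combined, so the neighbor-count density, the `radius` parameter, the product scoring rule, and the function names are all assumptions made here.

```python
import math

def local_density(points, radius):
    """Count, for each point, how many other points lie within `radius` (assumed density measure)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

def select_initial_centers(points, k, radius):
    """Pick k initial centers that are dense and far from each other.

    Assumed scoring rule (not taken from the paper):
    density * distance to the nearest already-chosen center.
    """
    density = local_density(points, radius)
    # First center: the densest point (ties go to the lowest index).
    chosen = [max(range(len(points)), key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(math.dist(points[i], points[c]) for c in chosen)
            return nearest * density[i]
        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [points[i] for i in chosen]

# Two tight groups plus one isolated outlier at (10, 0); made-up data.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.1),
          (10.0, 0.0)]
centers = select_initial_centers(points, k=2, radius=0.5)
print(centers)
```

With this scoring rule the isolated point has zero density and is never chosen, even though it is the farthest from the first center, which is the kind of behavior a pure farthest-distance rule would not give.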
Citations: 2
Active Learning for Concept Prerequisite Learning in Wikipedia
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457771
Xinying Hu, Yu He, Guangzhong Sun
Prerequisite relationships between concepts play an important role in education. Previously, prerequisites were provided by experts, which is very costly. With the development of the Internet, many new concepts have emerged, and a growing number of electronic materials are available. It is therefore important to build an efficient and accessible prerequisite annotator that helps in making an efficient learning plan. This paper proposes a method that mines prerequisite relationships between concepts from Wikipedia using active learning, which can obtain an accurate model with fewer manual labels. The proposed method extracts features from Wikipedia articles and designs a new active learning algorithm based on the characteristics of concept prerequisites. Experimental results show that the proposed model outperforms existing active learning methods for concept prerequisite learning.
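To make "fewer labels" concrete, here is a minimal sketch of generic uncertainty sampling, a standard active learning query strategy. It is not the new algorithm the paper designs, which the abstract does not describe; the function name and the scores are hypothetical.

```python
def most_uncertain(probabilities, batch_size):
    """Return indices of the items whose predicted probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]

# Hypothetical model scores P(A is a prerequisite of B) for five unlabeled concept pairs.
scores = [0.95, 0.52, 0.10, 0.40, 0.75]
query = most_uncertain(scores, batch_size=2)
print(query)  # indices of the pairs to send to a human annotator
```

Only the pairs the current model is least sure about are sent for labeling, so annotation effort goes where it is expected to change the model most.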
Citations: 2
Algorithmic Generation of Positive Samples for Compound-Target Interaction Prediction
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457689
Ebenezer Nanor, Wei-Ping Wu, S. Bayitaa, V. K. Agbesi, Brighter Agyemang
Machine Learning (ML) methods have become the preferred computational methods for Compound-Target Interaction (CTI) prediction in small drug development in Bioinformatics, because they have proven to be very efficient. However, the extremely imbalanced nature of CTI datasets presents a major challenge when ML methods are used to predict CTIs. To a large extent, these methods inaccurately predict the class of the minority samples, i.e., the positive samples, which are of great interest to those involved in drug development. In this study, we aim to improve the performance of ML-based methods for CTI prediction, particularly for the positive samples, by addressing the challenge of class imbalance. We applied deep generative modeling to oversample selected positive samples from the original dataset in order to construct balanced datasets. The oversampling process used a General-based approach and a novel Domain Specific-based approach. In the experimental section, 3 Deep Learning (DL) methods and 6 classical ML methods were trained on the original imbalanced dataset and on the two constructed balanced datasets to investigate their performance in predicting CTIs. To ensure the robustness of the ML-based predictive methods, a Grid Search with 5-fold Cross Validation (CV) was performed to estimate the best hyperparameters for training. The Convolutional Neural Network (CNN) produced the most competitive results in predicting positive samples when evaluated with the Recall metric.
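The reason Recall is the metric of interest on imbalanced data can be shown with a small worked example. The labels below are made up for illustration (5 positives among 100 samples) and are not taken from the paper's datasets; the helper names are likewise hypothetical.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model predicted as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Made-up imbalanced labels: 5 interacting pairs, 95 non-interacting pairs.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                 # a model that never predicts an interaction
partly_right = [1, 1, 1, 0, 0] + [0] * 95   # a model that finds 3 of the 5 interactions

print(accuracy(y_true, always_negative), recall(y_true, always_negative))
print(accuracy(y_true, partly_right), recall(y_true, partly_right))
```

A model that never predicts an interaction still scores 95% accuracy here while finding none of the positives, which is why the positive class is evaluated separately.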
Citations: 0
Multi-Objective Optimal Design of Excitation Systems of Synchronous Condensers for HVDC Systems Based on MOEA/D
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457770
Fan Shi, Hong-hua Wang, Tianhang Lu, Chengliang Wang
To optimize the reactive power characteristics of synchronous condensers and improve their capability to support the voltage of AC systems, this paper introduces outer-loop control of the condenser reactive power and outer-loop control of the AC system voltage into the design of the main excitation systems of condensers in high-voltage direct current (HVDC) systems. Taking the integral values, peak values, and steady-state values of the AC system voltage deviations as objective functions, the proportional adjustment coefficients of the two outer control loops are then optimized with a multi-objective evolutionary algorithm based on decomposition (MOEA/D) combined with a fuzzy control method. The purpose is to alleviate the grid overvoltage problems caused by the feedback of the condenser reactive power and the AC system voltage. Finally, a simulation model of a ±100 kV HVDC system with a synchronous condenser is established. The simulation results show that the proposed optimal design method for the excitation systems of synchronous condensers can optimize the reactive power characteristics of the condenser, ensure that the condenser regulates the AC system voltage rapidly, and solve the overvoltage problem in the AC system, which is caused by the fact that the condenser's reactive power regulation cannot change abruptly and by the feedback links for the condenser reactive power and the AC system voltage in the excitation system.
Citations: 1
A Practical Indoor and Outdoor Seamless Navigation System Based on Electronic Map and Geomagnetism
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457772
K. Qiu, Ruizhi Chen, He Huang
Seamless indoor and outdoor positioning suffers from low accuracy at transition points and from coordinates that are difficult to convert uniformly. To address these problems, this paper combines the positioning technology of the Baidu Map app, which uses GPS, base-station, and Wi-Fi signals, with indoor geomagnetic fingerprint nodes to develop a system for seamless indoor and outdoor positioning and navigation. We propose a novel and rapid method for establishing uniform coordinates that solves the key problem of seamless indoor and outdoor positioning: smooth coordinate conversion. By combining 3D laser scanning with GPS positioning, the data from multiple viewing angles are organized into the same coordinate system according to the transformation matrix. The iterative closest point (ICP) registration technique is used to obtain a three-dimensional model of the high-precision local coordinate system of the indoor and outdoor critical points.
Citations: 3
Biological Named Entity Recognition and Role Labeling via Deep Multi-task Learning
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457751
Fei Deng, Dongdong Zhang, Jing Peng
Bioscience is an experimental science. The qualitative and quantitative findings of biological experiments are often available exclusively in the form of figures in published papers. In this paper, we introduce the SourceData model, which captures a key aspect of biological experimental design by categorizing each biological entity involved in an experiment into one of six roles. Our work aims to determine, through automatic natural language algorithms, whether a given entity is subjected to a perturbation or is the object of a measurement (entity role labeling). We use state-of-the-art transformer models (e.g., BERT and its variants) as a strong baseline and find that, after joint training with the biological named entity recognition task through deep multi-task learning (MTL), the F1 score improves by 2% compared with the previous single-task architecture. For the named entity recognition task, the MTL method also achieves comparable performance on five public datasets. Further analysis reveals the importance of fusing entity information at the input layer of the entity role labeling task and of incorporating global context.
Citations: 1
Leveraging CNN and Bi-LSTM in Indonesian G2P Using Transformer
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457706
A. Rachman, S. Suyanto, Ema Rachmawati
We apply a Transformer from the tensor2tensor toolkit, which is based on TensorFlow, to the Grapheme-to-Phoneme (G2P) conversion problem. This study performs conversions that produce pronunciation symbols for letter sequences, specifically in Indonesian. No G2P conversion system is currently available for Indonesian, so this research aims to create a system that solves the problem by applying the Transformer. The Transformer has a simple network architecture based solely on the attention mechanism, so it eliminates the recurrence and convolution of the complex recurrent and convolutional neural networks, including encoders and decoders, that usually form the basis of sequence transduction models. The model obtains its excellent performance through the attention mechanism that connects the encoder and decoder. Using this tool, we compare results on the KBBI and CMU dictionary datasets. After training for three days on two core CPUs, we attained a word error rate (WER) of 6.7% on the KBBI data set, which corresponds to an accuracy of 93.3% and improves on the best existing result of a 26% word error rate on the CMU dictionary dataset. We carried out a detailed experimental evaluation by assessing the processing time and the word error rate and then compared the results with the state of the art. The Transformer generalizes successfully, and we apply it to several Indonesian elements with both limited and large training data. We conclude that the Transformer model is suitable for dealing with the G2P problem in this task.
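The reported word error rate and accuracy sum to 100%, which is consistent with the common G2P convention that a word counts as an error if any phoneme in its predicted pronunciation is wrong. A minimal sketch under that assumption follows; the function name and the phoneme sequences are made up for illustration and are not taken from the KBBI or CMU data.

```python
def word_error_rate(references, predictions):
    """Fraction of words whose predicted phoneme sequence differs from the reference."""
    errors = sum(1 for ref, hyp in zip(references, predictions) if ref != hyp)
    return errors / len(references)

# Hypothetical reference and predicted pronunciations for four words.
references = [["m", "a", "k", "a", "n"], ["b", "u", "k", "u"],
              ["a", "p", "i"], ["k", "a", "t", "a"]]
predictions = [["m", "a", "k", "a", "n"], ["b", "u", "k", "u"],
               ["a", "p", "i"], ["k", "a", "d", "a"]]  # one wrong phoneme in the last word

wer = word_error_rate(references, predictions)
word_accuracy = 1.0 - wer
print(wer, word_accuracy)
```

A single wrong phoneme makes the whole word count as an error, so this metric is stricter than a phoneme-level error rate.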
Citations: 0
Visualization Analysis of Library Research in the Context of Big Data Based on Knowledge Map
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457775
Chen Ke
The development of big data technology has brought a series of new content, opportunities, and challenges to libraries, and scholars have conducted many studies on the subject. This study retrieved 98 related papers from the Web of Science Core Collection and, using the knowledge map research method and the CiteSpace software, analyzed the annual number of papers, journals, authors, institutions, keywords, and topic changes. The results show that scholars’ attention to this field has gradually increased and that the number of papers has grown year by year. China contributes the most to this research: Chinese scholars have contributed more than those of any other country. Big data, university libraries, data management, and information services are the key research topics in this field. Finally, the paper offers an outlook for future research, suggesting that scholars further strengthen research on user behavior, user portraits, and intellectual property risk.
Citations: 1
InSAR Deformation Time-series Reconstruction for Rainfall-induced Landslides Based on Gaussian Process Regression
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457700
Zhiyong Li, Yunqi Wang, Jinghan Mu, Wei Liao, Kui Zhang
Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are accepted as effective remote sensing tools for detecting and monitoring landslide movements. Using stacked synthetic aperture radar (SAR) images, they can generate precise ground displacement time-series. To further suppress noise induced by atmospheric effects, most applications require a post-processing step, known as temporal filtering, to be applied to the final displacement time-series. Because displacement signals are strongly correlated in time, the traditional window-based/least squares filter is widely adopted. However, the window-based filter trades off noise smoothing against signal smoothing, so the resulting time-series may deviate strongly from the true values when ground displacements are highly nonlinear. This paper proposes a new approach to reconstructing the InSAR deformation time-series for rainfall-induced landslides. The method establishes a nonparametric model based on Gaussian process regression (GPR) and introduces precipitation data as a priori knowledge. It thereby constructs a strong relationship between rainfall history and ground movements, which is extremely helpful in preventing the loss of high-frequency displacement signals. The proposed approach was applied to InSAR landslide displacement time-series obtained from 108 European Space Agency (ESA) Sentinel-1A satellite SAR images. Experimental results demonstrate that, compared with the traditional window-based method, it effectively preserves the details of the temporal evolution of ground displacements, in particular on the surface of the sliding mass.
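The smoothing tradeoff of a window-based temporal filter can be seen on a toy series. The sketch below applies a plain centered moving average to a made-up displacement series with one abrupt jump, such as might follow heavy rainfall; it illustrates only the baseline filter's weakness, not the paper's GPR method, and the function name and values are assumptions.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks at the two ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

# Made-up displacement series (arbitrary units) with an abrupt jump at index 4.
displacement = [0.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 9.0]
filtered = moving_average(displacement, window=3)
print(filtered)
```

The filter spreads the jump over neighboring epochs, so the filtered value just before the jump is too high and the value at the jump is too low; a wider window suppresses more noise but blurs such rapid changes further.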
Citations: 0
Bird Songs Recognition Based on Ensemble Extreme Learning Machine
Pub Date : 2021-02-26 DOI: 10.1145/3457682.3457750
S. Xie, Haifeng Xu, Jiang Liu, Yan Zhang, Danjv Lv
ELM (Extreme Learning Machine) is a randomized method for constructing single-hidden-layer feedforward neural networks, and MFCC (Mel-frequency cepstral coefficients) are a type of feature parameter used in speech recognition. This paper studies bird song recognition based on Ensemble ELM. It first preprocesses bird song data collected by a web crawler, then extracts MFCC feature parameters from the data and obtains improved MFCC feature parameters through differential calculation. Finally, Ensemble ELM is used to classify and recognize the bird songs. The experimental results show that the Ensemble ELM method achieves a recognition rate of 90.42% when classifying 10 kinds of birds.
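The pipeline in the abstract can be illustrated roughly as follows. This is a sketch under assumptions, not the authors' implementation. It assumes that "differential calculation" means appending first-order frame differences to the MFCC matrix, that each ELM uses a sigmoid hidden layer with output weights solved by pseudoinverse, and that the ensemble combines members by majority vote. MFCC extraction is skipped and synthetic feature vectors stand in for real bird song data. The names `add_delta`, `ELM`, and `ensemble_predict` are hypothetical.

```python
import numpy as np


def add_delta(mfcc):
    """Append first-order frame differences to a (frames, coeffs) matrix."""
    delta = np.diff(mfcc, axis=0, prepend=mfcc[:1])
    return np.hstack([mfcc, delta])


class ELM:
    """Single-hidden-layer network: random hidden layer, least-squares output."""

    def __init__(self, n_hidden, seed):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, x):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def fit(self, x, labels, n_classes):
        # Hidden weights are drawn once and never trained.
        self.w = self.rng.normal(size=(x.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(n_classes)[labels]  # one-hot targets
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(x)) @ targets
        return self

    def predict(self, x):
        return (self._hidden(x) @ self.beta).argmax(axis=1)


def ensemble_predict(models, x, n_classes):
    """Majority vote over the class labels predicted by each ELM."""
    votes = np.zeros((x.shape[0], n_classes), dtype=int)
    rows = np.arange(x.shape[0])
    for model in models:
        votes[rows, model.predict(x)] += 1
    return votes.argmax(axis=1)


# Synthetic stand-in features (not bird song data): two separated classes.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=-3.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=3.0, scale=0.5, size=(30, 2)),
])
labels = np.array([0] * 30 + [1] * 30)

# Each ensemble member differs only in its random hidden layer.
models = [ELM(n_hidden=30, seed=s).fit(features, labels, 2) for s in range(5)]
predicted = ensemble_predict(models, features, 2)
print("training accuracy:", (predicted == labels).mean())
```

Since a single ELM depends on one random draw of hidden weights, voting across several independently seeded members is a common way to reduce that variance. The abstract does not state how the paper combines its members.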
Citations: 0