2020 24th International Computer Science and Engineering Conference (ICSEC): Latest Papers

Drowsiness Detection using Facial Emotions and Eye Aspect Ratios
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375240
Sunsern Ceamanunkul, Sanchit Chawla
Drowsy driving is a major cause of road accidents around the world. Facial emotions are known to be one of the visual cues for detecting drowsiness. In this paper, we propose a machine learning approach to drowsiness detection based on a combination of facial emotion features extracted with deep convolutional neural networks (CNNs) and eye-aspect-ratio (EAR) features. The combined feature vectors are then used to train a classifier. In our experiments, the combined features with a support vector machine (SVM) classifier achieve a classification accuracy of 81.7%.
Citations: 5
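For readers unfamiliar with the eye aspect ratio, the sketch below uses the commonly cited six-landmark definition, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). The abstract does not state which definition or landmark detector the authors used, so the landmark ordering, the `emotion_features` vector, and the simple concatenation in `combine_features` are illustrative assumptions rather than the paper's pipeline.

```python
import math


def eye_aspect_ratio(eye):
    """EAR for six (x, y) eye landmarks ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners;
    (p2, p6) and (p3, p5) are the two vertical landmark pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def combine_features(emotion_features, left_eye, right_eye):
    """Append both EAR values to a (hypothetical) CNN emotion feature vector."""
    return list(emotion_features) + [
        eye_aspect_ratio(left_eye),
        eye_aspect_ratio(right_eye),
    ]


# Toy landmarks: a wide-open eye and a nearly closed one.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1)]

print(eye_aspect_ratio(open_eye))    # larger value for the open eye
print(eye_aspect_ratio(closed_eye))  # smaller value for the closed eye
print(combine_features([0.1, 0.9], open_eye, closed_eye))
```

A falling EAR over consecutive frames is the usual signal of eye closure; the combined vector is what a downstream classifier such as an SVM would be trained on.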
Retinal Blood Vessel Extraction by Using Pre-processing and IterNet Model
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375423
Kasi Tenghongsakul, Isoon Kanjanasurat, B. Purahong, A. Lasakul
At present, many visual diseases arise from abnormalities of the retinal vessels. Automatic vessel extraction from fundus images is essential for diagnosis and for reducing vision loss. This paper presents retinal blood vessel segmentation using pre-processing and the IterNet model, a convolutional neural network. The green channel and the grayscale image, which show high contrast between blood vessels and background, were used together with normalization to improve blood vessel image quality. The proposed method was tested on two widely used databases, DRIVE and CHASEDB-1, each of which has its own characteristics. Blood vessel extraction on DRIVE and CHASEDB-1 achieved sensitivities of 0.8126 and 0.7541, respectively.
Citations: 3
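The pre-processing and the reported metric can be illustrated with a small pure-Python sketch. The abstract does not say which normalization was applied, so min-max scaling is an assumption here, and the helper names are hypothetical. Sensitivity is the standard TP / (TP + FN) over vessel pixels.

```python
def green_channel(rgb_image):
    """Return the green channel of an image given as rows of (r, g, b) tuples."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def min_max_normalize(channel):
    """Rescale intensities to [0, 1]; a constant image maps to all zeros."""
    flat = [value for row in channel for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in channel]
    return [[(value - lo) / (hi - lo) for value in row] for row in channel]


def sensitivity(predicted, truth):
    """TP / (TP + FN) over binary vessel masks of equal shape (1 = vessel)."""
    tp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for pred, actual in zip(pred_row, truth_row):
            if actual == 1:
                if pred == 1:
                    tp += 1
                else:
                    fn += 1
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy 2x2 "fundus image" and masks, for illustration only.
image = [[(10, 50, 0), (20, 150, 0)], [(30, 100, 0), (40, 250, 0)]]
normalized = min_max_normalize(green_channel(image))
print(normalized)

predicted_mask = [[1, 0], [1, 1]]
truth_mask = [[1, 1], [0, 1]]
print(sensitivity(predicted_mask, truth_mask))
```

In the toy masks, two of the three true vessel pixels are recovered, so the sensitivity is 2/3.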
Abnormality Detection in Musculoskeletal Radiographs using EfficientNets
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375275
Kasemsit Teeyapan
Abnormality detection in musculoskeletal radiographs, a routine task for radiologists, requires both experience and effort. To increase the number of radiographs interpreted each day, this paper presents cost-efficient deep learning models based on ensembles of EfficientNet architectures to help automate the detection process. We investigate the transfer learning performance of ImageNet pre-trained checkpoints on the musculoskeletal radiograph (MURA) dataset, which is very different from the ImageNet dataset. The experimental results show that the ImageNet pre-trained checkpoints have to be retrained on the entire MURA training set before being trained on a specific study type. The EfficientNet-based models are shown to outperform three baseline models. In particular, EfficientNet-B3 not only achieved an overall Cohen's kappa score of 0.717, compared with 0.680, 0.688, and 0.712 for MobileNetV2, DenseNet-169, and Xception, respectively, but was also better in terms of efficiency.
Citations: 2
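Cohen's kappa, the metric quoted above, corrects raw agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that formula on toy labels; it is not the authors' evaluation code, and the label lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and equal in length")

    # Observed agreement: fraction of positions where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[label] * pred_counts[label] for label in labels) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: 1 = abnormal study, 0 = normal study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(cohens_kappa(y_true, y_pred))
```

Here the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.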
A framework for cross-datasources agricultural research-to-impact analysis
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375271
Nalina Phisanbut, Poonsak Nuchsiri, Pasith Thanapatpisarn, Sittinun Pinthaya, Noppagorn Panpa, Piyanat Teinlek, P. Piamsa-nga
Agricultural research is a very important activity for developing countries whose economies rely on the agricultural sector. To ensure that research investment is heading in the right direction, it is necessary to determine the relationship between trade values and invested research. However, effective and efficient evaluation is constrained by the complexity and fragmentation of the information required for analysis. The large number of agricultural products and related research items that arise between the time research grants are allocated and the time of trade, such as research projects, publications, and intellectual property, means that the amount of data to be processed is enormous and is the responsibility of many organizations. The data, which are collected and stored in different databases, are uncoordinated, and there are seldom explicit links between records, either within or across databases. The only research item with direct links is the research publication, and even that is rarely attributed directly to research grants. In this paper, we propose a framework for cross-datasource analysis of agricultural products. The data are automatically collected from official sources of agricultural data and stored in a unified database to eliminate dependencies between the visualization and the structure of the data sources. The pathways are recognized by analyzing links between items through their parameters, such as names and affiliations. The framework is demonstrated by analyzing agricultural research activities in Thailand. Approximately 8.8 million data records were gathered. Visualizations of the research-to-impact pathways of two agricultural products (pineapple and sugarcane) are used as case studies.
Citations: 0
Tree-Based Hybrid Genetic Algorithm for Density-Based Data Clustering
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375214
Mozammel H. A. Khan
Data clustering algorithms partition a given set of data points into groups containing very similar data points. Representative-based and density-based algorithms are generally used for data clustering. These are heuristic algorithms and may get stuck at a sub-optimal clustering. The crisp clustering problem is a combinatorial optimization problem, and genetic algorithms generally perform better than heuristic algorithms for combinatorial optimization. In this work, we propose a hybrid genetic algorithm for density-based clustering. For this purpose, we represent a cluster using a forest of trees, where the nodes of the trees are the data points. We use a tree-based fitness function. Besides 1-point crossover, we use a deterministic improvement of offspring. We implement the proposed algorithm in C and run it on a personal computer. We experiment with five datasets from the UCI Machine Learning Repository. The proposed algorithm outperforms existing algorithms on both low- and high-dimensional datasets, except for one high-dimensional dataset.
Citations: 2
Multiclass Classification of Astronomical Objects in the Galaxy M81 using Machine Learning Techniques
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375279
Tapanapong Chuntama, P. Techa-angkoon, C. Suwannajak, Benjamas Panyangam, N. Tanakul
Data in astronomy usually contain various classes of astronomical objects. In this study, we explore the application of multiclass classification to classifying astronomical objects in the galaxy M81. Our objective is to identify the machine learning techniques that are best suited to our data and our classification goal. We used archival data retrieved from the Canada-France-Hawaii Telescope (CFHT) data archive. The imaging data were transformed into data tables, then classified based on their visual appearance into five classes: star, globular cluster, rounded galaxy, elongated galaxy, and fuzzy object. The classified data were used for building and testing supervised machine learning models. We investigated seven classification techniques: Random Forest, Multilayer Perceptron, weightless neural network (WiSARD), deep learning (Weka deep learning), Logistic Regression, Support Vector Machine (SVM), and Multiclass Classifier. Our experiments show that Random Forest and Multilayer Perceptron achieved the highest overall performance and are the best-suited models for classifying astronomical objects in the CFHT data of the galaxy M81.
Citations: 2
Improving Thai Word Segmentation using HMM: A Case Study of Sentiment Analysis
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375142
Thapani Hengsanankun, Atchara Namburi
Word segmentation is a basic problem in natural language processing for languages without word-boundary delimiters, such as Thai. Ambiguity in the word boundaries within a sentence is one of the significant problems: it can produce unknown words and affects word segmentation accuracy. This paper presents an improved Thai word segmentation method that uses a Hidden Markov Model (HMM) to cope with the unknown-word problem. Five-state left-to-right HMMs are built according to the classes of unknown words, using the parts of speech of the Thai language as the observation symbols of the model. To determine the unknown words in a sentence, a string matching algorithm is first applied to find overlapping words and unknown words. The unknown words that are not identified by the lexical dictionary are then classified by the HMMs according to their classes. Word combining rules are then applied to determine the proper word boundaries and to merge possible characters into words. In addition, the sentiment analysis task of polarity detection was selected as a case study to verify the accuracy of the proposed method. Precision, recall, and F-measure are used to evaluate the proposed method. The empirical results show that both the segmented words and the polarity classification results obtained by the proposed method tend to outperform the existing methods.
Citations: 0
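The dictionary-lookup stage that precedes the HMM can be pictured with a greedy longest-matching segmenter that flags spans missing from the lexicon as unknown. The abstract only says "String Matching algorithm", so longest matching is a swapped-in assumption, the function name is hypothetical, and a toy Latin-alphabet lexicon stands in for a Thai dictionary. The HMM classification and word combining rules are not shown.

```python
def longest_match_segment(text, lexicon):
    """Greedy left-to-right longest matching over an unsegmented string.

    Returns a list of (token, is_known) pairs. Consecutive characters that
    match no lexicon entry are grouped into a single unknown token.
    """
    max_len = max((len(word) for word in lexicon), default=0)
    tokens = []
    unknown = ""
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            unknown += text[i]
            i += 1
        else:
            if unknown:
                tokens.append((unknown, False))
                unknown = ""
            tokens.append((match, True))
            i += len(match)
    if unknown:
        tokens.append((unknown, False))
    return tokens


lexicon = {"the", "cat", "sat"}
print(longest_match_segment("thecatzzsat", lexicon))
```

In this toy run the span `zz` is the only token marked unknown; in the approach described above, such spans are what the HMMs would go on to classify.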
AutoShapelet: Reconstructable Time Series Shapelets
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375153
Pongsakorn Ajchariyasakchai, T. Rakthanmanon
Time series shapelets are snippets of time series that can distinguish one class from others. In the last decade, many studies have shown that time series shapelets are not only one of the most promising classification techniques but also a desirable solution, because they give experts an easily explainable result. However, the two main drawbacks of time series shapelet discovery are its speed and the appearance of the candidates and their representative, i.e. the time series shapelet itself. In this paper, we do not improve the running time of discovering time series shapelets; instead, we propose a new method that learns the shape of time series shapelets rather than picking one from the candidates. The number of candidates can vary from tens of thousands to millions of subsequences, or even more, depending on the length of the candidates. An autoencoder is applied to reduce the candidates from a higher-dimensional space to a much lower-dimensional space, to highlight the potential candidates as representatives, to learn the shapes of those candidates instead of an individual one, and to reconstruct smoother time series shapelets. Our time series shapelets, named autoshapelets, no longer fit the exact values of the training data, which are normally noisy in real observations. The experimental results demonstrate that the newly generated shapelets achieve higher accuracy than the exact shapelets and are less sensitive to the training data.
Citations: 0
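To make concrete how a shapelet separates classes, the sketch below computes the usual shapelet-to-series distance: the minimum Euclidean distance between the shapelet and every equal-length window of the series. This is the standard definition from the shapelet literature, not code from the paper; the abstract does not say whether windows are z-normalized, so normalization is omitted here, and the autoencoder step is not shown.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any window of the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


shapelet = [1, 3, 1]                  # a small "bump" pattern
with_bump = [0, 0, 1, 3, 1, 0, 0]     # contains the pattern exactly
flat = [0, 0, 0, 0, 0, 0, 0]          # does not contain it

print(shapelet_distance(with_bump, shapelet))
print(shapelet_distance(flat, shapelet))
```

A classifier can then threshold this distance: series that contain the pattern score near zero, while series that lack it score higher.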
Machine Translation of LATEX Based Mathematical Equations to Spoken Mathematics
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375339
Myo Mar Thinn, Ye Kyaw Thu, Hlaing Myat Nwe, Nyo Nyo Yee, Thandar Myint, Hninn Aye Thant, T. Supnithi
This paper describes the machine translation of LaTeX-encoded mathematical equations into spoken mathematical sentences. A LaTeX-to-spoken-math parallel corpus (5,600 sentences) was developed. Ten-fold cross-validation experiments were carried out using Phrase-based Statistical Machine Translation (PBSMT), Weighted Finite-State Transducers (WFST), and Ripple Down Rules (RDR) based tagging approaches. The BLEU, RIBES, F1, and WER metrics are used to measure translation performance. The experimental results show that the PBSMT approach achieved the highest performance for translating LaTeX mathematical equations into spoken mathematical sentences. Moreover, we found that the translation performance of the RDR approach is comparable to that of PBSMT.
Citations: 0
Approximating k-Connected m-Dominating Sets in Disk Graphs
Pub Date : 2020-12-03 DOI: 10.1109/ICSEC51790.2020.9375178
Kunanon Burathep, Jittat Fakcharoenphol, Nonthaphat Wongwattanakij
This paper considers dominating set problems in disk graphs, a generalization of the unit disk graphs that are used extensively to analyze homogeneous sensor or wireless networks. For heterogeneous networks, it is useful to consider disk graphs, which contain disks with different radii. Given a graph $G=(V,E)$, a set $D \subseteq V$ is a $(k,m)$-connected dominating set for $G$ if every node in $V$ is either in $D$ or has at least $m$ neighbors in $D$, and the induced subgraph $G[D]$ is $k$-connected. Many approximation algorithms are known for this problem in unit disk graphs. We prove various properties of disk graphs so that these algorithms can be generalized to disk graphs. Namely, we show that a $\min\left\{\frac{m}{m-k},\sqrt{k}\right\}\cdot O\left(\ln^{2}k\right)$-approximation algorithm of Nutov works in this setting. We also present a PTAS that finds a $(1+\epsilon)$-approximate solution to the $m$-dominating set problem in disk graphs and runs in time $n^{O(m/\epsilon)}$.
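The definition in the abstract can be checked directly on small instances. The sketch below is a brute-force verifier, not the paper's approximation algorithm. It assumes the intersection model for disk graphs (two nodes are adjacent when their disks overlap or touch), which the abstract does not specify; the function names and the sample disks are hypothetical.

```python
from itertools import combinations
from math import hypot


def disk_graph(disks):
    """Build adjacency sets from (x, y, radius) triples.

    Assumption: nodes i and j are adjacent when their disks intersect,
    i.e. the distance between centers is at most r_i + r_j.
    """
    adj = {i: set() for i in range(len(disks))}
    for i, j in combinations(range(len(disks)), 2):
        (x1, y1, r1), (x2, y2, r2) = disks[i], disks[j]
        if hypot(x1 - x2, y1 - y2) <= r1 + r2:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def is_m_dominating(adj, dom, m):
    """Every node is in dom or has at least m neighbors in dom."""
    dom = set(dom)
    return all(v in dom or len(adj[v] & dom) >= m for v in adj)


def is_connected(adj, nodes):
    """Depth-first search restricted to the induced subgraph on nodes."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def is_k_connected(adj, dom, k):
    """G[dom] has more than k nodes and survives removal of any k-1 nodes.

    Exponential in k; intended only for tiny examples.
    """
    dom = set(dom)
    if len(dom) <= k:
        return False
    return all(is_connected(adj, dom - set(cut))
               for cut in combinations(dom, k - 1))


def is_km_connected_dominating(adj, dom, k, m):
    return is_m_dominating(adj, dom, m) and is_k_connected(adj, dom, k)


# Four disks with two different radii; under the assumed model they form a path 0-1-2-3.
disks = [(0, 0, 1.5), (2, 0, 0.5), (4, 0, 1.5), (6, 0, 0.5)]
adj = disk_graph(disks)
```

With these sample disks, `is_km_connected_dominating(adj, {1, 2}, 1, 1)` holds, while raising `m` to 2 fails because the end nodes each have only one neighbor in the set.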
Citations: 0