
2021 16th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP): Latest Papers

Text generation by probabilistic suffix tree language model
S. Marukatat
During the last decade, language modeling has been dominated by neural architectures such as RNNs, LSTMs, and Transformers. These neural language models deliver excellent performance at the expense of very high computational cost. This work investigates a probabilistic language model that requires much less computation. In particular, we are interested in variable-order Markov models, which can be implemented efficiently on a probabilistic suffix tree (PST) structure. PST construction is cheap and scales easily to very large datasets. Experimental results show that this model can be used to generate realistic sentences.
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678167
Published: 2021-12-21
Citations: 0
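The variable-order Markov idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`build_counts`, `next_token`, `generate`) are hypothetical, and a flat dictionary keyed by context tuples stands in for the suffix tree, which would store the same contexts along shared suffix paths and typically prune rare ones. Generation backs off from the longest previously seen context to shorter ones.

```python
import random
from collections import Counter, defaultdict


def build_counts(tokens, max_order):
    """Count next-token frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            context = tuple(tokens[i - k:i])  # the k tokens preceding position i
            counts[context][tok] += 1
    return counts


def next_token(counts, history, max_order, rng):
    """Sample the next token, backing off from the longest seen suffix of history."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in counts:
            options = counts[context]
            toks = list(options)
            weights = [options[t] for t in toks]
            return rng.choices(toks, weights=weights, k=1)[0]
    raise ValueError("model has no counts")


def generate(counts, max_order, length, seed=0):
    """Generate `length` tokens starting from an empty history."""
    rng = random.Random(seed)
    out = []
    for _ in range(length):
        out.append(next_token(counts, out, max_order, rng))
    return out


# Toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat the cat ate".split()
model = build_counts(corpus, max_order=2)
print(" ".join(generate(model, max_order=2, length=8)))
```

Building the counts is a single pass over the corpus with at most `max_order + 1` updates per token, which is the kind of cheap construction the abstract refers to.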
Named Entity Recognition of Thai Documents using CRF with a Simple Data Masking Technique
We examined Named Entity Recognition (NER) of organizations in the Thai government's project documents using a simple data masking technique with the help of an external dictionary. Our framework demonstrated its potential in the case where the external dictionary was incomplete and could not be used to label the training data exhaustively. A data masking technique was applied to the administrative area part of the organization names in an attempt to discover more organization entities outside the dictionary. The experimental results showed that our model gained higher recall while sacrificing a relatively small amount of precision. The proposed approach was also capable of recognizing entities that never appeared in the dictionary.
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678156
Published: 2021-12-21
Citations: 0
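The abstract above does not spell out how the masking is done, so the following is only one plausible reading, sketched under stated assumptions: tokens that match a list of administrative area names are replaced by a single placeholder before the sequence is passed to a CRF feature extractor. The function name `mask_admin_area`, the `<AREA>` token, and the English-glossed example are all hypothetical; real input would be word-segmented Thai text.

```python
def mask_admin_area(tokens, area_names, mask="<AREA>"):
    """Replace every token found in area_names with one placeholder token."""
    return [mask if tok in area_names else tok for tok in tokens]


# Hypothetical area list and organization name, glossed in English.
areas = {"Chiang Mai", "Phuket"}
tokens = ["Provincial", "Health", "Office", "Chiang Mai"]
print(mask_admin_area(tokens, areas))
# prints ['Provincial', 'Health', 'Office', '<AREA>']
```

With the area name hidden, two organization names that differ only in their province look identical to the tagger, which may be why masking helps recall on names missing from the dictionary.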
iSAI-NLP Committee
DOI: https://doi.org/10.1109/isai-nlp54397.2021.9678191
Published: 2021-12-21
Citations: 0
Feasibility of Prediction Model for Internal Tumor Target Volume from 4-D Computed Tomography of Lung Cancer
U. Puangragsa, Pitchayakorn Lomvisai, P. Phasukkit, Sarut Puangragsa, J. Setakornnukul, Nongluck Houngkamhang, Petchanon Thongserm, P. Dankulchai
Four-dimensional computed tomography (4DCT) is the most common technique for determining organ movement caused by breathing. However, because 4DCT acquires CT images as a function of the respiratory phase, it delivers a higher radiation dose. To reduce the patient's radiation dose, this study created lung motion prediction models that estimate tumor target movement in ten respiratory phases by detecting only external organ movement with a Kinect, without radiation, over a complete respiration cycle. The average overall amplitude difference between the RPM and Kinect signals in the phantom experiment was 0.02 ± 0.1 mm. The F1 score was 100% for almost all classifications; the exceptions were classifications 2, 3, 6, 7, and 8, with F1 scores of 85%, 83%, 90%, 84%, and 85%, respectively, where breathing patterns were irregular. Overall, the total accuracy of the proposed tumor movement scheme (the average of the F1 scores) is 92.7%. The deep learning model can predict the tumor motion range and classification zone from the detected external respiratory signal.
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678177
Published: 2021-12-21
Citations: 0
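The 92.7% figure in the abstract above can be checked with a short calculation. The abstract does not state how many classifications there are; the sketch assumes ten, which is the count that reproduces the reported average from the listed per-class F1 scores.

```python
# Per-class F1 scores (%) listed in the abstract for the irregular-breathing classes.
f1_irregular = {2: 85, 3: 83, 6: 90, 7: 84, 8: 85}

# Assumption: ten classes in total; every class not listed above scored 100%.
num_classes = 10
f1_all = [f1_irregular.get(c, 100) for c in range(1, num_classes + 1)]

mean_f1 = sum(f1_all) / num_classes
print(mean_f1)  # prints 92.7
```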
Replay Attack Detection in Automatic Speaker Verification Based on ResNeWt18 with Linear Frequency Cepstral Coefficients
Anuwat Chaiwongyen, Kanokkarn Pinkeaw, W. Kongprawechnon, Jessada Karnjana, M. Unoki
This paper proposes an effective method for replay attack detection in automatic speaker verification systems. The replay attack is of interest because it is the most straightforward and effective attack and is challenging to detect. It is a playback of a recording of a target speaker's voice. According to the literature, no speech feature works well with all classifiers, and no study has investigated the ResNet-based model called ResNeWt in combination with linear frequency cepstral coefficients (LFCCs). Therefore, this paper constructs a replay attack detection model based on an 18-layer ResNeWt that takes LFCCs as input. The proposed method was tested on a dataset provided by the ASVspoof 2019 competition. In terms of the equal error rate (EER), the proposed method outperforms all existing methods, with an EER of 0.29%. A detailed comparison of replay attack detection performance was also made. The proposed method performed considerably better than existing methods in terms of balanced accuracy, precision, recall, and F1-score.
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678164
Published: 2021-12-21
Citations: 0
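For readers unfamiliar with the equal error rate reported in the abstract above, the sketch below shows one simple way to estimate it from detector scores: sweep a decision threshold and take the point where the false acceptance rate and the false rejection rate are closest. The function name `equal_error_rate` and the toy scores are hypothetical, and the convention that a higher score means "more likely genuine" is an assumption; ASVspoof's official scoring tools may compute the EER differently (for example, with interpolation).

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold) where FAR and FRR are closest.

    Assumes a higher score means the trial is more likely genuine.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance: replayed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores, chosen only for illustration.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, spoof))  # prints (0.25, 0.6)
```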
DataCat: Attention-based Open Government Data (OGD) Category Recommendation Framework
Natnaree Sornkongdang, Nuttapong Sanglerdsinlapachai, Chutiporn Anutariya
A data category recommendation framework for Thailand's open government data portal (ThOGD) is proposed to assist data providers when publishing and registering a new dataset in the portal's data catalog. Existing approaches, such as those that treat the task as a multi-label classification problem, have not sufficiently exploited the semantic features of data categories. Deep learning models for natural language processing have recently shown high potential for learning different degrees of semantic feature abstraction, because all layers of multi-head attention blocks are provided with different fragments of metadata descriptions and corresponding tags. To obtain robust recommendations, this paper proposes DataCat, a category recommendation framework that applies an attention-based framework to the ThOGD portal. Within this framework, the integrated layers carrying particular semantic information are attached directly to the output layer of the network to enhance the effectiveness of information retrieval. The results indicate that the attention-based framework has a weighted effect on the optimization loss. The macro-averaged precision and F1-score improve by 0.664% and 0.557%, respectively, and the micro-averaged precision and F1-score improve by 0.806% and 0.698%, respectively.
DOI: https://doi.org/10.1109/iSAI-NLP54397.2021.9678174
Published: 2021-12-21
Citations: 1
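The abstract above reports both macro and micro averages. As a reminder of the difference, the sketch below computes both from per-category counts of true positives, false positives, and false negatives: the macro average weights every category equally, while the micro average pools the counts first and so favors frequent categories. The function names and the toy counts are hypothetical and unrelated to the paper's data.

```python
def precision_f1(tp, fp, fn):
    """Precision and F1 from raw counts, returning 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, f1


def macro_micro(per_class):
    """per_class is a list of (tp, fp, fn) tuples, one per category."""
    scores = [precision_f1(*c) for c in per_class]
    macro_p = sum(s[0] for s in scores) / len(scores)
    macro_f1 = sum(s[1] for s in scores) / len(scores)
    # Micro average: pool the counts over all categories, then score once.
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    micro_p, micro_f1 = precision_f1(tp, fp, fn)
    return {"macro": (macro_p, macro_f1), "micro": (micro_p, micro_f1)}


# Toy counts: one frequent, well-predicted category and one rare, weak one.
print(macro_micro([(8, 2, 2), (1, 1, 3)]))
```

In this toy case the micro averages come out higher than the macro averages because the frequent category dominates the pooled counts.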