
2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP): Latest Papers

Enhancing Thai Keyphrase Extraction Using Syntactic Relations: An Adoption of Universal Dependencies Framework
Chanatip Saetia, Tawunrat Chalothorn, Supawat Taerungruang
Keyphrases are topical phrases that represent a document and are used across many fields. Various methods have been proposed to extract keyphrases automatically, but most rely on candidate selection driven by linguistic heuristics designed for English. In this work on Thai keyphrase extraction, candidate selection based on Universal Dependencies (UD) is proposed in place of POS sequences alone, which makes this step language independent. To enhance candidate selection, tree-based keyphrase extraction is also adapted to keep only logical candidates based on the cohesiveness index (CI). In addition, score filtering is proposed to combine linguistic heuristics such as stop words and the phrase's position. In the experiments, our method achieved double the average F1 score of the state-of-the-art method, even though the UD parser was trained on only 1,781 EDUs and achieved an 84% labeled attachment score. Ablation studies on each factor in score filtering revealed which factors are important for keyphrase extraction.
DOI: https://doi.org/10.1109/iSAI-NLP56921.2022.9960284 | Published: 2022-11-05
Citations: 0
Anomaly Detection on Real-time Security Log using Stream Processing
W. Limprasert, P. Jantana, Avirut Liangsiri
Many critical tasks, such as document approval and banking services, are now hosted on cloud infrastructure. This transformation puts stress on cloud security from the physical layer of the data center to the application layer of the web application. All data access and service access need to be monitored and responded to in real time. In this paper, we study methods to detect anomalous incidents such as spikes in network volume, malicious incidents from API scanning, error messages from internal systems and timeouts from Slowloris attacks [1]. We evaluate machine-learning-based anomaly detection algorithms, such as LOF, Isolation Forest and Elliptic Envelope, to find suitable methods for detecting incidents in real time using stream processing tools including Kafka and message ingestion. The results show that LOF is fast and robust in most cases. However, when log messages contain unseen words, which normally need to be hashed during preprocessing, Isolation Forest shows better results. This study shows that stream processing can be combined with machine learning to detect anomalous behavior in cloud services.
DOI: https://doi.org/10.1109/iSAI-NLP56921.2022.9960280 | Published: 2022-11-05
Citations: 1
Graph-based Dependency Parser Building for Myanmar Language
Zar Zar Hlaing, Ye Kyaw Thu, T. Supnithi, P. Netisopakul
Dependency parsing (DP) examines the relationships between words in a sentence to determine its grammatical structure. On this basis, a sentence is broken down into several components. The process rests on the concept that every linguistic component of a sentence is directly related to another; these relationships are called dependencies. Dependency parsing is one of the key steps in natural language processing (NLP) for several text mining approaches. Universal Dependencies (UD) has emerged as the dominant formalism for dependency parsing in recent years. Various UD corpora and dependency parsers are publicly accessible for resource-rich languages. However, there are no publicly available resources for dependency parsing in Myanmar, a low-resource language. We therefore manually extended the existing small Myanmar UD corpus (the myPOS UD corpus) into the myPOS version 3.0 UD corpus in order to release it as a publicly available resource. To evaluate the effect of the extended corpus against the original, we used two graph-based neural dependency parsing models, jPTDP (joint POS tagging and dependency parsing) and UniParse (universal graph-based parsing), and measured unlabeled and labeled attachment scores (UAS and LAS). We compared the accuracies of the graph-based neural models trained on the original and extended UD corpora. The experimental results showed that, compared to the original myPOS UD corpus, the extended myPOS version 3.0 UD corpus improved the accuracy of the dependency parsing models.
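To make the two evaluation metrics concrete, here is a minimal sketch of how UAS and LAS are conventionally computed: UAS is the fraction of tokens whose predicted head is correct, and LAS additionally requires the relation label to match. The function name `attachment_scores` and the toy four-token sentence are assumptions for illustration and do not come from the paper or its corpus.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for two parallel lists of (head_index, relation) pairs.

    Each list holds one pair per token. Hypothetical helper for illustration.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    # UAS counts tokens whose head index matches, ignoring the label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head index and relation label both match.
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return head_hits / total, labeled_hits / total


# Toy example: token 3 has the right head but the wrong label,
# and token 4 has the wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]
uas, las = attachment_scores(gold, pred)  # uas = 0.75, las = 0.5
```

LAS can never exceed UAS, since every labeled hit is also a head hit.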
DOI: https://doi.org/10.1109/iSAI-NLP56921.2022.9960267 | Published: 2022-11-05
Citations: 0
RAS-E2E: The SincNet end-to-end with RawNet loss for text-independent speaker verification
Pantid Chantangphol, Theerat Sakdejayont, Tawunrat Chalothorn
Despite satisfactory verification performance, variation in utterance duration and phonemes and the robustness of the system remain challenges in speaker verification tasks. To address them, we propose RAS-E2E, a novel fully cross-lingual speaker verification system that discovers meaningful information from raw input waveforms of utterances of varying duration, including short utterances, to determine whether an utterance matches the target speaker. It does so by merging two powerful paradigms: SincNet and the RawNet training scheme with a Bi-RNN. Experiments on VoxCeleb, Gowajee and internal call-center datasets demonstrate that RAS-E2E outperforms recent waveform-based verification systems.
DOI: https://doi.org/10.1109/iSAI-NLP56921.2022.9960255 | Published: 2022-11-05
Citations: 0
Automatic Thai Text Summarization Using Keyword-Based Abstractive Method
Parun Ngamcharoen, Nuttapong Sanglerdsinlapachai, P. Vejjanugraha
Traditionally, the training phase of abstractive text summarization feeds two sets of integer sequences into the model: the first, representing the source text, goes to the encoder, and the second, representing the words in the reference summary, goes to the decoder. With this method, however, the model tends to perform poorly if the source text includes words that are irrelevant or insignificant to the key ideas. To address this issue, we propose a new keyword-based method for abstractive summarization that combines the information provided by the source text and its keywords to generate the summary. We use a bi-directional long short-term memory model for keyword labelling, with words that overlap between the source text and the reference summary as ground truth. Experiments on the ThaiSum dataset show that the proposed method outperforms the traditional encoder-decoder model by 0.0425 on ROUGE-1 F1, 0.0301 on ROUGE-2 F1 and 0.0140 on BERTScore F1.
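The ground truth for keyword labelling is described as the words that overlap between the source text and the reference summary. A minimal sketch of building such labels is shown below. It assumes the text is already word-segmented (Thai has no spaces between words, so a tokenizer would be needed first), and the function name `label_keywords` and the English toy sentence are hypothetical stand-ins rather than data from ThaiSum.

```python
def label_keywords(source_tokens, summary_tokens):
    """Label each source token 1 if it also appears in the reference summary, else 0.

    Hypothetical helper: exact string match on pre-segmented tokens is an
    assumption; the paper may normalize or filter tokens differently.
    """
    summary_vocab = set(summary_tokens)
    return [1 if token in summary_vocab else 0 for token in source_tokens]


# English stand-in for a pre-segmented source sentence and its reference summary.
source = ["the", "flood", "damaged", "roads", "in", "bangkok", "yesterday"]
summary = ["flood", "damaged", "bangkok", "roads"]
labels = label_keywords(source, summary)  # [0, 1, 1, 1, 0, 1, 0]
```

Label sequences of this shape are what a sequence tagger such as a bi-directional LSTM would be trained to predict, one binary label per source token.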
DOI: https://doi.org/10.1109/iSAI-NLP56921.2022.9960265 | Published: 2022-11-05
Citations: 0