基于语义相似度方法的叙事复述测试信息单元自动识别语义相似度方法评价

IF 0.3 Q4 LINGUISTICS Linguamatica Pub Date : 2020-01-04 DOI:10.21814/lm.11.2.304
L. Santos, Sandra M. Aluísio
{"title":"基于语义相似度方法的叙事复述测试信息单元自动识别语义相似度方法评价","authors":"L. Santos, Sandra M. Aluísio","doi":"10.21814/lm.11.2.304","DOIUrl":null,"url":null,"abstract":"Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (CCL) are based on the analysis of the patient's cognitive functions by administering cognitive and neuropsychological assessment batteries. The use of retelling narratives is common to help identify and quantify the degree of dementia. In general, one point is awarded for each unit recalled, and the final score represents the number of units recalled. In this paper, we evaluated two clinical tasks: the automatic identification of which elements of a retold narrative were recalled; and the binary classification of the narrative produced by a patient, having the units identified as attributes, aiming at an automatic screening of patients with cognitive impairment. We used two transcribed retelling data sets in which sentences were divided and manually annotated with the information units. These data sets were then made publicly available. They are: the Arizona Battery for Communication and Dementia Disorders (ABCD) that contains narratives of patients with CCL and Healthy Controls and the Avaliacao da Linguagem no Envelhecimento (BALE), which includes narratives of patients with AD and CCLs as well as Healthy Controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying elements of a retold narrative into binary classification problems, finding a cutoff point for the similarity value of each information unit. In this way, we were able to overcome two baselines for the two datasets in the SubsetAccuracy metric, which is the most punitive for the multi-label scenario. In binary classification, however, not all six machine learning methods evaluated performed better than the baselines methods. For ABCD, the best methods were Decision Trees and KNN, and for BALE, SVM with RBF kernel stood out.","PeriodicalId":41819,"journal":{"name":"Linguamatica","volume":"11 1","pages":"47-63"},"PeriodicalIF":0.3000,"publicationDate":"2020-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Identificação automática de unidades de informação em testes de reconto de narrativas usando métodos de similaridade semântica avaliação de métodos de similaridade semântica\",\"authors\":\"L. Santos, Sandra M. Aluísio\",\"doi\":\"10.21814/lm.11.2.304\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (CCL) are based on the analysis of the patient's cognitive functions by administering cognitive and neuropsychological assessment batteries. The use of retelling narratives is common to help identify and quantify the degree of dementia. In general, one point is awarded for each unit recalled, and the final score represents the number of units recalled. In this paper, we evaluated two clinical tasks: the automatic identification of which elements of a retold narrative were recalled; and the binary classification of the narrative produced by a patient, having the units identified as attributes, aiming at an automatic screening of patients with cognitive impairment. We used two transcribed retelling data sets in which sentences were divided and manually annotated with the information units. These data sets were then made publicly available. They are: the Arizona Battery for Communication and Dementia Disorders (ABCD) that contains narratives of patients with CCL and Healthy Controls and the Avaliacao da Linguagem no Envelhecimento (BALE), which includes narratives of patients with AD and CCLs as well as Healthy Controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying elements of a retold narrative into binary classification problems, finding a cutoff point for the similarity value of each information unit. In this way, we were able to overcome two baselines for the two datasets in the SubsetAccuracy metric, which is the most punitive for the multi-label scenario. In binary classification, however, not all six machine learning methods evaluated performed better than the baselines methods. For ABCD, the best methods were Decision Trees and KNN, and for BALE, SVM with RBF kernel stood out.\",\"PeriodicalId\":41819,\"journal\":{\"name\":\"Linguamatica\",\"volume\":\"11 1\",\"pages\":\"47-63\"},\"PeriodicalIF\":0.3000,\"publicationDate\":\"2020-01-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Linguamatica\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21814/lm.11.2.304\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Linguamatica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21814/lm.11.2.304","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 1

摘要

阿尔茨海默病(AD)和轻度认知障碍(CCL)的诊断是基于通过使用认知和神经心理评估电池对患者认知功能的分析。复述叙述的使用很常见,有助于识别和量化痴呆症的程度。一般情况下,每召回一个单位得一分,最终分数代表召回的单位数量。在这篇论文中,我们评估了两项临床任务:自动识别重述叙事的哪些元素被回忆起来;以及对患者产生的叙述进行二元分类,将单位确定为属性,旨在自动筛查认知障碍患者。我们使用了两个转录的复述数据集,其中的句子被划分,并用信息单元手动注释。这些数据集随后被公开。它们是:亚利桑那州沟通和痴呆症研究所(ABCD),其中包含CCL患者和健康对照组的叙述,以及Avaliacao da Lingagem no Envelhecimento(BALE),其中包括AD和CCL患者以及健康对照组。我们评估了两种基于语义相似性的方法,这里称为STS和Chunking,并将识别重述叙事元素的多标签问题转化为二元分类问题,为每个信息单元的相似性值找到了一个临界点。通过这种方式,我们能够克服SubsetCuracy度量中两个数据集的两个基线,这对多标签场景来说是最惩罚性的。然而,在二进制分类中,并非所有六种评估的机器学习方法都比基线方法表现得更好。对于ABCD,最好的方法是决策树和KNN,而对于BALE,带有RBF核的SVM尤为突出。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Identificação automática de unidades de informação em testes de reconto de narrativas usando métodos de similaridade semântica avaliação de métodos de similaridade semântica
Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (CCL) are based on the analysis of the patient's cognitive functions by administering cognitive and neuropsychological assessment batteries. The use of retelling narratives is common to help identify and quantify the degree of dementia. In general, one point is awarded for each unit recalled, and the final score represents the number of units recalled. In this paper, we evaluated two clinical tasks: the automatic identification of which elements of a retold narrative were recalled; and the binary classification of the narrative produced by a patient, having the units identified as attributes, aiming at an automatic screening of patients with cognitive impairment. We used two transcribed retelling data sets in which sentences were divided and manually annotated with the information units. These data sets were then made publicly available. They are: the Arizona Battery for Communication and Dementia Disorders (ABCD) that contains narratives of patients with CCL and Healthy Controls and the Avaliacao da Linguagem no Envelhecimento (BALE), which includes narratives of patients with AD and CCLs as well as Healthy Controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying elements of a retold narrative into binary classification problems, finding a cutoff point for the similarity value of each information unit. In this way, we were able to overcome two baselines for the two datasets in the SubsetAccuracy metric, which is the most punitive for the multi-label scenario. In binary classification, however, not all six machine learning methods evaluated performed better than the baselines methods. For ABCD, the best methods were Decision Trees and KNN, and for BALE, SVM with RBF kernel stood out.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Linguamatica
Linguamatica LINGUISTICS-
CiteScore
1.40
自引率
0.00%
发文量
4
审稿时长
6 weeks
期刊最新文献
A compilação e a análise de métricas textuais de um corpus de redações Classificação da qualidade da argumentação em tweets no domínio da política brasileira Extracção de Relações de Apoio e Oposição em Títulos de Notícias de Política em Português Pais, filhos e outras relações familiares no DIP DIP - Desafio de Identificação de Personagens: objectivo, organização, recursos e resultados
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1