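Identificação automática de unidades de informação em testes de reconto de narrativas usando métodos de similaridade semântica: avaliação de métodos de similaridade semântica (English: Automatic identification of information units in narrative retelling tests using semantic similarity methods: an evaluation of semantic similarity methods)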

Linguamatica, vol. 11, pp. 47-63. Published 2020-01-04. DOI: 10.21814/lm.11.2.304. Journal impact factor 0.3 (JCR Q4, Linguistics).

L. Santos, Sandra M. Aluísio


Citations: 1

Abstract

Diagnoses of Alzheimer's Disease (AD) and Mild Cognitive Impairment (CCL, from the Portuguese Comprometimento Cognitivo Leve) rely on analyzing the patient's cognitive functions by administering batteries of cognitive and neuropsychological tests. Narrative retelling tasks are commonly used to help identify and quantify the degree of dementia: typically, one point is awarded for each information unit the patient recalls, and the final score is the number of units recalled. In this paper, we evaluate two clinical tasks: (i) automatically identifying which elements of a narrative were recalled in the patient's retelling; and (ii) binary classification of the narrative produced by the patient, using the identified units as features, with the aim of automatically screening patients for cognitive impairment. We used two datasets of transcribed retellings, segmented into sentences and manually annotated with information units, and have made both publicly available: the Arizona Battery for Communication Disorders of Dementia (ABCD), which contains narratives from patients with CCL and from healthy controls, and the Bateria de Avaliação da Linguagem no Envelhecimento (BALE), which contains narratives from patients with AD and CCL as well as from healthy controls. We evaluated two methods based on semantic similarity, referred to here as STS and Chunking, and transformed the multi-label problem of identifying the recalled elements into one binary classification problem per information unit by finding a cutoff on the similarity value for each unit. With this approach, we outperformed two baselines on both datasets in Subset Accuracy, the most stringent metric in the multi-label setting. In the binary classification task, however, not all of the six machine learning methods evaluated outperformed the baselines. For ABCD, the best methods were Decision Trees and KNN; for BALE, SVM with an RBF kernel stood out.
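The per-unit cutoff strategy and the Subset Accuracy metric described above can be illustrated with a minimal sketch. It assumes that STS or Chunking has already produced one similarity score per (narrative, information unit) pair; neither method is implemented here, and reducing sentence-level similarities to a single score per pair is itself an assumption. The abstract does not say how each cutoff was chosen, so the criterion used below (maximize per-unit accuracy on training narratives) is hypothetical, as are the function names and the toy scores. Subset Accuracy counts a narrative as correct only when every unit label matches the annotation, which is why it is the most punitive multi-label metric.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fit_cutoffs(sims: Matrix, gold: Sequence[Sequence[int]]) -> List[float]:
    """Pick one similarity cutoff per information unit (one binary problem each).

    Hypothetical criterion: the cutoff that maximizes per-unit accuracy on the
    training narratives; ties keep the lowest candidate cutoff.
    """
    n_units = len(sims[0])
    cutoffs: List[float] = []
    for u in range(n_units):
        scores = [row[u] for row in sims]
        labels = [row[u] for row in gold]
        # Every observed score is a candidate; +inf means "never predict recalled".
        candidates = sorted(set(scores)) + [float("inf")]
        best_cut, best_acc = candidates[0], -1.0
        for cut in candidates:
            hits = sum((s >= cut) == bool(y) for s, y in zip(scores, labels))
            acc = hits / len(labels)
            if acc > best_acc:  # strict '>' keeps the lowest cutoff on ties
                best_cut, best_acc = cut, acc
        cutoffs.append(best_cut)
    return cutoffs


def predict(sims: Matrix, cutoffs: Sequence[float]) -> List[List[int]]:
    """A unit counts as recalled when its similarity reaches the unit's cutoff."""
    return [[int(s >= c) for s, c in zip(row, cutoffs)] for row in sims]


def retelling_score(labels: Sequence[int]) -> int:
    """One point per recalled information unit."""
    return sum(labels)


def subset_accuracy(pred: Sequence[Sequence[int]], gold: Sequence[Sequence[int]]) -> float:
    """Fraction of narratives whose whole label vector matches the annotation exactly."""
    return sum(list(p) == list(g) for p, g in zip(pred, gold)) / len(gold)


# Toy example: invented scores for 4 narratives x 3 units, not data from the paper.
train_sims = [[0.9, 0.2, 0.7], [0.4, 0.8, 0.3], [0.85, 0.6, 0.1], [0.3, 0.1, 0.75]]
train_gold = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
cutoffs = fit_cutoffs(train_sims, train_gold)
print(cutoffs)  # [0.85, 0.6, 0.7]
new_pred = predict([[0.88, 0.5, 0.72]], cutoffs)
print(new_pred, retelling_score(new_pred[0]))  # [[1, 0, 1]] 2
```

Thresholding each unit independently keeps every binary problem trivial to fit, at the cost of ignoring correlations between units. For real data, scikit-learn's `accuracy_score` applied to multi-label indicator arrays computes the same subset accuracy as `subset_accuracy` above.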
