Suicide note sentiment classification: a supervised approach augmented by web data.

Biomedical Informatics Insights, vol. 5, Suppl. 1, pp. 31-41. Pub date: 2012-01-01; Epub date: 2012-01-30. DOI: 10.4137/BII.S8956
Yan Xu, Yue Wang, Jiahua Liu, Zhuowen Tu, Jian-Tao Sun, Junichi Tsujii, Eric Chang
Citations: 11

Abstract

Objective: To create a sentiment classification system for Track 2 of the Fifth i2b2/VA Challenge that can identify thirteen subjective categories and two objective categories.

Design: We developed a hybrid system using Support Vector Machine (SVM) classifiers with training data augmented by data from the Internet. The system comprises three classification-based subsystems: the first uses spanning n-gram features for the subjective categories, the second uses bag-of-n-gram features for the objective categories, and the third uses pattern matching for infrequent or subtle emotion categories. The spanning n-gram features are chosen by a feature selection algorithm that leverages an emotion corpus drawn from weblogs. Objective sentences are generalized through a special normalization step based on shallow parsing and external web knowledge. We use three sources of web data: LiveJournal weblogs, which help improve feature selection; the eBay List, which assists in the special normalization of the information and instructions categories; and the suicide project website, which provides unlabeled data with properties similar to those of suicide notes.
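The two feature types named in the design can be illustrated with a short Python sketch. Bag-of-n-grams are contiguous token sequences counted regardless of position. For "spanning" n-grams we assume a skip-gram-style definition: an ordered selection of n tokens in which neighbouring chosen tokens may be separated by at most `max_gap` intervening tokens. That definition, the function names `bag_of_ngrams` and `spanning_ngrams`, and the `_` joiner are illustrative assumptions rather than the authors' implementation; the weblog-based feature selection and the SVM training step are not shown.

```python
from collections import Counter
from itertools import combinations


def bag_of_ngrams(tokens, n_max=2):
    """Count contiguous n-grams of length 1..n_max (bag-of-n-grams)."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


def spanning_ngrams(tokens, n=2, max_gap=2):
    """Count ordered n-token selections whose neighbours are at most
    max_gap tokens apart (assumed skip-gram-style definition).

    Brute force over index combinations; fine for sentence-length input.
    """
    feats = Counter()
    for idxs in combinations(range(len(tokens)), n):  # idxs are increasing
        # number of skipped tokens between each pair of chosen positions
        if all(b - a - 1 <= max_gap for a, b in zip(idxs, idxs[1:])):
            feats["_".join(tokens[i] for i in idxs)] += 1
    return feats


tokens = "i love you all".split()
print(bag_of_ngrams(tokens))
print(spanning_ngrams(tokens, n=2, max_gap=1))
```

On the tokens `i love you all` with `n=2, max_gap=1`, the spanning features include `love_all`, which links two words separated by one token and which a contiguous bigram would miss.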

Measurements: Performance is evaluated by overall micro-averaged precision, recall, and F-measure.
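Micro-averaging pools true positives (TP), false positives (FP), and false negatives (FN) over all categories before computing precision P = TP/(TP+FP), recall R = TP/(TP+FN), and F-measure F = 2PR/(P+R), so frequent categories weigh more than rare ones. The sketch below assumes each sentence carries a set of gold labels and a set of predicted labels, either of which may be empty; the function name `micro_prf` and this input format are hypothetical, and the official challenge scorer may treat edge cases differently.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F-measure.

    gold, pred: sequences of label sets, one set per sentence (hypothetical format).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels both annotated and predicted
        fp += len(p - g)  # predicted labels not in the gold set
        fn += len(g - p)  # gold labels the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Toy example using category names that appear in the abstract.
gold = [{"instructions"}, {"information", "happiness_peacefulness"}, set()]
pred = [{"instructions"}, {"information"}, {"happiness_peacefulness"}]
print(micro_prf(gold, pred))
```

In this toy example TP = 2, FP = 1, and FN = 1, so precision, recall, and F-measure all equal 2/3.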

Result: Our system achieved an overall micro-averaged F-measure of 0.59. The happiness_peacefulness category had the highest F-measure, 0.81. We ranked second out of 26 competing teams.

Conclusion: Our results indicate that classifying fine-grained sentiments at the sentence level is a non-trivial task. Dividing the categories into groups according to their semantic properties is effective. In addition, our system benefits from external knowledge extracted from publicly available web data created for other purposes, and performance could be further improved with more training data.
