Harnessing Psycho-lingual and Crowd-Sourced Dictionaries for Predicting Taboos in Written Emotional Disclosure in Anonymous Confession Boards
Arindam Paul, Wei-Keng Liao, Alok Choudhary, Ankit Agrawal
Journal of Healthcare Informatics Research, published 2021-04-30
DOI: 10.1007/s41666-021-00092-w
Citations: 0
Abstract
Over the last decade, the health informatics community has made many efforts to develop systems that automatically recognize and predict disclosures on social media. However, most of these efforts have focused on simple topic prediction or sentiment classification. Taboo disclosures on social media, which people are not comfortable discussing with their friends, represent an abstract theme that depends on context and background. Recent research has demonstrated that injecting concepts into the learning model can improve prediction. We present a vectorization scheme that combines corpus- and lexicon-based approaches for predicting taboo topics from anonymous social media datasets. The proposed vectorization scheme exploits two context-rich lexicons, LIWC and Urban Dictionary. Our methodology achieves cross-validation accuracies of up to 78.1% for the supervised learning task on the Facebook Confessions dataset, and 70.5% for the transfer learning task on the YikYak dataset. For both tasks, supervised algorithms trained with features generated by the proposed vectorizer perform better than those trained on a vanilla tf-idf representation. This work presents a novel methodology for predicting taboos from anonymous emotional disclosures on confession boards.
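The abstract does not specify how the corpus-based and lexicon-based features are combined. As a rough illustration only, the following Python sketch assumes the simplest option: each post's tf-idf vector concatenated with the proportion of its tokens that fall into each lexicon category. The category names and word lists in `TOY_LEXICON`, and the helper names `tfidf_vectors`, `lexicon_features`, and `combined_vectors`, are hypothetical placeholders. They are not LIWC or Urban Dictionary entries and are not the authors' vectorizer.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC-style categories.
# These words are placeholders, NOT entries from LIWC or Urban Dictionary.
TOY_LEXICON = {
    "negemo": {"sad", "hurt", "hate", "ashamed"},
    "family": {"mom", "dad", "sister", "brother"},
    "sexual": {"hookup", "crush"},
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, tf-idf vectors): raw term counts multiplied by
    a smoothed idf, log((1 + N) / (1 + df)) + 1. No normalization."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def lexicon_features(text, lexicon=TOY_LEXICON):
    """Fraction of tokens in each lexicon category, ordered by category name."""
    toks = tokenize(text)
    total = len(toks) or 1  # avoid division by zero on empty posts
    return [sum(t in words for t in toks) / total
            for _, words in sorted(lexicon.items())]


def combined_vectors(docs):
    """Concatenate corpus-based tf-idf features with lexicon features."""
    _vocab, tfidf = tfidf_vectors(docs)
    return [vec + lexicon_features(doc) for vec, doc in zip(tfidf, docs)]
```

For example, `combined_vectors(["I hate that my mom reads my texts", "Had a crush on my TA all semester"])` returns two rows, each holding 14 tf-idf weights followed by 3 lexicon proportions (categories in alphabetical order). In practice one might also normalize or weight the two feature groups before training a classifier; the abstract does not describe such choices.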
About the journal:
Journal of Healthcare Informatics Research serves as a publication venue for innovative technical contributions highlighting analytics, systems, and human factors research in healthcare informatics. The journal is concerned with the application of computer science principles, information science principles, information technology, and communication technology to address problems in healthcare and everyday wellness. It highlights the most cutting-edge technical contributions in computing-oriented healthcare informatics. The journal covers three major tracks: (1) analytics, which focuses on data analytics, knowledge discovery, and predictive modeling; (2) systems, which focuses on building healthcare informatics systems (e.g., architecture, framework, design, engineering, and application); and (3) human factors, which focuses on understanding users or context, interface design, health behavior, and user studies of healthcare informatics applications.
Topics include but are not limited to:
· healthcare software architecture, framework, design, and engineering
· electronic health records
· medical data mining
· predictive modeling
· medical information retrieval
· medical natural language processing
· healthcare information systems
· smart health and connected health
· social media analytics
· mobile healthcare
· medical signal processing
· human factors in healthcare
· usability studies in healthcare
· user-interface design for medical devices and healthcare software
· health service delivery
· health games
· security and privacy in healthcare
· medical recommender system
· healthcare workflow management
· disease profiling and personalized treatment
· visualization of medical data
· intelligent medical devices and sensors
· RFID solutions for healthcare
· healthcare decision analytics and support systems
· epidemiological surveillance systems and intervention modeling
· consumer and clinician health information needs, seeking, sharing, and use
· semantic Web, linked data, and ontology
· collaboration technologies for healthcare
· assistive and adaptive ubiquitous computing technologies
· statistics and quality of medical data
· healthcare delivery in developing countries
· health systems modeling and simulation
· computer-aided diagnosis