Natural language model for automatic identification of Intimate Partner Violence reports from Twitter

Journal: Array; IF: 2.3; JCR: Q2 (Computer Science, Theory & Methods); Pub Date: 2022-09-01; DOI: 10.1016/j.array.2022.100217
Mohammed Ali Al-Garadi, Sangmi Kim, Yuting Guo, Elise Warren, Yuan-Chi Yang, Sahithi Lakamana, Abeed Sarker
Array, volume 15, Article 100217 (Journal Article)
Publisher page: https://www.sciencedirect.com/science/article/pii/S2590005622000625
Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/48/57/nihms-1882589.PMC10065459.pdf
Citations: 15

Abstract


Natural language model for automatic identification of Intimate Partner Violence reports from Twitter

Intimate partner violence (IPV) is a preventable public health problem that affects millions of people worldwide. Approximately one in four women are estimated to be or have been victims of severe violence at some point in their lives, irrespective of age, ethnicity, and economic status. Victims often report IPV experiences on social media, and automatic detection of such reports via machine learning may enable improved surveillance and targeted distribution of support and/or interventions for those in need. However, no artificial intelligence system for automatic detection currently exists, and we attempted to address this research gap. We collected posts from Twitter using a list of IPV-related keywords, manually reviewed subsets of retrieved posts, and prepared annotation guidelines to categorize tweets as IPV-report or non-IPV-report. We annotated 6,348 tweets in total, with an inter-annotator agreement (IAA) of 0.86 (Cohen's kappa) among 1,834 double-annotated tweets. The class distribution in the annotated dataset was highly imbalanced, with only 668 posts (~11%) labeled as IPV-report. We then developed an effective natural language processing model to identify IPV-reporting tweets automatically. The developed model achieved classification F1-scores of 0.76 for the IPV-report class and 0.97 for the non-IPV-report class. We conducted post-classification analyses to determine the causes of system errors and to ensure that the system did not exhibit biases in its decision making, particularly with respect to race and gender. Our automatic model can be an essential component of a proactive social media-based intervention and support framework, while also aiding population-level surveillance and large-scale cohort studies.
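
The abstract reports two kinds of metric: Cohen's kappa for inter-annotator agreement and per-class F1-scores for the classifier. The sketch below shows how these quantities are conventionally computed from label lists. It is an illustration only, not the authors' code: the function names `cohens_kappa` and `f1_for_class` and the toy labels are assumptions made up for this example, and the only figures taken from the abstract are the counts 668 and 6,348.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_for_class(gold, predicted, target):
    """F1-score for one class, treating `target` as the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, NOT data from the study.
annotator_1 = ["IPV-report", "IPV-report", "non-IPV-report", "non-IPV-report"]
annotator_2 = ["IPV-report", "non-IPV-report", "non-IPV-report", "non-IPV-report"]

print("kappa:", cohens_kappa(annotator_1, annotator_2))
print("F1 (IPV-report):", f1_for_class(annotator_1, annotator_2, "IPV-report"))
print("F1 (non-IPV-report):", f1_for_class(annotator_1, annotator_2, "non-IPV-report"))

# Class share from the counts in the abstract: 668 of 6,348 tweets.
ipv_share = 668 / 6348
print("IPV-report share:", round(ipv_share, 3))
```

Reporting F1 per class, as the abstract does, matters with this kind of imbalance: a classifier that labels every tweet non-IPV-report would score high on the majority class while its F1 on the IPV-report class would be zero.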

Source journal: Array
Subject area: Computer Science, General Computer Science
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days