Automated text classification of opinion vs. news French press articles. A comparison of transformer and feature-based approaches

Language & Communication; published 2024-10-16; DOI: 10.1016/j.langcom.2024.09.004
Impact factor 1.3; CAS Zone 2 (Literature); JCR Q2 (Communication)
Louis Escouflaire, Antonin Descampe, Cédrick Fairon
Language & Communication, vol. 99, pp. 129-140. ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0271530924000624
Citations: 0

Abstract

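This study explores Natural Language Processing (NLP) methods for distinguishing between press articles belonging to the journalistic genres of "objective" news and "subjective" opinion. Two classification models are compared: CamemBERT, a French transformer model fine-tuned for the task, and a machine learning model using 32 linguistic features. Both models are trained on 8,000 Belgian French articles and evaluated on 1,000 Canadian French articles. The results show that CamemBERT performs better, but they also highlight the potential of hybrid approaches and emphasize the need for robust and transparent methods in NLP. By addressing the challenges of point-of-view detection in press discourse, the research contributes to understanding the role of NLP in journalism.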
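To make the feature-based side of the comparison concrete, here is a minimal sketch of a classifier that maps each article to a vector of linguistic cues and fits a logistic regression on them. It is not the authors' method: the four cues (first- and second-person pronoun rates, exclamations/questions per sentence, French quotation marks per sentence), the cue word lists, the function names, and the hand-rolled gradient-descent training are all hypothetical stand-ins for the paper's 32 features and learner, and the toy sentences only mimic the train/held-out split.

```python
import math
import re

# Hypothetical cue lists for illustration only; NOT the paper's 32 features.
FIRST_PERSON = {"je", "j", "me", "m", "moi", "nous", "mon", "ma", "mes", "notre", "nos"}
SECOND_PERSON = {"tu", "te", "toi", "vous", "ton", "ta", "tes", "votre", "vos"}


def extract_features(text):
    """Map a French article to a few length-normalized stylistic cues."""
    tokens = re.findall(r"\w+", text.lower())  # Unicode-aware word tokens
    n_tokens = max(len(tokens), 1)
    n_sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    return [
        sum(t in FIRST_PERSON for t in tokens) / n_tokens,   # first-person pronoun rate
        sum(t in SECOND_PERSON for t in tokens) / n_tokens,  # reader-address rate
        (text.count("!") + text.count("?")) / n_sentences,   # exclamations/questions per sentence
        text.count("«") / n_sentences,                       # quoted speech per sentence
    ]


def fit_scaler(X):
    """Per-feature mean and population std (a std of 0 is replaced by 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(x, means, stds):
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; label 1 = opinion, 0 = news."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(text, model):
    """Return 1 (opinion) or 0 (news) for a single article."""
    w, b, means, stds = model
    x = scale(extract_features(text), means, stds)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Toy labeled data standing in for the training corpus (1 = opinion, 0 = news).
train_texts = [
    "Je pense que cette réforme est une erreur! Nous méritons mieux.",
    "À mon avis, vous avez tort. Pourquoi accepter cela?",
    "Le ministre a annoncé mardi une réforme. « Elle entrera en vigueur en janvier », a-t-il précisé.",
    "Le conseil municipal a voté le budget. Le texte a été adopté à la majorité.",
]
train_labels = [1, 1, 0, 0]

raw = [extract_features(t) for t in train_texts]
means, stds = fit_scaler(raw)
w, b = train_logreg([scale(x, means, stds) for x in raw], train_labels)
model = (w, b, means, stds)

print(predict("Je trouve ce projet absurde! Vous devriez avoir honte.", model))  # → 1
```

Standardizing the features on the training set and reusing those statistics at prediction time mirrors, in miniature, training on one corpus and evaluating on another. A real replication would use the paper's own feature set and a held-out corpus rather than these illustrative cues.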
Source journal metrics: CiteScore 3.40; self-citation rate 6.70%; articles published: 67
About the journal: This journal is unique in that it provides a forum devoted to the interdisciplinary study of language and communication. The investigation of language and its communicational functions is treated as a concern shared in common by those working in applied linguistics, child development, cultural studies, discourse analysis, intellectual history, legal studies, language evolution, linguistic anthropology, linguistics, philosophy, the politics of language, pragmatics, psychology, rhetoric, semiotics, and sociolinguistics. The journal invites contributions which explore the implications of current research for establishing common theoretical frameworks within which findings from different areas of study may be accommodated and interrelated. By focusing attention on the many ways in which language is integrated with other forms of communicational activity and interactional behaviour, it is intended to encourage approaches to the study of language and communication which are not restricted by existing disciplinary boundaries.
Latest articles in this journal:
Hidden behind the text: A linguistic ethnographic study of stancetaking in news production
Chinese thanking interaction from premodern to modern China: A diachronic analysis
Human affiliative responses to companion animal vocalizations
Editorial Board
Ways of participating in a colleague's project: Radio use as collaborative activity in UN military observer training