Can earnings conference calls tell more lies? A contrastive multimodal dialogue network for advanced financial statement fraud detection

Decision Support Systems | Published 2025-02-01 | DOI: 10.1016/j.dss.2024.114381 | IF 6.7 | Zone 1 (Computer Science) | JCR Q1, Computer Science, Artificial Intelligence

Qi Lu , Wei Du , Shaochen Yang , Wei Xu , J. Leon Zhao

Decision Support Systems, vol. 189, Article 114381. Journal article: https://www.sciencedirect.com/science/article/pii/S0167923624002148

Citations: 0

Abstract

Financial statement fraud by listed firms poses significant challenges to public investors and jeopardizes the stability of financial markets. Previous studies have identified deceptive verbal and vocal cues in earnings conference calls as indicators of financial statement fraud. However, these studies extracted managers' verbal and vocal cues separately over the entire call, neglecting both the utterance-level fusion of verbal and vocal cues and the multi-turn interaction between analysts and managers. To fill this gap, we develop a novel end-to-end contrastive multimodal dialogue network (CMMD) that considers both verbal-vocal fusion and multi-role interaction to uncover hidden deceptive cues in earnings conference calls. The proposed model comprises two core modules: the Multimodal Fusion Learning module and the Dialogue Interaction Learning module. Building on Vrij's verbal-nonverbal complementary mechanisms in deception detection, the Multimodal Fusion Learning module employs contrastive learning to align verbal and vocal cues and a co-attention mechanism to learn cross-modal interaction. Inspired by Interpersonal Deception Theory, which emphasizes the dynamic interaction between deceivers and targets, the Dialogue Interaction Learning module uses a dialogue-aware co-attention mechanism to model multi-turn analyst-manager interaction and contrastive learning to improve dialogue representations. Our extensive empirical results show that CMMD achieves an 8.64% improvement in detecting fraudulent cases compared with the best baseline model. As such, our study advances the research frontier in fraud detection and contributes an innovative IT artifact in practice.
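The abstract names the two ingredients of the Multimodal Fusion Learning module (contrastive alignment of verbal and vocal cues, then cross-modal co-attention) but does not give their equations. The Python sketch below illustrates those two ideas under stated assumptions: a symmetric InfoNCE loss stands in for the paper's contrastive objective, per-utterance verbal and vocal embeddings are taken as already computed, and co-attention is reduced to projection-free scaled dot-product attention in each direction followed by concatenation. All function names (`info_nce`, `cross_attend`, `co_attention_fuse`) and the temperature of 0.1 are hypothetical and not taken from CMMD; the Dialogue Interaction Learning module is not sketched.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per utterance


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # guard against an all-zero vector
    return [x / norm for x in v]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def info_nce(verbal: Matrix, vocal: Matrix, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE (assumed loss, not confirmed as CMMD's): the verbal and
    vocal embeddings of the same utterance form a positive pair; the other
    utterances in the batch serve as negatives."""
    t = [l2_normalize(v) for v in verbal]
    a = [l2_normalize(v) for v in vocal]
    n = len(t)
    # Cosine similarity of every verbal/vocal pair, scaled by the temperature.
    sims = [[dot(t[i], a[j]) / temperature for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        loss -= math.log(softmax(sims[i])[i])                         # verbal -> vocal
        loss -= math.log(softmax([sims[j][i] for j in range(n)])[i])  # vocal -> verbal
    return loss / (2 * n)


def cross_attend(queries: Matrix, keys: Matrix) -> Matrix:
    """Scaled dot-product attention with keys reused as values and no learned
    projections (a simplification of a real co-attention layer)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)])
    return out


def co_attention_fuse(verbal: Matrix, vocal: Matrix) -> Matrix:
    """Each modality attends over the other; the two context vectors are
    concatenated per utterance (concatenation is an assumed fusion choice)."""
    verbal_ctx = cross_attend(verbal, vocal)  # verbal queries over vocal cues
    vocal_ctx = cross_attend(vocal, verbal)   # vocal queries over verbal cues
    return [v + a for v, a in zip(verbal_ctx, vocal_ctx)]


if __name__ == "__main__":
    # Toy 2-d embeddings for two utterances (illustrative values only).
    verbal = [[1.0, 0.0], [0.0, 1.0]]
    vocal = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce(verbal, vocal))        # matched pairs: loss close to 0
    print(info_nce(verbal, vocal[::-1]))  # mismatched pairs: much larger loss
    fused = co_attention_fuse(verbal, vocal)
    print(len(fused), len(fused[0]))      # 2 utterances, 4 fused dims each
```

Treating each utterance's verbal and vocal embeddings as the positive pair is what makes the alignment utterance-level rather than call-level, which is the distinction the abstract draws against prior work that extracted each modality over the whole call.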


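Source journal: Decision Support Systems (Engineering & Technology: Computer Science, Artificial Intelligence)
CiteScore: 14.70
Self-citation rate: 6.70%
Articles published: 119
Review time: 13 months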

Journal description: The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).