{"title":"Stock Movement Prediction with Multimodal Stable Fusion via Gated Cross-Attention Mechanism","authors":"Chang Zong, Jian Shao, Weiming Lu, Yueting Zhuang","doi":"arxiv-2406.06594","DOIUrl":null,"url":null,"abstract":"The accurate prediction of stock movements is crucial for investment\nstrategies. Stock prices are subject to the influence of various forms of\ninformation, including financial indicators, sentiment analysis, news\ndocuments, and relational structures. Predominant analytical approaches,\nhowever, tend to address only unimodal or bimodal sources, neglecting the\ncomplexity of multimodal data. Further complicating the landscape are the\nissues of data sparsity and semantic conflicts between these modalities, which\nare frequently overlooked by current models, leading to unstable performance\nand limiting practical applicability. To address these shortcomings, this study\nintroduces a novel architecture, named Multimodal Stable Fusion with Gated\nCross-Attention (MSGCA), designed to robustly integrate multimodal input for\nstock movement prediction. The MSGCA framework consists of three integral\ncomponents: (1) a trimodal encoding module, responsible for processing\nindicator sequences, dynamic documents, and a relational graph, and\nstandardizing their feature representations; (2) a cross-feature fusion module,\nwhere primary and consistent features guide the multimodal fusion of the three\nmodalities via a pair of gated cross-attention networks; and (3) a prediction\nmodule, which refines the fused features through temporal and dimensional\nreduction to execute precise movement forecasting. Empirical evaluations\ndemonstrate that the MSGCA framework exceeds current leading methods, achieving\nperformance gains of 8.1%, 6.1%, 21.7% and 31.6% on four multimodal datasets,\nrespectively, attributed to its enhanced multimodal fusion stability.","PeriodicalId":501294,"journal":{"name":"arXiv - QuantFin - Computational Finance","volume":"6 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Computational Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.06594","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
The accurate prediction of stock movements is crucial for investment
strategies. Stock prices are subject to the influence of various forms of
information, including financial indicators, sentiment analysis, news
documents, and relational structures. Predominant analytical approaches,
however, tend to address only unimodal or bimodal sources, neglecting the
complexity of multimodal data. Further complicating the landscape are the
issues of data sparsity and semantic conflicts between these modalities, which
are frequently overlooked by current models, leading to unstable performance
and limiting practical applicability. To address these shortcomings, this study
introduces a novel architecture, named Multimodal Stable Fusion with Gated
Cross-Attention (MSGCA), designed to robustly integrate multimodal input for
stock movement prediction. The MSGCA framework consists of three integral
components: (1) a trimodal encoding module, responsible for processing
indicator sequences, dynamic documents, and a relational graph, and
standardizing their feature representations; (2) a cross-feature fusion module,
where primary and consistent features guide the multimodal fusion of the three
modalities via a pair of gated cross-attention networks; and (3) a prediction
module, which refines the fused features through temporal and dimensional
reduction to execute precise movement forecasting. Empirical evaluations
demonstrate that the MSGCA framework outperforms current leading methods, achieving
performance gains of 8.1%, 6.1%, 21.7% and 31.6% on four multimodal datasets,
respectively, attributed to its enhanced multimodal fusion stability.
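The sketch below is a minimal, illustrative PyTorch rendering of the gated cross-attention fusion idea described in the abstract: a primary modality (indicator sequences) attends to an auxiliary modality, and a learned gate controls how much of the attended signal is mixed back in, with a pair of such blocks covering documents and the relational graph. All class names, dimensions, and the specific gating formulation are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: a gated cross-attention fusion block and a
# trimodal wrapper, loosely following the abstract's description of MSGCA.
# Names, shapes, and the gating formulation are assumptions, not the paper's code.
import torch
import torch.nn as nn


class GatedCrossAttention(nn.Module):
    """Primary features attend to an auxiliary modality; a learned gate
    decides how much of the attended signal is fused back in."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, primary: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        # primary: (batch, T, dim) indicator features; auxiliary: (batch, S, dim)
        attended, _ = self.attn(query=primary, key=auxiliary, value=auxiliary)
        g = self.gate(torch.cat([primary, attended], dim=-1))  # element-wise gate in (0, 1)
        return self.norm(primary + g * attended)                # gated residual fusion


class MSGCASketch(nn.Module):
    """Hypothetical trimodal fusion: indicator features are fused with document
    and graph features via a pair of gated cross-attention blocks, then reduced
    over time for a binary up/down movement prediction."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.fuse_docs = GatedCrossAttention(dim)
        self.fuse_graph = GatedCrossAttention(dim)
        self.head = nn.Linear(dim, 2)  # logits for up / down

    def forward(self, indicators, documents, graph_nodes):
        x = self.fuse_docs(indicators, documents)
        x = self.fuse_graph(x, graph_nodes)
        return self.head(x.mean(dim=1))  # temporal reduction by mean pooling


# Usage with random tensors: batch of 8, 20 trading days, 64-dim features per modality.
model = MSGCASketch(dim=64)
logits = model(torch.randn(8, 20, 64), torch.randn(8, 10, 64), torch.randn(8, 5, 64))
print(logits.shape)  # torch.Size([8, 2])
```

The gated residual connection is what the abstract's "stable fusion" claim suggests: when an auxiliary modality is sparse or conflicts with the primary signal, the gate can shrink toward zero so the primary indicator features pass through largely unchanged.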