Natural language processing and financial markets: semi-supervised modelling of coronavirus and economic news

IF 1.3 | Zone 4, Computer Science | JCR Q2, Statistics & Probability | Advances in Data Analysis and Classification | Pub Date: 2024-06-19 | DOI: 10.1007/s11634-024-00596-4

Carlos Moreno-Pérez, Marco Minozzo

Journal: Advances in Data Analysis and Classification, volume 19, issue 3, pages 769-793. Article page: https://link.springer.com/article/10.1007/s11634-024-00596-4 PDF: https://link.springer.com/content/pdf/10.1007/s11634-024-00596-4.pdf

Citations: 0


Natural language processing and financial markets: semi-supervised modelling of coronavirus and economic news

This paper investigates the reactions of US financial markets to press news from January 2019 to 1 May 2020. To this end, we deduce the content and uncertainty of the news by developing apposite indices from the headlines and snippets of The New York Times, using unsupervised machine learning techniques. In particular, we use Latent Dirichlet Allocation to infer the content (topics) of the articles, and Word Embedding (implemented with the Skip-gram model) and K-Means to measure their uncertainty. In this way, we arrive at the definition of a set of daily topic-specific uncertainty indices. These indices are then used to find explanations for the behavior of the US financial markets by implementing a batch of EGARCH models. In substance, we find that two topic-specific uncertainty indices, one related to COVID-19 news and the other to trade war news, explain the bulk of the movements in the financial markets from the beginning of 2019 to end-April 2020. Moreover, we find that the topic-specific uncertainty index related to the economy and the Federal Reserve is positively related to the financial markets, meaning that our index is able to capture the actions of the Federal Reserve during periods of uncertainty.
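The abstract does not state how the daily topic-specific uncertainty indices are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each article already has topic probabilities (for example from an LDA model) and a token list, that an uncertainty vocabulary has already been obtained (for example from a word cluster built with Skip-gram embeddings and K-Means), and that an article contributes its share of uncertainty words to each topic in proportion to that topic's probability. The names `UNCERTAINTY_WORDS`, `uncertainty_share`, and `daily_topic_uncertainty`, the topic labels, and the sample articles are all hypothetical.

```python
from collections import defaultdict

# Illustrative vocabulary only; in practice it would come from a word cluster
# (e.g. Skip-gram embeddings grouped with K-Means), which is not shown here.
UNCERTAINTY_WORDS = {"uncertainty", "risk", "fear", "worry"}


def uncertainty_share(tokens, vocabulary=UNCERTAINTY_WORDS):
    """Return the fraction of tokens that belong to the uncertainty vocabulary."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return hits / len(tokens)


def daily_topic_uncertainty(articles):
    """Aggregate articles into {date: {topic: index}}.

    Each article is a dict with keys 'date', 'tokens' and 'topic_probs'
    (a mapping topic -> probability, e.g. produced by an LDA model).
    Assumed weighting: an article adds (topic probability * uncertainty share)
    to the index of each topic on its publication date.
    """
    index = defaultdict(lambda: defaultdict(float))
    for article in articles:
        share = uncertainty_share(article["tokens"])
        for topic, prob in article["topic_probs"].items():
            index[article["date"]][topic] += prob * share
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {date: dict(topics) for date, topics in index.items()}


# Hypothetical sample data, not taken from the paper.
articles = [
    {"date": "2020-03-16", "tokens": ["virus", "fear", "grips", "markets"],
     "topic_probs": {"covid": 0.75, "trade_war": 0.25}},
    {"date": "2020-03-16", "tokens": ["tariff", "risk", "rises", "again"],
     "topic_probs": {"covid": 0.25, "trade_war": 0.75}},
]
result = daily_topic_uncertainty(articles)
print(result)
```

A daily series built this way could then serve as an explanatory variable in a volatility model such as EGARCH, which is the role the abstract gives to the indices; how the paper specifies that step is not described here.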


Source journal

Advances in Data Analysis and Classification (Statistics & Probability)

CiteScore: 3.40

Self-citation rate: 6.20%

Review time: >12 weeks

Journal description: The international journal Advances in Data Analysis and Classification (ADAC) is designed as a forum for high standard publications on research and applications concerning the extraction of knowable aspects from many types of data. It publishes articles on such topics as structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; methods for the extraction of knowledge from data; and applications of advanced methods in specific domains of practice. Articles illustrate how new domain-specific knowledge can be made available from data by skillful use of data analysis methods. The journal also publishes survey papers that outline and illuminate the basic ideas and techniques of special approaches.

Latest articles from this journal

Editorial for ADAC issue 1 of volume 20 (2026)
Editorial for ADAC issue 4 of volume 19 (2025)
Calibrated kNN classification via second-layer neighborhood analysis
Structural equation modeling with factors and composites within the framework of the basic design
Editorial for ADAC issue 3 of volume 19 (2025)