{"title":"一种提取文本信息的深度学习方法","authors":"Allen Huang, Hui Wang, Yi Yang","doi":"10.2139/ssrn.3910214","DOIUrl":null,"url":null,"abstract":"In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, the naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with LM dictionary. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.","PeriodicalId":256367,"journal":{"name":"Computational Linguistics & Natural Language Processing eJournal","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"FinBERT—A Deep Learning Approach to Extracting Textual Information\",\"authors\":\"Allen Huang, Hui Wang, Yi Yang\",\"doi\":\"10.2139/ssrn.3910214\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, the naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that other approaches underestimate the textual informativeness of earnings conference calls by at least 32% compared with FinBERT. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiments summarized by FinBERT can better predict future earnings than the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of textual sentiments measured with LM dictionary. 
Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.\",\"PeriodicalId\":256367,\"journal\":{\"name\":\"Computational Linguistics & Natural Language Processing eJournal\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Linguistics & Natural Language Processing eJournal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3910214\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Linguistics & Natural Language Processing eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3910214","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
FinBERT—A Deep Learning Approach to Extracting Textual Information
In this paper, we develop FinBERT, a state-of-the-art deep learning algorithm that incorporates the contextual relations between words in the finance domain. First, using a researcher-labeled analyst report sample, we document that FinBERT significantly outperforms the Loughran and McDonald (LM) dictionary, naïve Bayes, and Word2Vec in sentiment classification, primarily because of its ability to uncover sentiment in sentences that other algorithms mislabel as neutral. Next, we show that, compared with FinBERT, other approaches underestimate the textual informativeness of earnings conference calls by at least 32%. Our results also indicate that FinBERT’s greater accuracy is especially relevant when empirical tests may suffer from low power, such as with small samples. Last, textual sentiment summarized by FinBERT predicts future earnings better than sentiment measured with the LM dictionary, especially after 2011, consistent with firms’ strategic disclosures reducing the information content of LM-based sentiment measures. Our results have implications for academic researchers, investment professionals, and financial market regulators who want to extract insights from financial texts.
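To make the sentiment-classification task concrete, the sketch below shows how a fine-tuned, BERT-based finance sentiment classifier of this kind might be applied to sentences from an earnings call. This is an illustrative example rather than the paper's exact pipeline: the checkpoint name, label set, and example sentences are assumptions for demonstration, and any available fine-tuned FinBERT-style checkpoint could be substituted.

# Minimal sketch (not the authors' exact pipeline): classify the sentiment of
# finance sentences with a BERT-based text classifier via Hugging Face transformers.
from transformers import pipeline

# Assumed checkpoint name for illustration; replace with the fine-tuned model you use.
MODEL_NAME = "yiyanghkust/finbert-tone"

classifier = pipeline("text-classification", model=MODEL_NAME)

# Hypothetical earnings-call-style sentences.
sentences = [
    "The company expects revenue growth to accelerate next quarter.",
    "Management noted continued margin pressure from rising input costs.",
]

for sentence, result in zip(sentences, classifier(sentences)):
    # Each result is a dict such as {"label": "Positive", "score": 0.98}.
    print(f"{result['label']:>8}  ({result['score']:.2f})  {sentence}")

In this setup, the transformer's contextual embeddings are what allow sentences with domain-specific or implicitly worded sentiment to be classified as positive or negative rather than defaulting to neutral, which is the failure mode the paper attributes to dictionary- and bag-of-words-based approaches.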