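BioFinBERT: Finetuning Large Language Models (LLMs) to Analyze Sentiment of Press Releases and Financial Text Around Inflection Points of Biotech Stocks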

Valentina Aparicio, Daniel Gordon, Sebastian G. Huayamares, Yuhuai Luo
arXiv - QuantFin - Trading and Market Microstructure
Published: 2024-01-19
DOI: arxiv-2401.11011 (https://doi.org/arxiv-2401.11011)

Abstract

Large language models (LLMs) are deep learning algorithms used to perform natural language processing tasks in fields ranging from the social sciences to finance and the biomedical sciences. Developing and training a new LLM can be very computationally expensive, so it is becoming common practice to take existing LLMs and finetune them on carefully curated datasets for applications in a specific field. Here, we present BioFinBERT, an LLM finetuned to perform financial sentiment analysis of public text associated with the stocks of companies in the biotechnology sector. The stocks of biotech companies developing highly innovative and risky therapeutic drugs tend to respond very positively to a successful clinical readout or regulatory approval of their drug, and very negatively to a failed one. These clinical or regulatory results are disclosed by the biotech companies via press releases, which in many cases are followed by a significant stock response. In our attempt to design an LLM capable of analyzing the sentiment of these press releases, we first finetuned BioBERT, a biomedical language representation model designed for biomedical text mining, using financial textual databases. Our finetuned model, termed BioFinBERT, was then used to perform financial sentiment analysis of various biotech-related press releases and financial text around inflection points that significantly affected the price of biotech stocks.
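The final step described in the abstract, turning a finetuned classifier's output into a sentiment reading for a press release, can be illustrated with a minimal sketch. This is not the authors' implementation: the three-class label set, the label order, the example logits, and the names `LABELS`, `softmax`, and `score_sentiment` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical label order; the abstract does not say which classes were used.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentiment(logits):
    """Return (label, net_score) for the logits of one text.

    net_score = P(positive) - P(negative), so it lies in [-1, 1].
    """
    probs = softmax(logits)
    label = LABELS[probs.index(max(probs))]
    net_score = probs[2] - probs[0]
    return label, net_score


# Made-up logits standing in for a model's output on one press release.
label, net = score_sentiment([-1.2, 0.3, 2.5])
print(label, round(net, 3))
```

A single net score of this kind is one convenient way to compare texts published around the same price inflection point; other summaries of the class probabilities would work equally well.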