NIFTY Financial News Headlines Dataset
Raeid Saqur, Ken Kato, Nicholas Vinden, Frank Rudzicz
arXiv - QuantFin - Computational Finance, 2024-05-16 (arXiv:2405.09747)
Abstract
We introduce and make publicly available the NIFTY Financial News Headlines
dataset, designed to facilitate and advance research in financial market
forecasting using large language models (LLMs). This dataset comprises two
distinct versions tailored for different modeling approaches: (i) NIFTY-LM,
which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive,
causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically
for alignment methods (like reinforcement learning from human feedback (RLHF))
to align LLMs via rejection sampling and reward modeling. Each dataset version
provides curated, high-quality data incorporating comprehensive metadata,
market indices, and deduplicated financial news headlines systematically
filtered and ranked to suit modern LLM frameworks. We also include experiments
demonstrating applications of the dataset in tasks such as stock price
movement prediction and in studying the role of LLM embeddings in information
acquisition/richness.
The NIFTY dataset, along with utilities (such as systematic truncation of a
prompt's context length), is available on Hugging Face at
https://huggingface.co/datasets/raeidsaqur/NIFTY.
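
To make the idea of systematically truncating a prompt's context length concrete, the following is a minimal sketch and not the utility shipped with the dataset. It assumes the headlines are already ranked (most relevant first) and approximates token counts by whitespace splitting; the function name `truncate_headlines`, the `max_tokens` parameter, and the sample headlines are all hypothetical and chosen only for illustration.

```python
from typing import List


def truncate_headlines(headlines: List[str], max_tokens: int) -> List[str]:
    """Keep ranked headlines, in order, until a token budget is exhausted.

    Assumption: `headlines` is sorted most-relevant-first, and token cost is
    approximated by whitespace splitting. A real pipeline would count tokens
    with the target model's tokenizer instead.
    """
    kept: List[str] = []
    used = 0
    for headline in headlines:
        cost = len(headline.split())
        if used + cost > max_tokens:
            # Stop at the first headline that would exceed the budget so the
            # retained set is always a prefix of the ranked list.
            break
        kept.append(headline)
        used += cost
    return kept


if __name__ == "__main__":
    # Made-up placeholder headlines, not taken from the NIFTY dataset.
    ranked = [
        "Fed holds rates steady",
        "Oil prices climb on supply fears",
        "Tech stocks rally into the close",
    ]
    print(truncate_headlines(ranked, max_tokens=10))
```

Stopping at the first headline that does not fit, rather than skipping it and trying shorter ones, keeps the result a strict prefix of the ranking; whether the actual NIFTY utility makes the same choice is not stated in the abstract.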